Status of running snapshot? Also, super slow uploads, how to debug?

So, I have a repository server and a client that is running a snapshot create. Is there any command that shows the status (percent done, etc.) of a long-running, in-progress snapshot so I can watch its progression? What would that command be, and on which machine would I run it? I can't seem to find one.

cli

EDIT:

I can't tell what Kopia is doing. top shows it mostly at 0% CPU. It's a 32-thread system, so I believe the default is to allocate 32 parallel tasks. Maybe that's too much? The system is 99% idle. Here are the current load averages:

load average: 0.00, 0.03, 0.00
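For checking what the kopia process itself is doing, standard Linux tools are enough. A quick sketch (assumes pgrep and sysstat's pidstat are installed; nothing here is Kopia-specific):

    # CPU/memory of just the kopia process(es)
    top -p "$(pgrep -d, kopia)"

    # per-process disk I/O, sampled every 5 seconds
    pidstat -d -p "$(pgrep -d, kopia)" 5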

Here are recent log entries from cli-logs:

2025-05-29T13:02:03.495605Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Prestolite Bowling Tourney,dur:36m14.894438283s,size:56666738,files:33,dirs:1,errors:0}
2025-05-29T13:15:35.140691Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Graciosa Island,dur:14m51.70888673s,size:45709510,files:12,dirs:1,errors:0}
2025-05-29T13:30:37.255758Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Robert and Sherry Timmons,dur:15m2.114594968s,size:36814352,files:10,dirs:1,errors:0}
2025-05-29T14:33:59.536981Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Hilton Head and Debbi,dur:1h18m24.395928102s,size:231599961,files:80,dirs:1,errors:0}
2025-05-29T15:04:13.721528Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Delaware,dur:30m14.184151747s,size:59576764,files:27,dirs:1,errors:0}
2025-05-29T19:50:34.446038Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Park City Dinosaur Valley,dur:5h16m34.908610432s,size:158328212,files:46,dirs:1,errors:0}
2025-05-29T21:04:29.792593Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Morocco,dur:6h0m16.070725125s,size:127393174,files:21,dirs:1,errors:0}
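(To follow these entries in real time while the snapshot runs, something like this works, assuming the default Linux log location; adjust the path if --log-dir is set:)

    tail -f "$(ls -t ~/.cache/kopia/cli-logs/* | head -1)"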

So, it took 6 HOURS for that last directory, which totals all of 122 MB? This is on an E5-2698v3.

I see essentially 0 bytes going out over the network from this machine most of the time.

How can I troubleshoot this further?

Can you lay out your setup a bit more? Specs of the server and client hosts, network type and speed. The repo server actually doesn't need that much performance, as the heavy lifting is all performed on the client that runs the snapshot. It's also on the client where you should be able to watch the snapshot progress while it's running.
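For illustration (paths are placeholders): the progress counter is printed in the terminal of whichever machine runs the snapshot, and snapshots known to the repository can be listed afterwards:

    # on the client; prints a live progress line while it runs
    kopia snapshot create /path/to/data

    # afterwards, list snapshots in the repository
    kopia snapshot list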

I could, but know that previously I was backing up between the same two machines (the same directories), except instead of using the Kopia repo server, I was just using sftp. That was blazing fast; however, maintenance took forever, especially when backups expired.

The client, which you say matters most, is a very fast E5-2698v3 with 32 threads, an IBM 9201 HBA, and many enterprise drives: Seagate Exos X18 for the spinning rust and Samsung SSDs for faster storage. The motherboard is a Supermicro X10SRA. RAM is 96 GB. The machine averages a load of under 0.5 over 24 hours.

So, some differences between the old setup and the new one besides using the repo server: Kopia is using compression now, though that's obviously not a CPU speed issue on this hardware. I can't even detect Kopia running using top on either machine, it's so minimal. I very rarely see any network transfers from whatever Kopia is doing. Using v0.20. Some repo settings:

Hash:                BLAKE3-256
Encryption:          AES256-GCM-HMAC-SHA256
Splitter:            DYNAMIC-4M-BUZHASH
Format version:      3
Content compression: true
Password changes:    true
Max pack length:     21 MB
Index Format:        v2

Epoch Manager:       enabled
Current Epoch: 1

Epoch refresh frequency: 20m0s
Epoch advance on:        20 blobs or 10.5 MB, minimum 24h0m0s
Epoch cleanup margin:    4h0m0s
Epoch checkpoint every:  7 epochs

Compression is zstd-better-compression.
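(The settings above are what kopia repository status prints, for anyone wanting to check their own repo; run it on any machine connected to the repository:)

    kopia repository status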

LAN speed is 10G via optical cabling to a Fiberstore switch. WAN upload speed is 100M, which I can actually reach to the repo server machine via iperf3. Latency averages 50 ms.
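To separate the raw link from whatever Kopia is doing, a couple of standard checks (host name is a placeholder):

    ping -c 10 repo.example.com       # confirm the ~50 ms RTT
    iperf3 -c repo.example.com        # raw client-to-server throughput
    iperf3 -c repo.example.com -R     # reverse direction, for comparison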

Here's another oddity, to me. It's been 12 hours since I last looked. The repo server directory was at 114G 12 hours ago, and it still is. Yet some snapshot dirs have slowly finished in the meantime, some fairly large. Here are the latest:

2025-05-30T07:21:09.941035Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Munsters,dur:2h43m54.010251674s,size:43613444,files:34,dirs:1,errors:0}
2025-05-30T08:06:59.429301Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Tanglewood - Lake Texoma,dur:3h29m43.49854691s,size:81578787,files:21,dirs:1,errors:0}
2025-05-30T08:10:57.959228Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Aruba,dur:11h6m28.166281458s,size:248471041,files:61,dirs:1,errors:0}
2025-05-30T10:25:35.805377Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Mount Rushmore,dur:5h48m19.874609064s,size:138730358,files:18,dirs:1,errors:0}
2025-05-30T12:15:12.900718Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Costa Rica,dur:15h10m43.107708304s,size:625624890,files:178,dirs:1,errors:0}
2025-05-30T12:33:45.798068Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/D-Day Reunion 2004,dur:7h56m29.8673584s,size:278556508,files:58,dirs:1,errors:0}
2025-05-30T14:38:53.214209Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Las Vegas,dur:10h1m37.283446685s,size:553515622,files:174,dirs:1,errors:0}
2025-05-30T15:35:27.216797Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Maui,dur:10h58m11.286104548s,size:526544792,files:206,dirs:1,errors:0}
2025-05-30T15:36:17.296698Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone/Playa Andaluza,dur:10h59m1.366118026s,size:520058545,files:24,dirs:1,errors:0}
2025-05-30T15:36:17.298570Z DEBUG uploader snapshotted directory {path:Digikam/ToBeDone,dur:33h34m21.363835113s,size:26327904179,files:7114,dirs:109,errors:0}
2025-05-30T15:36:17.376317Z DEBUG uploader snapshotted directory {path:Digikam,dur:33h34m47.321073895s,size:41467160154,files:9732,dirs:173,errors:0}

I am still worried that the default parallelism of 0, which would mean 32 for this CPU, is somehow involved in this slowness. Not an issue?

It's kicked into gear now that it's past Digikam. So my suspicion is that the compression algorithm was the slowdown; Digikam means camera photos/movies. That being said, I would expect the client machine to be quite busy trying to compress thousands of such items, when in fact it was essentially idle. Is compression perhaps single-threaded, so the hangup might really be one CPU trying to compress thousands of items one at a time?

I have added entries to the exclude-from-compression list for next time. I still don't get it, though. If I tried to compress all those items via a simple program using the same algorithm, it wouldn't take more than a day, or even remotely close to that, on this machine.
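For reference, the exclude list is a per-path policy; a sketch with example extensions for camera output (check kopia policy set --help on your version for the exact flag names):

    kopia policy set /path/to/Digikam \
      --add-never-compress=.jpg \
      --add-never-compress=.avi \
      --add-never-compress=.mp4

    # verify the resulting policy
    kopia policy show /path/to/Digikam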

Well… a latency of 50 ms is very high - I guess the remote server is quite a bit away? Kopia deals with lots of fragments, and high latencies tend to do the most harm to that. Regarding the effect of compression: while I'd always go with lz4, since that seems to perform very well for compressible data with the lowest CPU impact, I'd agree that the CPU should be busy when having to run a lot of data through zstd-better, which makes the CPU work hard for marginally better gains.

I'd suggest that you run kopia benchmark compression --data-file=XXXX and use a sample from your Digikam folder to see how compressing those files affects throughput. However, I strongly suspect the 50 ms latency is behind the lag. On my network at home, the difference between 2 ms (LAN) and 5 ms (WiFi) already makes a big difference…
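For example, with a sample file pulled from the Digikam tree (path is a placeholder):

    kopia benchmark compression --data-file=/path/to/Digikam/sample.AVI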

50 ms is not that high; the point is that sftp was at least 10 times faster than this repository server method. Same machines, same everything other than compression.

The machine is remote, so yes, very far away. Not overseas, which would be much higher latency.

Once Digikam was done, it started going pretty fast again, as it was before that set of files. I don't think it's latency, for the reasons given. Progress is quite good again.

I took the largest file out of a folder of 61 files; it was by far the biggest. That folder took 11 hours, and only 2 of its files were "large".

Here’s the results of the benchmark against the file:

Compressing input file "DSCF0001.AVI" (72 MB) using 16 compression methods.
     Compression                Compressed   Throughput   Allocs   Memory Usage
------------------------------------------------------------------------------------------------
  0. s2-default                 72 MB          2.9 GB/s     626      30.2 MB
  1. s2-parallel-8              72 MB          1.7 GB/s     575      22.8 MB
  2. s2-better                  72 MB          1.2 GB/s     720      71.4 MB
  3. s2-parallel-4              72 MB            1 GB/s     614      20.7 MB
  4. pgzip-best-speed           72 MB        711.6 MB/s     903      67.3 MB
  5. pgzip                      72 MB        517.5 MB/s     987      76.5 MB
  6. pgzip-best-compression     70.8 MB        435 MB/s     1414     151.5 MB
  7. zstd-fastest               71.8 MB      425.6 MB/s     3425     9.8 MB
  8. deflate-best-speed         72 MB          382 MB/s     17       818.6 KB
  9. deflate-default            72 MB        345.5 MB/s     17       1.1 MB
 10. gzip-best-speed            72 MB        239.9 MB/s     23       1.2 MB
 11. zstd                       70.8 MB      114.5 MB/s     1776     19.9 MB
 12. zstd-better-compression    70.4 MB       68.7 MB/s     1764     22.5 MB
 13. deflate-best-compression   70.8 MB       36.9 MB/s     18       1.1 MB
 14. gzip                       70.9 MB       30.3 MB/s     21       818.8 KB
 15. gzip-best-compression      70.9 MB       28.5 MB/s     26       823.8 KB
Decompressing input file "DSCF0001.AVI" (72 MB) using 16 compression methods.
     Compression                Compressed   Throughput   Allocs   Memory Usage
------------------------------------------------------------------------------------------------
  0. deflate-best-speed         72 MB          2.7 GB/s     27       38.7 KB
  1. deflate-default            72 MB          2.4 GB/s     17       39.4 KB
  2. s2-default                 72 MB          2.4 GB/s     8        2.1 MB
  3. pgzip                      72 MB          2.1 GB/s     105      4.2 MB
  4. s2-better                  72 MB          1.9 GB/s     8        2.1 MB
  5. gzip-best-speed            72 MB          1.8 GB/s     24       47 KB
  6. pgzip-best-speed           72 MB          1.7 GB/s     116      4.2 MB
  7. s2-parallel-4              72 MB          1.6 GB/s     8        2.1 MB
  8. s2-parallel-8              72 MB          1.5 GB/s     8        2.1 MB
  9. zstd-fastest               71.8 MB      764.9 MB/s     37       5.6 MB
 10. zstd-better-compression    70.4 MB      446.4 MB/s     31       10.5 MB
 11. zstd                       70.8 MB      423.5 MB/s     40       10.3 MB
 12. pgzip-best-compression     70.8 MB      146.9 MB/s     171      4.3 MB
 13. deflate-best-compression   70.8 MB      115.2 MB/s     19       39 KB
 14. gzip                       70.9 MB       97.1 MB/s     13128    1.8 MB
 15. gzip-best-compression      70.9 MB       78.2 MB/s     13127    1.8 MB

So, the file is only 72 MB. At 68.7 MB/s that's just over 1 second, so not an issue. Since I have 32 cores, I'm not worried about compression. As a back-of-the-envelope check: even single-threaded at 68.7 MB/s, the entire ~41 GB Digikam tree would take roughly 10 minutes to compress. So I am completely confused as to why that folder would take over 11 hours; it shouldn't even take 11 minutes! 11 hours is insane. I could sftp them, I could rsync them, I could run virtually any backup software, and it would take minutes for that folder.

I have no use for a local repository server, as I already have onsite backups and I don't have another machine for it, and I don't want it on the same machine. Local backups don't handle things like fire, theft, floods, etc., which I actually came close to experiencing when lightning struck my house. This is one of my off-site backups. I don't put all my eggs into one basket with backup locations AND software; I use several different types. You never know with bugs and all.

Well, the E5-2698v3 actually has 16 cores, not 32 - and not all parts of the CPU are available to all threads, for that matter. Then there is the issue that files are split into chunks of roughly 4 MB with the DYNAMIC-4M-BUZHASH splitter you're using. I'd try to run the snapshot with --parallel=15 to make sure there are enough CPU resources left for the system.

However, since you state that your CPU is largely underused, there seems to be another issue. Maybe have a look at the log of the Kopia server; it may provide some insights.
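A sketch of the suggested invocation (path is a placeholder):

    kopia snapshot create /path/to/data --parallel=15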

It's 16 cores with 2 threads each, so it can run 32 processes, which is why cpuinfo shows 32. Yes, I have utilization graphs, so I can easily look back in time, and it's idle: 99% idle over the backup period.

Thanks for trying to help. I'm going to set a maximum file size for compression and change to zstd. lz4 appears to be deprecated in Kopia, watch out for that! I might just send the initial backup all over again to see the impact, or maybe just start with a problem directory.

Just to report back. I reduced parallel to 12, changed to zstd-fastest since compression did close to nothing with my content anyway (though it could in the future as my content changes), and added the ignore rules for already-compressed files. The result? The Digikam directory, which previously took 33h34m, took 4h50m. That's reasonable given the size and my upload speed, and it's a first-time backup anyway. I averaged around 27 Mbps, which for me is decent speed. Incrementals should be fine given this.
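For anyone landing here later, a sketch of the combination that fixed it for me, with placeholder paths (the never-compress extensions were set earlier, as above):

    # cheapest zstd level; compression was buying almost nothing here
    kopia policy set /path/to/data --compression=zstd-fastest

    # re-run the snapshot with reduced parallelism
    kopia snapshot create /path/to/data --parallel=12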