I have created a local repository (using the kopia docker image), which is about 1 TiB, and want to sync it offsite. I have tried both webdav and sftp remotes, and with both I experience long waiting times at the “Looking for BLOBs to synchronize” step.
I have about 40k blobs and it takes about 1.5 hours to list the blobs in the destination repository. Setting the flag --parallel=8 does not seem to make a difference. Interestingly, the ListBlobs entry in the log reports a duration of only about 31 minutes, even though the surrounding timestamps are about 85 minutes apart. Copying the missing ~180 blobs takes about 90 seconds:
2024-04-07T10:26:57.075704Z INFO kopia/cli Looking for BLOBs to synchronize...
2024-04-07T11:51:26.596629Z DEBUG kopia/repo [STORAGE] ListBlobs {"prefix":"","resultCount":44523,"error":null,"duration":"31m42.091803961s"}
2024-04-07T11:51:26.596738Z INFO kopia/cli Found 0 BLOBs to delete (0 B), 44344 in sync (1 TB)
...
2024-04-07T11:53:02.986635Z DEBUG kopia/repo [STORAGE] Close {"error":null,"duration":"1.147µs"}
My command line is: docker-compose exec kopia kopia repository sync-to from-config --file=/app/config/hetzner_sftp.config --parallel=8 --delete
Is there any way to speed up this initial scan of the destination repository? I have tried both webdav and sftp with similar results.
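To put numbers on the gap between the logged ListBlobs duration and the wall-clock time, here is a small Python sketch that uses only the values from the log excerpt above. The variable names are my own, and the sketch says nothing about where the remaining time is actually spent; it only does the arithmetic.

```python
from datetime import datetime, timedelta

# Values copied from the log excerpt above ("Z" written as "+00:00" so that
# datetime.fromisoformat also accepts it on Python versions before 3.11).
start = datetime.fromisoformat("2024-04-07T10:26:57.075704+00:00")
end = datetime.fromisoformat("2024-04-07T11:51:26.596629+00:00")
list_duration = timedelta(minutes=31, seconds=42.091803961)  # logged "duration"
blob_count = 44523  # logged "resultCount"

# Wall-clock time between "Looking for BLOBs..." and the ListBlobs log entry.
wall_clock = end - start
# Time that the logged ListBlobs duration does not account for.
unaccounted = wall_clock - list_duration

per_blob_wall_ms = wall_clock.total_seconds() / blob_count * 1000
per_blob_list_ms = list_duration.total_seconds() / blob_count * 1000

print(f"wall clock:        {wall_clock.total_seconds() / 60:.1f} min")
print(f"logged ListBlobs:  {list_duration.total_seconds() / 60:.1f} min")
print(f"unaccounted:       {unaccounted.total_seconds() / 60:.1f} min")
print(f"per blob (wall):   {per_blob_wall_ms:.1f} ms")
print(f"per blob (logged): {per_blob_list_ms:.1f} ms")
```

This works out to roughly 84.5 minutes of wall-clock time, of which about 52.8 minutes are not covered by the logged ListBlobs duration, or on the order of 100 ms of wall-clock time per listed blob.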
Well… I don’t know about Hetzner, but I have a Kopia repo of approx. 800GB/35k blobs on a Wasabi S3 bucket and the whole sync-to runs for approx. 3 mins. and the first phase of looking up the blobs in the target repo doesn’t last any longer than a couple of seconds…
Latency will be king here… neither (S)FTP nor WebDAV is known for low-latency behaviour… and the more files you have to deal with, the worse it gets. However, I will admit that 31 mins. for scanning the remote repo really looks abnormal - even for one of those protocols.
Sorry to revive this, but did you find a solution @nakermann1973 ?
I’m in the same boat with a Hetzner storage box: I have a local filesystem repository and am trying to sync it to Hetzner via sftp. The comparison takes a while for my ~2 TB, and for me the upload speeds are also slow: I get 6-9 MB/sec, while I get 60+ MB/sec via rclone sync.
Another thing I don’t understand: I previously uploaded the whole repo via rclone sync, then tried kopia repository sync-to. It sees the previously uploaded blobs but reports that they don’t match the local ones, and it starts to upload all the blobs again.
I have not had any luck improving the initial scan speed - it still takes about 2 hours (for a ~1 TiB repo). The actual sync speed is OK at around 35 MiB/s - a direct rclone sync via sftp runs at around 60 MiB/s.
I realise this is an old post, but thought a current update still makes sense.
I just started syncing to an offsite SFTPGo server: ~230,000 blobs, 5.5 TB, 1 Gbit/s down, 50 Mbit/s up, and on both ends the repository resides on an external spinning drive. Blob listing initially took up to 2 hours. For blob listing, --parallel is irrelevant (it affects file upload); changing --list-parallelism makes all the difference. With --list-parallelism 50, blob listing now takes 14 minutes. With values above that, the connection is aborted in my case.
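For anyone wondering why list parallelism matters so much, here is a toy Python sketch. It is not Kopia's code. It assumes that the remote blobs are spread over many subdirectories and that each directory listing costs one network round trip. The names list_shard and list_all, the shard names, and the 10 ms latency are all made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.01  # assumed round-trip time per directory listing (illustrative)

def list_shard(shard):
    """Hypothetical stand-in for one remote directory listing."""
    time.sleep(LATENCY_S)  # simulate waiting on the network
    return [f"{shard}/blob{i}" for i in range(3)]

def list_all(shards, parallelism):
    """List every shard, with at most `parallelism` listings in flight."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # map() keeps results in the same order as the input shards.
        results = list(pool.map(list_shard, shards))
    return [blob for shard_blobs in results for blob in shard_blobs]

shards = [f"p{i:02x}" for i in range(100)]  # made-up shard directory names

t0 = time.perf_counter()
serial = list_all(shards, parallelism=1)
serial_s = time.perf_counter() - t0

t0 = time.perf_counter()
parallel = list_all(shards, parallelism=50)
parallel_s = time.perf_counter() - t0

print(f"parallelism=1:  {len(serial)} blobs in {serial_s:.2f} s")
print(f"parallelism=50: {len(parallel)} blobs in {parallel_s:.2f} s")
```

With one listing in flight at a time, the total is roughly the number of directories times the round-trip time; with many in flight, the waits overlap. The simulated numbers do not predict real-world gains, which also depend on the server, the disks, and how many concurrent requests the connection tolerates (as noted above, more than 50 caused aborts in my case).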