How to speed up repository sync-to

Hi,

I set up a new Home-Use-File-Server based on Unraid and use Kopia for backing up the system. This works perfectly fine and I love it! I greatly appreciate the work of the Kopia developer team!

I do nightly snapshots of some shares (Unraid Config, Home Folders, Photo collection) to a separate disk. Afterwards I replicate this repo to the cloud (OneDrive) as an offsite backup. In general this works, but subsequent runs take a rather long time.

The repo size is about 190 GB and the “Looking for BLOBs to synchronize…” part takes nearly an hour to complete. Synchronizing the BLOBs themselves is fast afterwards (3-4 MB/s). My Internet bandwidth should not be an issue (1000 Mbps down, 50 Mbps up).

I tried the “--parallel=32” option, but it made no difference.

Here is an excerpt from my log:

2022-10-02T12:51:51.925069Z DEBUG tls generating new TLS certificate
2022-10-02T12:51:52.096791Z DEBUG tls adding alternative IP to certificate: 127.0.0.1
2022-10-02T12:51:52.100986Z DEBUG rclone starting /usr/bin/rclone
2022-10-02T12:51:53.276158Z DEBUG rclone detected webdav address: https://127.0.0.1:36051/
2022-10-02T12:51:53.277002Z DEBUG kopia/cli unable to check for updates: update check disabled
2022-10-02T12:51:53.530322Z DEBUG kopia/repo throttling limits from connection info	{"limits":{}}
2022-10-02T12:51:53.533745Z DEBUG kopia/repo [STORAGE] concurrency level reached	{"maxConcurrency":1}
2022-10-02T12:51:53.534186Z DEBUG kopia/repo [STORAGE] concurrency level reached	{"maxConcurrency":3}
2022-10-02T12:51:53.534183Z DEBUG kopia/repo [STORAGE] concurrency level reached	{"maxConcurrency":2}
2022-10-02T12:51:53.534194Z DEBUG kopia/repo [STORAGE] concurrency level reached	{"maxConcurrency":4}
2022-10-02T12:51:53.555476Z DEBUG kopia/repo [STORAGE] ListBlobs	{"prefix":"xw","resultCount":1,"error":null,"duration":"21.640785ms"}
2022-10-02T12:51:53.555527Z DEBUG kopia/repo [STORAGE] ListBlobs	{"prefix":"xe","resultCount":1,"error":null,"duration":"21.056984ms"}
2022-10-02T12:51:53.555612Z DEBUG kopia/repo [STORAGE] ListBlobs	{"prefix":"xs","resultCount":0,"error":null,"duration":"21.357861ms"}
2022-10-02T12:51:53.555547Z DEBUG kopia/repo [STORAGE] ListBlobs	{"prefix":"xr","resultCount":0,"error":null,"duration":"21.178477ms"}
2022-10-02T12:51:53.575905Z DEBUG kopia/repo [STORAGE] ListBlobs	{"prefix":"xn2_","resultCount":0,"error":null,"duration":"18.852514ms"}
2022-10-02T12:51:53.582630Z DEBUG kopia/repo [STORAGE] ListBlobs	{"prefix":"xn1_","resultCount":20,"error":null,"duration":"25.42777ms"}
2022-10-02T12:51:53.582881Z DEBUG kopia/repo [STORAGE] ListBlobs	{"prefix":"xn0_","resultCount":21,"error":null,"duration":"25.700286ms"}
2022-10-02T12:51:53.622156Z INFO kopia/cli Synchronizing repositories:
2022-10-02T12:51:53.622217Z INFO kopia/cli   Source:      Filesystem: /app//data
2022-10-02T12:51:53.622268Z INFO kopia/cli   Destination: RClone Onedrive-Backup:kopia-backup-repo
2022-10-02T12:51:53.622647Z DEBUG kopia/repo [STORAGE] GetBlob	{"blobID":"kopia.repository","offset":0,"length":-1,"outputLength":1101,"error":null,"duration":"309.528µs"}
2022-10-02T12:51:57.278806Z INFO kopia/cli Looking for BLOBs to synchronize...

2022-10-02T12:52:53.533308Z DEBUG cache finished sweeping	{"cache":"index blob cache","duration":"1.348418ms","totalRetainedSize":0,"tooRecentBytes":0,"tooRecentCount":0,"maxSizeBytes":5242880000,"inUsePercent":0}
2022-10-02T12:52:53.549229Z DEBUG cache finished sweeping	{"cache":"contents","duration":"17.367466ms","totalRetainedSize":24531110,"tooRecentBytes":0,"tooRecentCount":0,"maxSizeBytes":5242880000,"inUsePercent":0}
2022-10-02T12:52:53.553014Z DEBUG cache finished sweeping	{"cache":"metadata","duration":"21.233234ms","totalRetainedSize":0,"tooRecentBytes":15199140,"tooRecentCount":40,"maxSizeBytes":5242880000,"inUsePercent":0}

2022-10-02T12:53:53.536445Z DEBUG cache finished sweeping	{"cache":"index blob cache","duration":"1.849309ms","totalRetainedSize":0,"tooRecentBytes":0,"tooRecentCount":0,"maxSizeBytes":5242880000,"inUsePercent":0}
2022-10-02T12:53:53.589036Z DEBUG cache finished sweeping	{"cache":"contents","duration":"39.179412ms","totalRetainedSize":24531110,"tooRecentBytes":0,"tooRecentCount":0,"maxSizeBytes":5242880000,"inUsePercent":0}
2022-10-02T12:53:53.602928Z DEBUG cache finished sweeping	{"cache":"metadata","duration":"48.826556ms","totalRetainedSize":0,"tooRecentBytes":15199140,"tooRecentCount":40,"maxSizeBytes":5242880000,"inUsePercent":0}

2022-10-02T12:54:53.538835Z DEBUG cache finished sweeping	{"cache":"index blob cache","duration":"1.865837ms","totalRetainedSize":0,"tooRecentBytes":0,"tooRecentCount":0,"maxSizeBytes":5242880000,"inUsePercent":0}
2022-10-02T12:54:53.602307Z DEBUG cache finished sweeping	{"cache":"contents","duration":"12.017856ms","totalRetainedSize":24531110,"tooRecentBytes":0,"tooRecentCount":0,"maxSizeBytes":5242880000,"inUsePercent":0}
2022-10-02T12:54:53.616264Z DEBUG cache finished sweeping	{"cache":"metadata","duration":"12.768067ms","totalRetainedSize":0,"tooRecentBytes":15199140,"tooRecentCount":40,"maxSizeBytes":5242880000,"inUsePercent":0}

...

2022-10-02T13:37:53.637608Z DEBUG cache finished sweeping	{"cache":"index blob cache","duration":"1.056113ms","totalRetainedSize":0,"tooRecentBytes":0,"tooRecentCount":0,"maxSizeBytes":5242880000,"inUsePercent":0}
2022-10-02T13:37:54.168069Z DEBUG cache finished sweeping	{"cache":"contents","duration":"12.067582ms","totalRetainedSize":24531110,"tooRecentBytes":0,"tooRecentCount":0,"maxSizeBytes":5242880000,"inUsePercent":0}
2022-10-02T13:37:54.236124Z DEBUG cache finished sweeping	{"cache":"metadata","duration":"18.820383ms","totalRetainedSize":0,"tooRecentBytes":15199140,"tooRecentCount":40,"maxSizeBytes":5242880000,"inUsePercent":0}

2022-10-02T13:38:20.058105Z DEBUG kopia/repo [STORAGE] ListBlobs	{"prefix":"","resultCount":8512,"error":null,"duration":"2.562320254s"}

2022-10-02T13:38:20.058215Z INFO kopia/cli   Found 0 BLOBs to delete (0 B), 8465 in sync (189.6 GB)
2022-10-02T13:38:20.058261Z INFO kopia/cli Copying...

...
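For reference, the length of the listing phase in this particular run can be read off the two INFO timestamps in the excerpt above. A minimal sketch in Python, assuming the log keeps the timestamp format shown here; the function name and the marker strings are just illustrative choices, not anything Kopia provides:

```python
from datetime import datetime

# Timestamp layout as it appears in the excerpt (UTC, microseconds, trailing "Z").
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def phase_duration(log_lines, start_marker, end_marker):
    """Return the time between the first line containing start_marker
    and the next line containing end_marker, or None if either is missing."""
    start = None
    for line in log_lines:
        parts = line.split(" ", 1)
        if len(parts) != 2:
            continue  # blank lines, "..." and similar
        ts_text, rest = parts
        if start is None and start_marker in rest:
            start = datetime.strptime(ts_text, TS_FORMAT)
        elif start is not None and end_marker in rest:
            return datetime.strptime(ts_text, TS_FORMAT) - start
    return None


# The two relevant lines, copied from the excerpt above.
EXCERPT = [
    "2022-10-02T12:51:57.278806Z INFO kopia/cli Looking for BLOBs to synchronize...",
    "2022-10-02T13:38:20.058215Z INFO kopia/cli   Found 0 BLOBs to delete (0 B), 8465 in sync (189.6 GB)",
]

duration = phase_duration(EXCERPT, "Looking for BLOBs to synchronize", "BLOBs to delete")
print(duration)  # prints 0:46:22.779409
```

So in this run roughly 46 minutes pass before the first BLOB is copied, while the final ListBlobs call on the local side takes only about 2.5 seconds.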

What makes me wonder is:

  1. the “concurrency level reached” entries. No idea what this means.
  2. the 3-line blocks “DEBUG cache finished sweeping” which start every minute. There are a lot more of them in the log.

Any ideas on that?

Or are there any other options to speed up the sync?

Regards
Bernd

Just my 2c here, not an expert in this, but the issue might be with rclone and WebDAV. I have used WebDAV in the past and had lots of headaches with its speed. Having said that, see the thread “Kopia sync-to rclone using webdav”, where I think the issue is very similar to yours. There are some bottlenecks around Kopia/rclone/WebDAV, and the TL;DR is that more work needs to be done. As a workaround, you could in theory do what the poster in the other thread did: have a timer/cron job run a shell script that rsyncs the local repo to the remote, which may give you a speed boost. The rsynced repo should behave identically to the original or the sync-to repo, so you shouldn’t get into any trouble.

Thanks for the info. I read the thread you referred to and agree that this seems to be the same problem. Unfortunately there does not seem to be any visible progress in development.

I ran some further tests, following the approach of syncing with rclone directly. To do that, I first copied the repo remotely within OneDrive. Judging by the OneDrive UI this did not seem to work, but the next day it was magically done. :) I then rcloned my repo to this secondary repo (rsync does not work with a OneDrive remote, does it?), and it took 21 minutes, which is totally reasonable for the repo size.

So I am gonna stick with this method as long as there are no changes with Kopia.

Bernd
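For anyone wanting to script the direct rclone route described above, here is a rough sketch in Python that could be called from cron after the nightly snapshot has finished. The local path and the remote name in the usage note are placeholders (assumptions, not the actual setup from this thread), and the helper names are made up for illustration. The only rclone features used are `rclone sync` with the `--transfers` and `--dry-run` flags.

```python
import subprocess


def build_rclone_sync_command(local_repo, remote, transfers=4, dry_run=False):
    """Build the argument list for a one-way 'rclone sync' of the Kopia
    repository directory to a remote. Nothing is executed here."""
    cmd = ["rclone", "sync", local_repo, remote, "--transfers", str(transfers)]
    if dry_run:
        # --dry-run only reports what would be copied or deleted.
        cmd.append("--dry-run")
    return cmd


def sync_repo(local_repo, remote, **options):
    """Run the sync and raise CalledProcessError if rclone exits non-zero.
    Assumes rclone is installed and the remote is already configured."""
    return subprocess.run(build_rclone_sync_command(local_repo, remote, **options), check=True)
```

A first run could look like `sync_repo("/path/to/local/kopia-repo", "MyRemote:kopia-repo-copy", dry_run=True)`, where both arguments are hypothetical, before dropping `dry_run` for the real transfer.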

How can we see what kopia does to get rclone to serve via webdav? Apparently there are a couple of caching parameters one could potentially try but since I have no idea how to influence the parameters kopia passes to rclone this kinda s*cks.

I run a nightly kopia snapshot verify … --verify-files-percent=1… and it takes ages due to the same constraints:

DEBUG starting /usr/bin/rclone
DEBUG detected webdav address: https://127.0.0.1:46423/
DEBUG throttling limits from connection info    {"limits":{}}
DEBUG [STORAGE] concurrency level reached       {"maxConcurrency":1}
DEBUG [STORAGE] concurrency level reached       {"maxConcurrency":2}
DEBUG [STORAGE] concurrency level reached       {"maxConcurrency":3}
DEBUG [STORAGE] concurrency level reached       {"maxConcurrency":4}

Any pointers are very much appreciated.


You can pass additional rclone parameters using the appropriate options during create or connect. As far as I can tell from the source code, it appears that all options are set to defaults that rclone chooses. If you know what options to pass rclone for the caching parameters, you can supply the appropriate CLI arguments and see if that helps.

Thanks, I’ll take a look. I assume the parameters I use with kopia repository connect rclone are saved in repository.config? Or how else can I figure out what parameters I initially used?

Yes, the parameters should be in your config. Have a look at ~/.config/kopia/repository.config and you should see the parameters you used to connect to the repository.
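If you would rather not read the whole file by eye, something along these lines could pull out just the storage part. This is a sketch under two assumptions: that the config file is JSON, and that the backend settings sit under a top-level "storage" key with "type" and "config" entries. The function names are made up for illustration, and the key names inside the storage config are deliberately not assumed.

```python
import json


def parse_storage_settings(config_text):
    """Return (backend type, backend settings) from the text of a Kopia
    config file. Assumes JSON with a top-level "storage" object."""
    config = json.loads(config_text)
    storage = config.get("storage", {})
    return storage.get("type"), storage.get("config", {})


def load_storage_settings(config_path):
    """Same as parse_storage_settings, but reads the file at config_path."""
    with open(config_path, encoding="utf-8") as fh:
        return parse_storage_settings(fh.read())
```

Calling `load_storage_settings(...)` with the path of your own config file returns the backend type and a dictionary of whatever was saved at connect time, so any extra rclone arguments you passed should show up there.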

I think we have to take seriously the warning from kopia docs:

“WARNING: Rclone support is experimental. In theory, all Rclone-supported storage providers should work with Kopia. However, in practice, only Dropbox, OneDrive, and Google Drive have been tested to work with Kopia through Rclone.”

It does work, but it is a proof of concept rather than a real solution. I have played with it and there is one key problem for me: it is really slow.

Using rclone’s WebDAV server was a sort of low-hanging fruit to get it working, but without closer integration it won’t fly.

Other, similar cloud backup programs use rclone as well and, without advertising which one, can be many times faster than kopia. I have spent hours testing, and my conclusion is that kopia is great, but do not use it with the rclone backend.


you can try:

kopia repository connect rclone --rclone-args="--vfs-cache-mode=full" --remote-path=…

It makes a massive difference for reads such as restore, so probably also for verify.

I have played a bit with the rclone/kopia combo and I am 100% sure that the way it is implemented at the moment is very inefficient. It works, but reading from the repo in particular is almost disastrous. Fair enough, the kopia author makes it clear that it is rather an experiment.
