Can Kopia maintenance interfere with kopia sync-to?

I have been synchronizing one of my local server repos to S3 for a couple of weeks now. I have set up a schedule where, each night at 03.00, the local repo gets synced to S3 using kopia repo sync-to S3 at 1 Mbit/s. Depending on the changes, this runs for anywhere from one to several hours.

Recently, I noticed that KopiaUI was not able to list the snapshots or any other fragment when I chose the synchronized repo. When I connect with the kopia CLI, it always reports missing BLOBs, although the sync itself didn't report any issues.

Could the periodic maintenance be interfering with the sync? I can resolve the issue by running a sync-to operation again, which sometimes uploads as few as a couple of BLOBs totaling 16 MB (as of today), and that makes the synchronized repo work again.

Maybe having the options

--pause-quick
--pause-full

in the sync-to command would be a good idea. I will now try setting them before issuing the sync-to command.

Yeah, it most likely does interfere: what could be happening is that maintenance deletes some blobs while they are being synced. The safety buffers for maintenance are around 1-2 hours. The next full sync should fix the issue.
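To see why a 1 Mbit/s link makes this overlap likely, compare the transfer time with the 1-2 hour safety buffer mentioned above. At 1 Mbit/s, one hour moves 1,000,000 bits/s ÷ 8 × 3600 s = 450,000,000 bytes (about 450 MB), so a sync of roughly 450-900 MB or more can run past that buffer. Below is a minimal bash helper for the estimate. It assumes a steady link rate, decimal units and no protocol overhead, and the name `sync_hours` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate whole hours needed to upload BYTES at RATE_MBIT
# megabits per second (decimal units, steady rate assumed, no protocol overhead).
sync_hours() {
  local bytes=$1
  local rate_mbit=${2:-1}   # defaults to the 1 Mbit/s link from the thread
  local bytes_per_hour=$(( rate_mbit * 1000000 / 8 * 3600 ))
  # Integer ceiling division: a partial hour still counts as a full hour.
  local hours=$(( (bytes + bytes_per_hour - 1) / bytes_per_hour ))
  # Never report less than one hour, so the result is usable as a pause window.
  if (( hours < 1 )); then
    hours=1
  fi
  echo "$hours"
}
```

For example, `sync_hours 1000000000` prints 3 because 1 GB at 1 Mbit/s takes about 2.2 hours, which is rounded up. The result could be passed to a pause flag, for instance `--pause-quick="$(sync_hours "$BYTES")h"`, where `BYTES` is your own estimate of the data to transfer.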

How are you running maintenance in this case? Manually via cron, or through the server?

I am running kopia repo sync from a cron job every night at 3 a.m. I have now modified the script so that it first connects to the server repo locally, which is necessary anyway to perform the sync-to operation, and then issues a

kopia maintenance set --pause-quick=3h

which is usually the longest the sync will run at 1 Mbit/s. Afterwards, I will run the sync-to and finally re-enable quick maintenance by issuing a

kopia maintenance set --pause-quick=1h

before I disconnect from the repo again.
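The steps above can be combined into a single cron-friendly bash function. This is a minimal sketch, not the poster's actual script. The repository path, the bucket name and the function name `nightly_sync` are hypothetical placeholders. It also assumes the repository password and S3 credentials come from the environment (for example `KOPIA_PASSWORD`) or from flags you add yourself. Apart from `kopia repository connect filesystem` and `kopia repository disconnect`, it only uses the commands quoted in this thread. It runs the closing `--pause-quick=1h` step even if sync-to fails, so a failed night still follows the same sequence:

```bash
#!/usr/bin/env bash
# Hypothetical placeholders: adjust to your own setup.
REPO_PATH="${REPO_PATH:-/srv/kopia/repo}"        # local filesystem repository
S3_BUCKET="${S3_BUCKET:-example-offsite-bucket}" # destination bucket for sync-to

# Sketch of the nightly job described in the thread. Assumes KOPIA_PASSWORD and
# S3 credentials are already available in the environment.
nightly_sync() {
  # 1. Connect to the local repository (required before sync-to).
  kopia repository connect filesystem --path="$REPO_PATH" || return 1

  # 2. Hold off quick maintenance for the longest expected sync duration.
  if ! kopia maintenance set --pause-quick=3h; then
    kopia repository disconnect
    return 1
  fi

  # 3. Copy the repository blobs to S3 and remember the exit status.
  local rc=0
  kopia repository sync-to s3 --bucket="$S3_BUCKET" || rc=$?

  # 4. Apply the closing pause setting from the thread, even if the sync failed.
  if ! kopia maintenance set --pause-quick=1h; then
    [ "$rc" -eq 0 ] && rc=1
  fi

  # 5. Disconnect so the next run starts from a clean state.
  kopia repository disconnect
  return "$rc"
}
```

To run it from cron, end the script file with a call to `nightly_sync`. A crontab line such as `0 3 * * * /usr/local/bin/kopia-nightly.sh` (hypothetical path) would then start it at 03:00. The function's non-zero exit status lets cron mail or logs flag a failed night.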

Should pausing maintenance while running a sync be considered a best practice? If maintenance runs on the same machine as sync-to, could pausing maintenance become automatic, or could some sort of lock be placed on the blobs gathered during sync preparation?

Also, I’m currently running a setup where both maintenance and sync are automated and I have to assume there are many moments of overlap where maintenance kicks off during a sync. If I’m not receiving any errors, should I assume all is well, or am I risking creating a bad backup at the destination repo?

For me it still is. Neither maintenance nor sync-to is likely to report any errors. At least they didn't when I had this issue. I am still running my sync as outlined above and haven't had a single issue since.

Since you will need to script your sync-to anyway, it isn't a big deal to put maintenance on hold while the sync runs, is it? I admit it would be nice to have a switch for the sync command, but I wouldn't hold my breath for it to arrive.

In my current setup, the maintenance owner is a different machine from the one doing the sync-to, but this is encouraging me to change that. I have had reliability issues with maintenance running automatically on the schedule I had set, so I'm afraid to touch my setup, which is finally working. I'm sure I won't be able to resist, though, and will report back if anything comes up there.

Regarding silent errors…

  1. If there is an error, why would it not be reported? In the example scenario of sync-to delivering a set of blobs, but some of them are deleted via maintenance before sync-to can transfer them, why would that not result in a failed sync of the repository?
  2. How can I verify that my offsite destination repository is healthy? Will running kopia snapshot verify report an issue? What if only one blob / piece of data is missing? Can I still mount the destination repo read-only and recover the data unaffected by the missing blob(s)? Will my destination repo work fine, but simply be missing the files associated with the missing blobs, or are there larger implications?
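For question 2, one way to check the offsite copy is to connect to it on its own and run `kopia snapshot verify`, which is designed to check that the objects referenced by snapshots can be found. This is a hedged sketch. It uses a separate config file (through Kopia's global `--config-file` flag) so the check does not replace the machine's normal connection. The function name `verify_offsite`, the config path and the bucket are hypothetical, and credentials are again assumed to come from the environment. `--verify-files-percent` additionally downloads a sample of file contents, which costs extra transfer over a slow link. Treat a clean result as evidence of health rather than proof:

```bash
#!/usr/bin/env bash
# Hypothetical placeholders: a separate config file and the sync-to bucket.
VERIFY_CONFIG="${VERIFY_CONFIG:-/tmp/kopia-offsite-verify.config}"
S3_BUCKET="${S3_BUCKET:-example-offsite-bucket}"

# Sketch: check the synced copy without touching the default connection.
# Assumes KOPIA_PASSWORD holds the source repo's password (sync-to copies the
# repository blobs as-is) and S3 credentials are in the environment.
verify_offsite() {
  local percent=${1:-0}   # share of file contents to download and check
  kopia --config-file="$VERIFY_CONFIG" repository connect s3 \
    --bucket="$S3_BUCKET" || return 1

  local rc=0
  kopia --config-file="$VERIFY_CONFIG" snapshot verify \
    --verify-files-percent="$percent" || rc=$?

  # Always drop the temporary connection, then report the verify result.
  kopia --config-file="$VERIFY_CONFIG" repository disconnect
  return "$rc"
}
```

Running `verify_offsite 10` after a sync would check snapshot structure and sample about 10% of file contents. A non-zero exit status means something needs a closer look.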

I thought I understood some of these deeper inner workings, but as I start to dissect the possible ways the various processes in my kopia backup chain could fail, I worry about the resiliency of this setup. Lots of things could get in the way of a process completing (power loss, network loss, drive failure, etc.), so I'm curious about the implications of these situations for the health of the destination repo.

What do you mean by that? I am running a Kopia Server and it's doing the maintenance. The sync-to is also running on that server, and all I am doing is telling the repo to pause maintenance until after the sync is done. It's all very basic and easy to do.

I have 4 client computers. They all run Kopia UI and backup to a single repository located on a Windows file server. The server itself is the one running the sync-to command.

However, when setting this all up, I couldn’t get Kopia to successfully run maintenance from the server (worked through troubleshooting that here, on slack, and GH). I finally got scheduled maintenance working on my main production machine (one of the 4 client computers) so that is the machine that is the current “maintenance owner.”

There have since been updates addressing maintenance, so I will likely try again to get Kopia to run automated maintenance from the server, or just give up on letting Kopia manage it and automate it myself via scripts (I'd rather have Kopia's native maintenance automation handle it, though). If I do, then as you point out, it will be simple to just add a line to pause maintenance before running the sync-to.

And just to be clear, I am not utilizing the Repository Server feature. It’s just a normal repository, located on a file server (because I couldn’t get that to work properly either).

If you’re already doing sync-to, it might make sense to disable maintenance altogether and run it manually just before sync-to:

$ kopia maintenance set --enable-full false

and:

$ kopia maintenance run --full
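With automatic full maintenance disabled as suggested, a nightly job can run full maintenance and sync-to back to back, so the two no longer overlap. This is a minimal sketch under several assumptions. The repository is already connected, and the machine running it is the maintenance owner (Kopia expects maintenance to be run by the designated owner, which can be set with `kopia maintenance set --owner=me`). `maintain_then_sync` and the bucket name are hypothetical. It skips sync-to when maintenance fails, so the error surfaces instead of being followed by a sync of a repository in an unknown state:

```bash
#!/usr/bin/env bash
S3_BUCKET="${S3_BUCKET:-example-offsite-bucket}"   # hypothetical destination bucket

# Sketch: run full maintenance first, then sync, strictly one after the other.
# Assumes the repository is already connected, this user is the maintenance
# owner, and automatic full maintenance was turned off beforehand.
maintain_then_sync() {
  # Stop here if maintenance fails rather than syncing afterwards anyway.
  kopia maintenance run --full || return 1
  # Only now copy the blobs offsite; nothing else should be deleting them.
  kopia repository sync-to s3 --bucket="$S3_BUCKET"
}
```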

OK, thanks. That seems to make the most sense if I can't get the native automated maintenance working. I'm still confused about the usefulness of "quick" vs. "full" maintenance with the newest repository format. Are there any downsides to just running a full maintenance once a day (no quick at all, ever) vs. the standard quick=1h, full=24h?

Also, from earlier in this thread, I’m still confused about this:

Thanks!