Making maintenance extra safe with respect to clients suspended while creating a snapshot

I have a shared repository where one of the clients is a laptop that sometimes stays suspended for a few days. This client (“A”) runs periodic (daily) snapshots. A different machine (“B”, which is regularly online) is in charge of repository maintenance.

According to my understanding of maintenance (based on the docs, this forum, and a brief reading of the code), the following scenario could happen:

If client A starts creating a snapshot with a lot of new data, suspends (some time after the latest checkpoint) and stays offline for a few days, B will run snapshot-gc and mark the last contents written by A as unused. If A doesn’t wake up and also write the next checkpoint before B runs full maintenance again (24 hours by default), the next snapshot-gc will delete these contents. This will render A’s snapshot incomplete and unusable.

  1. Can this really happen, or is there something in the logic that prevents it?
  2. Will A detect the issue while creating the affected snapshot and report an error, or will the snapshot finish as if nothing has happened? (I think B will eventually detect the missing contents during maintenance.)
  3. Will the next snapshot created by A re-write the missing contents? Will this at least make the next snapshot complete, or will it even “heal” the originally broken snapshot?

How can one ask Kopia for maintenance --safety=extra to prevent this issue and/or mitigate its impact?

  1. I guess making A the maintenance owner would help, but that’s suboptimal for many reasons.
  2. Is maintenance set --full-interval=240h enough to prevent this unless A stays offline for 10 days?
  3. Anything else that does not require a really long maintenance interval?
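For reference, this is the arithmetic behind the 10-day figure in option 2, as a rough Python sketch. It assumes the two-pass behavior I described above (the first full maintenance after a write marks the content, the next one deletes it) and that B runs full maintenance exactly on schedule. The function name `hours_until_deletion` is made up for illustration and is not part of Kopia.

```python
def hours_until_deletion(hours_since_last_maintenance, interval_hours):
    """Hours between writing an unreferenced content and its deletion.

    Assumed model (not verified against Kopia's code): the next full
    maintenance marks the content, the one after that deletes it.
    """
    if not 0 <= hours_since_last_maintenance < interval_hours:
        raise ValueError("write must fall inside one maintenance interval")
    # Time left until the marking pass, plus one full interval until deletion.
    hours_to_mark = interval_hours - hours_since_last_maintenance
    return hours_to_mark + interval_hours


# Default 24h interval, content written 23h after the previous maintenance run.
print(hours_until_deletion(23, 24))
# 240h interval, content written just before a maintenance run (worst case).
print(hours_until_deletion(239, 240))
```

Under these assumptions the content survives for somewhere between one and two intervals, so a 240h interval would only guarantee safety for a suspension shorter than 10 days.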

As explained in this FAQ, Kopia creates an incomplete snapshot that is not garbage-collected. When it wakes up, client A should (mostly) pick up where it left off and continue until the snapshot can be completed.

I know about these incomplete snapshots (that’s the “latest checkpoint” I mentioned in my question), but my question is what happens to content objects written after the most recent incomplete snapshot (checkpoint). If I understand it correctly, those aren’t referenced by anything until the next incomplete snapshot is made (which might be days later) and can be garbage collected in the meantime.

Consider this sequence of events:

  1. Content objects X, Y are written by machine A
  2. A temporary (checkpoint/incomplete) snapshot is made, referencing X and Y
  3. Content object Z is written by machine A
  4. Writing machine (A) is suspended for a few days
  5. Another machine B (maintenance owner) runs full maintenance, sees Z as unreferenced and marks it as a candidate for deletion
  6. A day later, B runs full maintenance again, sees Z still unreferenced and deletes it
  7. Machine A eventually wakes up from sleep and continues the snapshot process, not knowing Z got deleted in the meantime, ultimately resulting in a broken snapshot
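To make the sequence concrete, here is a toy Python model of the behavior I am assuming. It is only a sketch of my mental model, not Kopia's actual logic: the `ToyRepo` class and its methods are hypothetical, and real maintenance may well have safeguards this model leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy mark-then-delete model (hypothetical, not Kopia's implementation)."""
    contents: set = field(default_factory=set)    # content IDs stored in the repository
    referenced: set = field(default_factory=set)  # IDs reachable from a snapshot or checkpoint
    marked: set = field(default_factory=set)      # IDs flagged as unused by the previous pass

    def write(self, content_id):
        self.contents.add(content_id)

    def checkpoint(self, ids):
        self.referenced.update(ids)

    def full_maintenance(self):
        """Delete contents that were already marked, then mark the newly unreferenced ones."""
        unreferenced = self.contents - self.referenced
        deleted = unreferenced & self.marked
        self.contents -= deleted
        self.marked = unreferenced - deleted
        return deleted


repo = ToyRepo()
repo.write("X")                      # step 1
repo.write("Y")
repo.checkpoint({"X", "Y"})          # step 2
repo.write("Z")                      # step 3
# step 4: machine A is suspended here
first_pass = repo.full_maintenance()   # step 5: Z is only marked
second_pass = repo.full_maintenance()  # step 6: Z is deleted
# step 7: A wakes up and finishes a snapshot that expects X, Y and Z
missing = {"X", "Y", "Z"} - repo.contents
print("deleted in pass 1:", first_pass)
print("deleted in pass 2:", second_pass)
print("missing from final snapshot:", missing)
```

In this model X and Y survive because the checkpoint references them, while Z disappears on the second pass. Whether Kopia really behaves this way is exactly what I am asking.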

I don’t know the internals well enough to give you a definitive answer but I’m pretty sure the developers thought about this scenario and made sure this doesn’t cause issues.

You should ask this question in Kopia’s Slack channel, as the developers are way more active over there.