How to immediately gc unused blobs?

Another weird behavior that I only experience with Kopia. Say I create a local filesystem repo and add a huge directory (~20G) to it to create a snapshot. The next day I remove that directory from the snapshot list, which is now empty (the retention policy keeps only the latest). I would like to reclaim the space immediately, but I can’t find any user-facing command that does this. Maintenance with --full doesn’t work: if I run it too soon, it tells me blob gc needs more time before it can rerun; if I wait an hour and run it, it reclaims maybe 20M and is done. I have to wait a full 24 hours, and then suddenly all the space is back.

What’s the point of that delay? Is it there to allow for “regret”? If so, I don’t see any command to “undelete” a snapshot ID. Is there a non-user-facing command that can achieve this (blob gc?)?

Kopia has been designed to support concurrency - backing up several computers in parallel to the same repository.

That means we have protocols that rely on the passage of time, so that other computers have time to finish or checkpoint whatever they may be doing. This is for safety reasons.

Imagine computer A is in the middle of a backup that will take days. Computer B wants to do gc --full. Without such protections, parts of snapshot created by A may be deleted because B thinks they are not in use anymore, because the snapshot manifest has not been written yet.

This is a great simplification, of course; the reality is more nuanced. We need to deal with eventually-consistent storage, where you sometimes don’t see a blob that was just written, and so on. Waiting long enough is also a way to ensure (or at least dramatically increase the probability of) consistency.
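To make the idea concrete, here is a minimal sketch of an age-based safety margin in a garbage-collection sweep. It is not Kopia's actual implementation: the `Blob` type, the `sweep` function and the `min_age_seconds` parameter are all hypothetical names invented for illustration, and the 24-hour default simply mirrors the delay described in the question.

```python
from dataclasses import dataclass

HOUR = 3600


@dataclass(frozen=True)
class Blob:
    blob_id: str
    written_at: float  # seconds since an arbitrary epoch


def sweep(blobs, referenced_ids, now, min_age_seconds=24 * HOUR):
    """Return IDs of blobs that look safe to delete (hypothetical helper)."""
    deletable = []
    for blob in blobs:
        if blob.blob_id in referenced_ids:
            continue  # still reachable from a written snapshot manifest
        if now - blob.written_at < min_age_seconds:
            continue  # too young: may belong to a backup still in progress
        deletable.append(blob.blob_id)
    return deletable


now = 100 * HOUR
blobs = [
    Blob("old-orphan", written_at=now - 30 * HOUR),
    Blob("in-flight-from-A", written_at=now - 1 * HOUR),
    Blob("kept-by-manifest", written_at=now - 50 * HOUR),
]
referenced = {"kept-by-manifest"}

# With the margin, computer A's freshly written blob survives.
print(sweep(blobs, referenced, now))  # ['old-orphan']

# With no margin, the same sweep would delete A's unfinished work.
print(sweep(blobs, referenced, now, min_age_seconds=0))
# ['old-orphan', 'in-flight-from-A']
```

The second call is roughly what an "immediate" gc amounts to: it is only safe if nobody else is writing to the repository and the storage shows every blob that has been written.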

Having said that, we may be able to do something about those delays and introduce a flag that says:

--yes-i-really-know-what-i-m-doing-and-i-promise-my-storage-is-really-consistent-and-nobody-else-is-making-snapshots-right-now-so-please-let-me-do-full-gc-immediately

Joking aside, there are parameters that can be tweaked in the future. Right now we’re erring on the side of safety and sacrificing a bit of the repository’s compactness. For large repos it does not really matter that much, since the great majority of blobs are compacted, but for small single-person repositories it may make a difference.

I’d like to invite folks to help us by contributing code/tests to improve the experience here.

I’m completely fine with the safety measure for some use cases. I just hope Kopia will provide options to support other use cases, such as single-computer backup/restore. For example, it could support a policy that controls how long the reclamation delay is (defaulting to 24h).

There’s already an issue filed for this: “feature: add support for single-user repository” (Issue #800 in kopia/kopia on GitHub), so stay tuned.

BTW, contributions (whether code, documentation, technical insight or any other kind) for this and other issues are welcome and appreciated.