Details of maintenance command

This is still intentionally vague, because details are subject to change, but let me provide a bit more information here (accurate as of the v0.8.2 release).

I don’t want to go into too much detail, both to keep the information from going stale quickly and to discourage folks from over-optimizing their backup configurations and cargo-culting them. The intention of Kopia, and the recommendation for most users, is not to worry about maintenance at all: it should be automatic and unobtrusive. If that’s not the case, please file bugs.

(There’s always the source code if somebody wants to go deeper.)

There are two types of maintenance:

  • quick maintenance manages and optimizes indexes and q blobs, which store metadata (directory listings, manifests such as snapshots, policies, ACLs, etc.).
  • full maintenance manages both q and p data blobs (which store contents of all files).

Maintenance is composed of individual tasks grouped into two sets:

Quick Maintenance

This runs frequently (hourly) with the goal of keeping the number of index blobs (n) small, because a high number of indexes negatively affects the performance of all kopia operations. Every write session (snapshot command, any policy manipulation, etc.) adds at least one n blob and usually one q blob, so it’s very important to compact them aggressively:

  • quick-rewrite-contents - looks for contents in short q packs that utilize less than 80% of the target pack size (currently around 20 MB) and rewrites them to a new, larger q pack, effectively orphaning the original packs and making them eligible for deletion after some time.
  • quick-delete-blobs - looks for orphaned q packs (that are not referenced by any index) and deletes them after enough time has passed for those contents to be no longer referenced by any cache.
  • index-compaction - merges multiple smaller index blobs (n) into larger ones
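To make the "short pack" idea concrete, here is a minimal sketch of the selection logic described above. This is not Kopia's actual code: the Pack type, the function names and the in-memory index (content ID -> pack blob ID) are all made up for illustration, and the 20 MB / 80% numbers are simply the ones quoted in this post.

```python
from dataclasses import dataclass

TARGET_PACK_SIZE = 20 * 1024 * 1024  # assumed target pack size (~20 MB, per the text)
SHORT_PACK_THRESHOLD = 0.8           # packs below 80% of the target count as "short"


@dataclass(frozen=True)
class Pack:
    blob_id: str  # hypothetical pack blob name, e.g. "q1"
    size: int     # pack size in bytes


def find_short_packs(packs, target=TARGET_PACK_SIZE, threshold=SHORT_PACK_THRESHOLD):
    """Return packs whose size is below threshold * target."""
    limit = threshold * target
    return [p for p in packs if p.size < limit]


def rewrite_short_packs(packs, index, new_blob_id):
    """Point every content stored in a short pack at one new pack.

    index maps content ID -> blob ID of the pack holding that content.
    Returns a new index; the old short packs are left unreferenced.
    """
    short_ids = {p.blob_id for p in find_short_packs(packs)}
    return {
        content_id: (new_blob_id if blob_id in short_ids else blob_id)
        for content_id, blob_id in index.items()
    }


def find_orphaned_packs(packs, index):
    """Return packs that no index entry references any more."""
    referenced = set(index.values())
    return [p for p in packs if p.blob_id not in referenced]
```

In this toy model, rewriting the short packs is what turns them into orphans; the delete step then only has to look for packs that nothing in the index points to (and, in the real system, wait long enough before removing them).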

Full Maintenance

The main purpose of full maintenance is to garbage-collect contents that are no longer needed once the snapshots they belong to are deleted or age out of the system.

  • snapshot-gc - finds all contents (files and directory listings) that are no longer reachable from snapshot manifests and marks them as deleted. It also undeletes contents that are in use but were previously marked as deleted (due to an unavoidable race between snapshot GC and snapshot creation, which is possible when multiple machines are involved).

NOTE: This is the most costly operation as it requires scanning all directories in all snapshots that are active in the system. The good news is that all this data is in q blobs and thanks to the quick maintenance it was kept nice and compact and quick to access, so this phase does not usually take that long (e.g. currently ~25 seconds on my 720 GB repository with >1.5M contents).

  • full-drop-deleted-content - removes from the index contents that have been marked as deleted for long enough. This creates “holes” in pack blobs and/or makes blobs completely unused and subject to deletion.

  • full-rewrite-contents - same as quick-rewrite-contents but acts on all blobs (p and q)

  • full-delete-blobs - same as quick-delete-blobs but acts on all blobs (p and q)
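The snapshot-gc and full-drop-deleted-content steps are essentially a mark phase followed by a delayed sweep. A rough sketch of that idea, under assumptions of mine rather than Kopia's real data structures: the index is a plain dict mapping content ID to a deletion timestamp (None means live), children is a hypothetical map from a directory content to the contents it lists, and min_age stands in for the unspecified "long enough".

```python
def reachable_contents(roots, children):
    """Collect every content ID reachable from the snapshot roots."""
    seen = set()
    stack = list(roots)
    while stack:
        content_id = stack.pop()
        if content_id in seen:
            continue
        seen.add(content_id)
        # Directories list further contents; files have no children.
        stack.extend(children.get(content_id, ()))
    return seen


def snapshot_gc(index, reachable, now):
    """Mark unreachable contents as deleted and undelete reachable ones."""
    updated = {}
    for content_id, deleted_at in index.items():
        if content_id in reachable:
            updated[content_id] = None        # live, or undeleted after a race
        elif deleted_at is None:
            updated[content_id] = now         # newly marked as deleted
        else:
            updated[content_id] = deleted_at  # keep the original deletion time
    return updated


def drop_deleted(index, now, min_age):
    """Drop index entries that have been marked as deleted for at least min_age."""
    return {
        content_id: deleted_at
        for content_id, deleted_at in index.items()
        if deleted_at is None or now - deleted_at < min_age
    }
```

Keeping the original deletion time when a content is already marked is what lets the later drop step measure how long something has been unreferenced, and the undelete branch is the part that covers the multi-machine race mentioned above.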

There are additional safety measures built into the maintenance routine to make it safe to run even when kopia clients on other machines are executing snapshots concurrently. For example, {quick}-delete-blobs will not run if less than X amount of time has passed since the last content rewrite, and full-drop-deleted-content will only drop contents if enough time has passed between full maintenance cycles.
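Both of those guards boil down to "don't act until enough time has passed since some earlier event". A tiny sketch of such a check, purely illustrative: the function name and the min_delay parameter are mine, and the actual value of X is deliberately not given here.

```python
from datetime import datetime, timedelta


def enough_time_passed(last_event, now, min_delay):
    """Return True if at least min_delay has elapsed since last_event.

    last_event is None when the event has never happened, in which case
    there is nothing to wait for (an assumption of this sketch).
    """
    if last_event is None:
        return True
    return now - last_event >= min_delay


# Hypothetical usage: skip blob deletion while a recent rewrite is still "fresh".
last_rewrite = datetime(2021, 1, 1, 12, 0)
now = datetime(2021, 1, 1, 12, 30)
safe_to_delete = enough_time_passed(last_rewrite, now, timedelta(hours=1))
```

With these made-up values the rewrite happened only 30 minutes ago, so the check says to skip deletion this cycle and try again on a later run.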

The recommendation is to run quick maintenance as frequently as it makes sense for your repository (hourly is typically fine). The entire quick cycle should take <10 seconds, even for big repositories.

The full maintenance cycle runs every 24h. It can be spaced further apart (weekly or even monthly is probably fine) or stopped completely if you do not need or care to reclaim unused space.