That’s exactly right.
Let me add some detail here, because what makes this particularly interesting is packing. To keep the repository structure manageable and avoid having millions of files in the repository, kopia concatenates chunks of original files (“contents”) into larger files stored in the repository (“blobs”). Only fully unreferenced blobs can be deleted.
As part of maintenance, kopia detects blobs that are partially full (they contain a mix of live and non-live contents) and periodically rewrites the live contents into separate blobs, making the partial blobs subject to garbage collection.
Imagine the first snapshot had files/contents 1,2,3,4,5,6,7,8,9,10,11 and the second snapshot had 1,3,5,7,9,… They would typically be packed into pack blobs like so:
(note: second snapshot did not write any blobs due to deduplication)
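To make the packing and deduplication concrete, here is a minimal Python sketch. It is not Kopia's code: `PACK_SIZE`, the `blobN` naming and the `write_snapshot` helper are all hypothetical, a fixed count of contents per blob stands in for whatever size limit the real packer uses, and the second snapshot is assumed to be 1,3,5,7,9,11.

```python
PACK_SIZE = 4  # hypothetical: number of contents packed into one blob


def write_snapshot(blobs, content_ids, pack_size=PACK_SIZE):
    """Store only contents the repository does not have yet.

    `blobs` maps a blob name to the list of content IDs packed into it.
    Returns the names of the blobs written by this snapshot.
    """
    stored = {c for contents in blobs.values() for c in contents}
    new_contents = [c for c in content_ids if c not in stored]
    written = []
    for i in range(0, len(new_contents), pack_size):
        name = f"blob{len(blobs) + 1}"
        blobs[name] = new_contents[i:i + pack_size]
        written.append(name)
    return written


repo = {}
first_written = write_snapshot(repo, list(range(1, 12)))
second_written = write_snapshot(repo, [1, 3, 5, 7, 9, 11])  # assumed contents

print(repo)            # three blobs holding contents 1..11
print(second_written)  # empty list: everything was deduplicated
```

Under these assumptions the first snapshot produces blob1, blob2 and blob3, and the second snapshot writes nothing because every content it needs is already stored.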
When the first snapshot gets deleted, contents 2,4,6,8 & 10 become unreferenced, but none of blob1, blob2, blob3 can be deleted, because each of them still holds live contents.
As part of the next maintenance, Kopia will compact the live contents into new blobs and write:
Now some contents have two copies spread across different blobs, and blob1, blob2, blob3 can be deleted.
(Note: this compaction only happens when blobs have enough “holes” in them, so deleting one or two contents may not always trigger a rewrite of the remaining ones.)
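Here is a rough Python sketch of that compaction step. Again this is an illustration, not Kopia's implementation: the `compact` function, the blob layout (4 contents per blob), the `blobN` names and especially the `max_live_fraction` threshold of 0.8 are assumptions made up for the example.

```python
def compact(blobs, live, max_live_fraction=0.8, pack_size=4):
    """Rewrite live contents out of blobs that have enough holes.

    A blob is compacted when the fraction of its contents that are still
    live is at or below `max_live_fraction` (a hypothetical threshold).
    Returns (new_blobs, deletable_blob_names).
    """
    to_rewrite = []
    deletable = []
    for name, contents in blobs.items():
        live_here = [c for c in contents if c in live]
        if not contents or len(live_here) / len(contents) <= max_live_fraction:
            to_rewrite.extend(live_here)
            deletable.append(name)

    new_blobs = {}
    next_index = len(blobs) + 1
    for i in range(0, len(to_rewrite), pack_size):
        new_blobs[f"blob{next_index}"] = to_rewrite[i:i + pack_size]
        next_index += 1
    return new_blobs, deletable


# Assumed layout of the first snapshot, 4 contents per blob.
repo = {"blob1": [1, 2, 3, 4], "blob2": [5, 6, 7, 8], "blob3": [9, 10, 11]}
live = {1, 3, 5, 7, 9, 11}  # what remains referenced after the deletion

new_blobs, deletable = compact(repo, live)
print(new_blobs)
print(deletable)

# A blob with a single hole stays above the threshold and is left alone.
untouched_new, untouched_deletable = compact({"blob1": [1, 2, 3, 4, 5, 6]}, {1, 2, 3, 4, 5})
```

With this layout every old blob is at most two-thirds live, so all live contents get rewritten into two new blobs and the three old blobs become deletable. The last call shows the note above: one hole out of six contents is not enough to trigger a rewrite.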
This deletion of unreferenced blobs does not happen immediately, because Kopia also supports concurrent operation: another kopia instance can be running on the other side of the world at exactly the same time, and it may still need blob1, blob2, blob3 and hold on to them for a while due to caching.
Each client invalidates its cache after 15 minutes or so, so deletion of blob1, blob2, blob3 waits around 1h to be safe and to ensure all other clients have switched to blob4, blob5 as the authoritative source.
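The waiting rule can be sketched like this in Python. The one-hour margin comes from the description above; the `blobs_safe_to_delete` helper, the idea of tracking a per-blob "unreferenced since" timestamp and the example times are assumptions for illustration only.

```python
from datetime import datetime, timedelta

SAFETY_MARGIN = timedelta(hours=1)  # "around 1h", well above the ~15 min cache lifetime


def blobs_safe_to_delete(unreferenced_since, now, margin=SAFETY_MARGIN):
    """Return blobs that have been unreferenced for at least `margin`.

    `unreferenced_since` maps a blob name to the (hypothetical) time at
    which compaction made it fully unreferenced.
    """
    return sorted(
        name for name, since in unreferenced_since.items()
        if now - since >= margin
    )


compacted_at = datetime(2024, 1, 1, 12, 0)
pending = {"blob1": compacted_at, "blob2": compacted_at, "blob3": compacted_at}

too_early = blobs_safe_to_delete(pending, compacted_at + timedelta(minutes=20))
later = blobs_safe_to_delete(pending, compacted_at + timedelta(minutes=75))
print(too_early)
print(later)
```

Twenty minutes after compaction nothing is deleted, since some other client may still be reading the old blobs from a stale cache. After the margin has passed, all three old blobs are eligible.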