Question on incremental snapshot

Hi,
Have been using Kopia for a week for one of my projects. Going through the docs, it says all snapshots are incremental, which means if we take a backup of the same path again, subsequent snapshots only contain the diff from the previous snapshot.
Along these lines, when I delete the parent snapshot (this was a full snapshot), I see that Kopia can still restore from the incremental snapshot. How does Kopia get the data of the parent snapshot, which is no longer there? Some insight into how this is handled would be helpful.

-Prashanth

All snapshots are full backups. Because Kopia uses deduplication, it does not back up unchanged files.
So when restoring, you don't have to restore the "full" backup and then all incrementals on top of it.
That is a great feature in my opinion, and it saves backup storage space.


Thanks for the reply. Just curious about the statement below from the Kopia docs: what does "incremental snapshot" mean here?

“All snapshots in Kopia are always incremental - they will only upload files that are not in the repository yet, which saves storage space and upload time. This even applies to files that were moved or renamed. In fact if two computers have exactly the same file, it will still be stored only once.”

If snapshots are full and Kopia doesn't back up unchanged files, would a restore still be possible if I delete the very first snapshot? The very first backup had all files backed up, and the second one only took the files that changed, so as I understand it, there should be some reference to the old snapshot.

"Incremental" means that Kopia will only consider changed files for the next snapshot run. Since all the unchanged files are already in the repository, there's no need to re-scan those, if they haven't changed.

Nonetheless, all snapshots are always complete, so you can always restore your source in one pass. If you delete the first snapshot, you will lose those files which were present then but have been deleted later on; Kopia's pruning process takes care of that. See, a snapshot is more of a representation of the state of the source at that specific time. Blobs only get deleted when they are no longer referenced by any snapshot. So you won't notice a huge drop in your repo size when you delete the first "full" snapshot. It was only "full" in the sense that at that time every file had to be scanned. So rather call it "full-work"…

That’s exactly right.

Let me provide more details here, because what makes this particularly interesting is packing: to keep the repository structure manageable and avoid having millions of files in the repository, Kopia will concatenate chunks of original files ("contents") into larger files stored in the repository ("blobs"). Only fully unreferenced blobs can be deleted, obviously.

As part of maintenance, Kopia will detect blobs that are only partially full (they contain a mix of live and non-live contents) and will periodically rewrite the live contents into separate blobs, making the partial blobs subject to garbage collection.

For example, imagine the first snapshot had files/contents 1,2,3,4,5,6,7,8,9,10,11 and the second snapshot had 1,3,5,7,9,… They would typically be packaged into pack blobs like so:

blob1: [1,2,3,4]
blob2: [5,6,7,8]
blob3: [9,10,11]

(note: second snapshot did not write any blobs due to deduplication)
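The packing and deduplication behavior described above can be sketched in a few lines of Python. This is a toy model, not Kopia's actual on-disk format: content IDs stand in for chunk hashes, and packs are capped at four entries to match the example.

```python
# Toy model of content packing and deduplication (not Kopia's real format).

def pack_contents(contents, known=None, pack_size=4):
    """Group new content IDs into pack blobs of up to pack_size entries."""
    known = set() if known is None else known
    blobs, current = [], []
    for c in contents:
        if c in known:        # deduplication: already in the repository
            continue
        known.add(c)
        current.append(c)
        if len(current) == pack_size:
            blobs.append(current)
            current = []
    if current:
        blobs.append(current)
    return blobs, known

# First snapshot writes three pack blobs:
blobs, known = pack_contents(range(1, 12))
print(blobs)       # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]

# Second snapshot writes nothing: every content is already known.
new_blobs, known = pack_contents([1, 3, 5, 7, 9, 11], known)
print(new_blobs)   # []
```

The second call returning an empty list is the forum's point exactly: the second snapshot references existing contents and uploads no new blobs.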

When the first snapshot gets deleted, contents 2,4,6,8 & 10 become unreferenced, but none of blob1, blob2, blob3 can be deleted.

As part of the next maintenance, Kopia will compact the live contents into new blobs and write:

blob4: [1,3,5,7]
blob5: [9,11]

Now some contents have two copies in multiple blobs, and blob1, blob2, blob3 can now be deleted.

(Note: this compaction only happens when blobs have enough "holes" in them, so deleting one or two contents may not always cause the remaining ones to be rewritten.)
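The compaction step can be sketched the same way. Again a toy model rather than Kopia's real algorithm: fully-live blobs are kept, while blobs with holes have their surviving contents rewritten into fresh packs, leaving the old blobs unreferenced.

```python
# Toy model of maintenance compaction (not Kopia's real algorithm).

def compact(blobs, live, pack_size=4):
    """Rewrite live contents out of partially-dead blobs.

    Returns (kept_blobs, new_blobs, deletable_blobs)."""
    kept, deletable, survivors = [], [], []
    for blob in blobs:
        if all(c in live for c in blob):
            kept.append(blob)        # fully live: leave as-is
        else:
            deletable.append(blob)   # has holes: rewrite its live contents
            survivors.extend(c for c in blob if c in live)
    new_blobs = [survivors[i:i + pack_size]
                 for i in range(0, len(survivors), pack_size)]
    return kept, new_blobs, deletable

blobs = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
live = {1, 3, 5, 7, 9, 11}            # contents of the surviving snapshot
kept, new_blobs, deletable = compact(blobs, live)
print(new_blobs)    # [[1, 3, 5, 7], [9, 11]]
print(deletable)    # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
```

Running it reproduces the walkthrough: blob4 `[1,3,5,7]` and blob5 `[9,11]` get written, after which all three original blobs are deletable.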

This deletion of unreferenced blobs does not happen immediately, because Kopia also supports concurrent operation: another Kopia instance can be running on the other side of the world at exactly the same time, and it may need blob1, blob2, blob3 and hold on to them for a while due to caching.

Each client will invalidate its cache after 15 minutes or so, so the deletion of blob1, blob2, blob3 will wait around 1h to be safe and ensure all other clients have switched to blob4 and blob5 as the authoritative source.
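That safety window can be expressed as a simple rule. This is a sketch of the idea only; the 15-minute cache TTL and roughly one-hour margin are the illustrative numbers from the post above, not named Kopia constants.

```python
import datetime

# Illustrative numbers from the discussion, not Kopia internals:
# clients refresh their caches every ~15 minutes, so waiting several
# multiples of that comfortably covers all of them.
CACHE_TTL = datetime.timedelta(minutes=15)
SAFETY_MARGIN = 4 * CACHE_TTL  # ~1 hour

def safe_to_delete(unreferenced_since, now):
    """A blob may be dropped only after the safety window has elapsed."""
    return now - unreferenced_since >= SAFETY_MARGIN

now = datetime.datetime(2024, 1, 1, 12, 0)
print(safe_to_delete(now - datetime.timedelta(minutes=20), now))  # False
print(safe_to_delete(now - datetime.timedelta(hours=2), now))     # True
```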


These details are quite good. One question: even when we use the --delete option, are you saying the actual blobs are not deleted? Does this mean that if my repository is on S3, those files are still there (still consuming space), and they are only deleted during maintenance? Am I wrong here?

That's correct. It's recommended to run maintenance frequently; it will keep the repository nice and compact as it changes.

Another question: I assume that when maintenance runs, Kopia has the intelligence to go and delete the blobs from S3?

That’s correct. It will do it automatically as part of each maintenance.

Hi, I was thinking through this with my use case in mind, where I am writing a wrapper over the CLI for taking snapshots, restoring, and deleting, and I have a few questions.

  1. When I take backups of multiple repos to S3 (each having its own credentials), how does Kopia maintenance know the required credentials for each repo?
    [As far as I know, when we create a repo it creates a config file under /root/.config/kopia, but what I see is that this config file is overwritten every time a new repo is created.]

  2. I take a backup of a repo; after some time, let's say maintenance is triggered, and at the same time a user triggers another backup of the repo. In this scenario, I assume Kopia will skip deleting those blobs from the repo, since another backup is in progress, right?

Hi,
Can anyone throw some light on this?

There's currently no way to pass dynamic credentials to the S3 provider, or any other provider for that matter. You could probably extend the S3 provider for your use case, but I think you're dangerously close to a solution that will diverge from where Kopia is heading and thus be difficult to maintain.

Regarding your other question: yes, it is safe to take snapshots while maintenance is running.

When I am running such setups, I always create several Kopia server instances running simultaneously, such that there is one Kopia process per S3 bucket, each run from a script which either contains the credentials or points to a specific config and password file for the respective S3 bucket.
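From a wrapper, one way to script that per-bucket separation is to give every repository its own config file via Kopia's global `--config-file` flag and pass the password through the `KOPIA_PASSWORD` environment variable (both exist in Kopia's CLI; the file paths and passwords below are made-up placeholders):

```python
import os
import subprocess

def kopia_cmd(config_file, args):
    """Build a kopia invocation bound to one repository's config file."""
    return ["kopia", "--config-file", config_file] + list(args)

def run_maintenance(config_file, password):
    """Run full maintenance for one repo; each repo keeps its own config."""
    env = dict(os.environ, KOPIA_PASSWORD=password)  # keep password off argv
    cmd = kopia_cmd(config_file, ["maintenance", "run", "--full"])
    return subprocess.run(cmd, env=env, check=True)

# Hypothetical layout: one config file per bucket.
# run_maintenance("/root/.config/kopia/bucket-a.config", "secret-a")
# run_maintenance("/root/.config/kopia/bucket-b.config", "secret-b")
print(kopia_cmd("/tmp/bucket-a.config", ["maintenance", "run", "--full"]))
```

Because each invocation names its own config file, the configs are never overwritten by one another, and maintenance for each repo always runs with that repo's own credentials.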