Help understanding retention behaviour of PR 'Ignore Identical Snapshots' #2049

Hello!

I am trying to figure out how the policy option --ignore-identical-snapshots behaves.

I read the merged code, but to no avail, since I lack context on the data structures and architecture.

Say there are many identical snapshots: which one is the unique representative of them all? Does it get updated to the latest identical snapshot? I am asking because the answer implies different behavior for the retention policy.

Suppose we set a 100-day retention policy and have 100 identical snapshots from daily runs. Do I end up with a single snapshot carrying the 100-day label, or one carrying a 1-day retention label counted from today?

One day later, the nominal count rises to 101 identical daily snapshots. Following on from the cases above, and bearing in mind that nothing happens at exactly the same time:
Does the single snapshot expire (get deleted), or is the 1-day label refreshed (reassigned) to it? In the former case, sooner or later (depending on execution order, possibly one more day later), we would end up with a snapshot newly made from scratch and yet identical to the one just deleted.

Ignore Identical Snapshots is UI-only functionality. It makes viewing relevant snapshots easier by squashing identical ones.

It does not affect any retention behaviour.

I think what you are referring to is the option to show/hide identical snapshots (kopia snap list --[no-]show-identical). But as far as I can tell @milvi is referring to the policy setting (kopia policy set --ignore-identical-snapshots=true).

From what I can tell this option does the following: Before saving a snapshot it is compared to the previous snapshot. If there are no changes the snapshot is discarded and no data is saved. This option does not have any effect on previously made snapshots nor retention policies.
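To make that description concrete, here is a minimal Python sketch of a "skip if identical to the previous snapshot" check. It is not Kopia's actual code. The function names, the in-memory `history` list, and the idea of comparing a single digest of the whole tree are all assumptions made for illustration.

```python
import hashlib


def tree_digest(files: dict[str, bytes]) -> str:
    """Hypothetical stand-in for a snapshot's root identifier:
    one digest over all paths and their contents."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


def take_snapshot(history: list[str], files: dict[str, bytes],
                  ignore_identical: bool) -> bool:
    """Append a snapshot to history unless it matches the previous one.

    Returns True if a snapshot was saved, False if it was discarded.
    Only the most recent entry is consulted; older entries are never
    touched, mirroring the statement that existing snapshots are unaffected.
    """
    digest = tree_digest(files)
    if ignore_identical and history and history[-1] == digest:
        return False  # nothing changed since the previous snapshot
    history.append(digest)
    return True
```

With `ignore_identical=True`, two runs over unchanged files leave a single entry in `history`; with it off, every run adds an entry.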

If you enable this option on a repository/policy which already has identical snapshots saved you will have to delete the identical snapshots manually. @budy wrote a script for that but I’m currently unable to find it.
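As a rough illustration of what such a manual cleanup involves (this is not @budy's script), the Python sketch below picks out snapshots that repeat the root of the snapshot taken just before them for the same source. The dictionary keys `id`, `source`, `start` and `root` are hypothetical; the real snapshot listing would have to be mapped onto them first, and the actual deletion step is left out on purpose.

```python
from itertools import groupby
from operator import itemgetter


def find_redundant(snapshots: list[dict]) -> list[str]:
    """Return ids of snapshots identical to their immediate predecessor.

    Each snapshot is a dict with the hypothetical keys:
      id     - snapshot identifier
      source - what was backed up (e.g. user@host:/path)
      start  - start time, in any form that sorts chronologically
      root   - identifier of the snapshot's root contents
    The earliest snapshot of every run of identical ones is kept.
    """
    redundant = []
    ordered = sorted(snapshots, key=itemgetter("source", "start"))
    for _, group in groupby(ordered, key=itemgetter("source")):
        previous_root = None
        for snap in group:
            if snap["root"] == previous_root:
                redundant.append(snap["id"])
            previous_root = snap["root"]
    return redundant
```

Only consecutive duplicates are flagged, so a snapshot that returns to an earlier state after an intermediate change is kept. Reviewing the returned ids before deleting anything would be prudent.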

I still have the Perl script that I used to remove all the “excess” snapshots. If anyone is still interested, I can share it. At the time I wrote it, we faced an issue backing up a client to a remote server: the client’s repo had accumulated approx. 196k snaps, which bugged out KopiaServer.

At that time, I set the policy to ignore identical snapshots and removed all the identical ones, which brought the number of snaps down to 20k or so. Unfortunately, the underlying issue seems to have never been fixed.