Repo migration > new snapshot > deletion of old snapshots: doesn't free up space

I tried, unsuccessfully, to migrate a repo of ~420 GB of mostly static source files to V2.
The total size of the repo came to over 600 GB.

What I’ve done: migrated, made a new snapshot, deleted all previous snapshots, and ran a full maintenance, which deleted only about 25 GB. A second full maintenance, run back-to-back, surprisingly freed another 11 GB. A third one freed nothing. However, the total size of the repo is still over 570 GB.

A totally new repo with the same settings (except a 1M splitter instead of 4M), containing a single initial snapshot, is 288 GB.

New repo (DYNAMIC-1M-BUZHASH)

Count: 316606
Total Bytes: 359.1 GB
Total Packed: 308.8 GB (compression 14.0%)
By Method:
  (uncompressed)         count: 106239 size: 104.5 GB
  zstd-best-compression  count: 210367 size: 254.6 GB packed: 204.3 GB compression: 19.8%
Average: 1.1 MB
Histogram:

        0 between 0 B and 10 B (total 0 B)
      373 between 10 B and 100 B (total 24.5 KB)
    14978 between 100 B and 1 KB (total 6.1 MB)
     8865 between 1 KB and 10 KB (total 30.3 MB)
    13927 between 10 KB and 100 KB (total 642.9 MB)
   140216 between 100 KB and 1 MB (total 89.4 GB)
   138247 between 1 MB and 10 MB (total 218.7 GB)

Old repo (DYNAMIC-4M-BUZHASH):

Count: 213680
Total Bytes: 662.8 GB
Total Packed: 608.3 GB (compression 8.2%)
By Method:
  (uncompressed)         count: 150304 size: 412.7 GB
  zstd-best-compression  count: 63376 size: 250 GB packed: 195.5 GB compression: 21.8%
Average: 3.1 MB
Histogram:

        0 between 0 B and 10 B (total 0 B)
      435 between 10 B and 100 B (total 28.2 KB)
    27724 between 100 B and 1 KB (total 10.2 MB)
     6939 between 1 KB and 10 KB (total 22.5 MB)
    10155 between 10 KB and 100 KB (total 416.4 MB)
    22319 between 100 KB and 1 MB (total 10.6 GB)
   146108 between 1 MB and 10 MB (total 597.2 GB)
        0 between 10 MB and 100 MB (total 0 B)

(A high percentage of uncompressed files is expected due to extensive no-compress policy.)
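To make the two listings easier to compare, here is a small sketch that only redoes the arithmetic on the figures quoted above. The dictionary keys and helper names are made up for illustration, and the GB-to-GiB conversion at the end rests on an assumption (that the "288GB" and "570GB" figures quoted in the post were reported in binary units), which the thread does not confirm.

```python
# Figures copied from the two content statistics listings above (in GB).
new_repo = {"uncompressed": 104.5, "compressed_packed": 204.3}
old_repo = {"uncompressed": 412.7, "compressed_packed": 195.5}


def total_packed(repo):
    """Stored size: uncompressed contents plus the packed size of compressed ones."""
    return repo["uncompressed"] + repo["compressed_packed"]


def gb_to_gib(gb):
    """Convert decimal gigabytes (10**9 bytes) to binary gibibytes (2**30 bytes)."""
    return gb * 1e9 / 2**30


extra_uncompressed = old_repo["uncompressed"] - new_repo["uncompressed"]
extra_packed = total_packed(old_repo) - total_packed(new_repo)

print(f"new repo packed: {total_packed(new_repo):.1f} GB")
print(f"old repo packed: {total_packed(old_repo):.1f} GB")
print(f"extra uncompressed data in old repo: {extra_uncompressed:.1f} GB")
print(f"extra packed data in old repo: {extra_packed:.1f} GB")

# ASSUMPTION: the sizes quoted in the post may be GiB rather than GB.
print(f"new repo packed in GiB: {gb_to_gib(total_packed(new_repo)):.1f}")
print(f"old repo packed in GiB: {gb_to_gib(total_packed(old_repo)):.1f}")
```

The compressed portions of the two repos are close in size, so on these numbers the gap sits almost entirely in the uncompressed contents.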

The old repo had some snapshots made with different compression settings (pgzip-best), which, as I now understand, prevents deduplication on V1. However, after removing all pre-migration snapshots and performing a forced full maintenance, I’d expect the size to be at least in the ballpark of the new repo.

I didn’t save the log, but at least 40% of the total repo size was “uploaded” to the old repo while snapshotting post-migration, so even if the rest were totally uncompressed, the results are still way off: the repo is double the size of the new one.

Here’s the output of snapshot list --all for the old repo:

Zoom@hostA:d:\dir1

Zoom@hostA:d:\dir2

Zoom@hostA:e:\dir2 # different mount point of an external drive

Zoom@hostB:h:\dir2
  2021-11-23 k22e78a355a6a2fa3183793ef00cc7f42 353.1 GB drwxrwxrwx files:21604 dirs:1392 (latest-1,annual-1,monthly-1,weekly-1,daily-1,hourly-1)

Zoom@hostB:h:\dir1
  2021-11-23 kc136bb3d1847faa65f3a1685367fb3d3 70.7 GB drwxrwxrwx files:2434 dirs:334 (latest-1,annual-1,monthly-1,weekly-1,daily-1,hourly-1)

Shouldn’t the user@host entries of the deleted snapshots disappear? Is this an indication that some orphaned blobs remain?

Any advice on how to proceed? I’ve been holding on to the old repo in case you need additional info to analyze, or in case there’s a way to properly migrate it to the new version, but I’d like to delete it if it’s not salvageable.

I’m assuming you upgraded the existing repo v1->v2 and made some new snapshots whose paths were not in v1?

The effect you’re seeing is a result of how differently compression is applied between v1 and v2: in v1 it happens before hashing and in v2 after hashing, so the two produce different contents internally. As long as you’re relying on caching, this should not cause a re-upload, but in your case it did, likely because it was a brand new snapshot.

In other words, when you use compression, there is generally no deduplication between old contents (v1 era) and new contents (v2 era). Non-compressed objects stay deduplicated across both. A repository started on v2 should be fine for both compressed and non-compressed contents.
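The ordering described above can be illustrated with a toy model. This is a minimal sketch, not Kopia's actual code: SHA-256, zlib and bz2 are stand-ins picked from the Python standard library, and the function names are hypothetical. It only shows why hashing the compressed bytes ties a content's ID to the compressor, while hashing the plaintext does not.

```python
import bz2
import hashlib
import zlib


def content_id_v1_style(data: bytes, compress) -> str:
    """Toy 'compress, then hash' ordering: the ID depends on the compressor."""
    return hashlib.sha256(compress(data)).hexdigest()


def content_id_v2_style(data: bytes, compress) -> str:
    """Toy 'hash, then compress' ordering: the ID depends only on the plaintext."""
    content_id = hashlib.sha256(data).hexdigest()
    compress(data)  # compression changes what would be stored, not the ID
    return content_id


chunk = b"mostly static source files\n" * 1000

# Same data, two different compressors (stand-ins for two compression policies).
v1_a = content_id_v1_style(chunk, zlib.compress)
v1_b = content_id_v1_style(chunk, bz2.compress)
v2_a = content_id_v2_style(chunk, zlib.compress)
v2_b = content_id_v2_style(chunk, bz2.compress)

print("v1-style IDs match:", v1_a == v1_b)
print("v2-style IDs match:", v2_a == v2_b)
print("v1-style ID equals v2-style ID:", v1_a == v2_a)
```

In this model, the same chunk gets a different ID under each compressor with the v1-style ordering, and a v1-style ID never lines up with the v2-style ID for compressed data, which mirrors the lack of deduplication between the two eras.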

To fix everything up, use kopia snapshot migrate to a brand new repository, which will properly deduplicate across old and new contents.

Not quite. I upgraded the repo from v1 to v2 and made one snapshot of the same directory (the only thing that changed was the mount point). The compression was different too. As I stated above, I understand that v1 deduplication was performed post-compression. However, the main issue is that after removing all older snapshots and performing full maintenance, the total size of the v2 repo is 35% larger than the size of the single two-dir snapshot it contains. The snapshot of the two dirs listed above is 423GB, while the repo is 570GB.

I already have a brand new v2 repository, and its size is 288GB. To repeat myself, its contents are totally identical to the old repo; only the splitter and compression algorithm are different.

There are clearly some issues with the state of the old repository.

PS: I may have muddied the waters by using the word “migration” for what was a version upgrade.

OK, so all the “unaccounted” space was actually occupied by incomplete snapshot checkpoints. Shouldn’t they be cleaned up by full maintenance once all the completed snapshots for a source have been deleted?

Also, a warning or a counter in the output of snapshot list --all would be nice, so that users like me don’t forget to check for incomplete snapshots too.
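Until something like that exists, a rough check can be done outside Kopia. The sketch below assumes the plain-text layout pasted earlier in this thread (unindented user@host:path headers, indented snapshot lines) and flags sources with nothing listed under them. The function name is hypothetical, the real output may differ between versions, and an empty source is only a hint that incomplete checkpoints might be present, not proof.

```python
def sources_without_snapshots(listing: str) -> list[str]:
    """Return source headers that have no snapshot lines beneath them."""
    counts: dict[str, int] = {}
    current = None
    for line in listing.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if line[0].isspace():
            # Indented line: a snapshot belonging to the current source.
            if current is not None:
                counts[current] += 1
        else:
            # Unindented line: a new source header; drop any trailing "# note".
            current = line.split(" #", 1)[0].strip()
            counts[current] = 0
    return [source for source, count in counts.items() if count == 0]


# Shortened copy of the listing from the thread (raw string keeps backslashes).
sample = r"""Zoom@hostA:d:\dir1

Zoom@hostA:d:\dir2

Zoom@hostB:h:\dir2
  2021-11-23 k22e78a355a6a2fa3183793ef00cc7f42 353.1 GB drwxrwxrwx files:21604 dirs:1392
"""

for source in sources_without_snapshots(sample):
    print("no completed snapshots listed for:", source)
```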

I think that’s all understood - compression change totally accounts for that. You can go ahead and delete the old repo.