Kopia re-uploads data

Kopia seems to be re-uploading data.

I restored data (107.4 GB) on hostB from a repository that had been backed up from hostA.
I then ran a backup of the same 107.4 GB on hostB to create a baseline snapshot.
Kopia took the usual amount of time but uploaded 572.4 MB.

No modifications were made. Is this normal?

Can you run kopia snapshot list and then kopia diff k<old-snapshot> k<new-snapshot>?

This output is from another snapshot showing the same partial re-upload.
I am on Windows.
The one above was restored using the show lzzzzzz trick, which worked.

saviodsouza@savio:D:\Personal\vmware\GNS3 VM
  2020-09-21 19:00:43 IST k617b8c069d13de55dfe9fa15d4b12b53 205.2 GB drwxrwxrwx files:10 dirs:1 (latest-2)
  2020-09-22 08:52:32 IST k617b8c069d13de55dfe9fa15d4b12b53 205.2 GB drwxrwxrwx files:10 dirs:1 (latest-1,annual-1,monthly-1,weekly-1)
  
saviodsouza@savio-pc:C:\userdata\vmware
  2020-09-24 17:09:37 IST k177c8e93d6cad794da25be808aa2f513 205.2 GB dr-xr-xr-x files:10 dirs:1 (latest-3,weekly-2,daily-2)
  2020-09-28 12:23:48 IST kfcd48f1b17febcbe577e29daacc1beec 205.2 GB dr-xr-xr-x files:10 dirs:2 (latest-2,hourly-2)
  2020-09-28 14:50:45 IST kdcfc3b3e40b8e580afde0ecb2f5aec49 312.7 GB dr-xr-xr-x files:11 dirs:2 (latest-1,annual-1,monthly-1,weekly-1,daily-1,hourly-1)
  
kopia.exe diff k617b8c069d13de55dfe9fa15d4b12b53 k177c8e93d6cad794da25be808aa2f513
changed ./GNS3 VM-disk1.vmdk at 2020-09-21 14:29:51.0350172 +0530 IST (size 4224385024 -> 4224385024)
changed ./GNS3 VM-disk2.vmdk at 2020-09-21 14:29:50.3283905 +0530 IST (size 200976891904 -> 200976891904)

Here is another fresh case I just tested on the same repository.
I have compression enabled.

604.2 MB was uploaded in this case.

saviodsouza@savio:D:\Personal\VMs\Linux\CentOS 8
  2020-09-12 20:24:49 IST k339a57452d72695f454b6e6036fc0bf3 18.8 GB drwxrwxrwx files:12 dirs:3 (latest-2,weekly-2)
  2020-09-17 14:09:01 IST k339a57452d72695f454b6e6036fc0bf3 18.8 GB drwxrwxrwx files:12 dirs:3 (latest-1,annual-1,monthly-1,weekly-1)
  
C:\Admin>mkdir C:\userdata\virtualbox\centos8

C:\Admin>kopia.exe restore k339a57452d72695f454b6e6036fc0bf3 C:\userdata\virtualbox\centos8
Restoring to local filesystem (C:\userdata\virtualbox\centos8)...
11:45:18.206 [restore] WriteFile CentOS 8.vbox 14437
11:45:18.691 [restore] WriteFile CentOS 8.vbox-prev 15452
11:45:18.723 [restore] WriteFile CentOS.vdi 4350541824
11:48:20.700 [restore] WriteFile Logs/VBox.log 154127
11:48:20.881 [restore] WriteFile Logs/VBox.log.1 175131
11:48:20.979 [restore] WriteFile Logs/VBox.log.2 191261
11:48:21.066 [restore] WriteFile Logs/VBox.log.3 155510
11:48:21.147 [restore] WriteFile Logs/VBoxHardening.log 438735
11:48:21.192 [restore] WriteFile Rear.vdi 5030019072
11:52:35.487 [restore] WriteFile Snapshots/{481d2b80-a6ee-437d-bd13-b96ba2d71359}.vdi 2097152
11:52:36.805 [restore] WriteFile Snapshots/{a699c42d-a1e4-4d61-97ab-706cdb09f01a}.vdi 3578789888
11:54:56.897 [restore] WriteFile Snapshots/{a9c4deb4-a130-479e-b539-e066fc778e7c}.vdi 5847908352
Restored 12 files, 3 directories and 0 symbolic links (18.8 GB)

C:\Admin>kopia.exe snap C:\userdata\virtualbox\centos8
Snapshotting saviodsouza@savio-pc:C:\userdata\virtualbox\centos8 ...
 * 0 hashing, 12 hashed (18.8 GB), 0 cached (0 B), 32 uploaded (604.2 MB), 0 errors, estimated 18.8 GB (100.0%) 0s left
Created snapshot with root k6d60c18ff394e0446497cde3b1728e41 and ID bdaeba93bdd2beaa135aa5b5860d5982 in 11m7s

saviodsouza@savio-pc:C:\userdata\virtualbox\centos8
  2020-09-29 12:00:12 IST k6d60c18ff394e0446497cde3b1728e41 18.8 GB dr-xr-xr-x files:12 dirs:3 (latest-1,annual-1,monthly-1,weekly-1,daily-1,hourly-1)

C:\Admin>kopia.exe diff k339a57452d72695f454b6e6036fc0bf3 k6d60c18ff394e0446497cde3b1728e41
changed ./CentOS.vdi at 2019-10-03 18:47:36.6555217 +0530 IST (size 4350541824 -> 4350541824)
changed ./Rear.vdi at 2020-07-23 11:24:33.5621439 +0530 IST (size 5030019072 -> 5030019072)
changed ./Snapshots/{481d2b80-a6ee-437d-bd13-b96ba2d71359}.vdi at 2020-07-23 19:27:01.0343014 +0530 IST (size 2097152 -> 2097152)
changed ./Snapshots/{a699c42d-a1e4-4d61-97ab-706cdb09f01a}.vdi at 2020-03-24 19:48:59.1462265 +0530 IST (size 3578789888 -> 3578789888)
changed ./Snapshots/{a9c4deb4-a130-479e-b539-e066fc778e7c}.vdi at 2020-07-23 19:06:33.3103623 +0530 IST (size 5847908352 -> 5847908352)
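In this diff every file keeps exactly the same size but is still reported as changed. One plausible reading, assumed here rather than confirmed in the thread, is that diff flags a file when the object ID recorded for it differs between the two snapshots, and that ID can change when the stored chunks change even though the file's bytes do not. The sketch below models only that comparison; the dictionaries, the placeholder object-ID strings, and the diff_entries helper are hypothetical and are not kopia's data structures.

```python
# Hypothetical model of a snapshot diff: each snapshot maps a file name to
# (size, object_id). This is an illustration, NOT kopia's internal format.
def diff_entries(old, new):
    """Return a 'changed' line for every file whose object ID differs."""
    lines = []
    for name in sorted(old.keys() & new.keys()):
        old_size, old_id = old[name]
        new_size, new_id = new[name]
        # Sizes may be equal while IDs differ, e.g. when the stored
        # chunks were compressed or split differently.
        if old_id != new_id:
            lines.append(f"changed ./{name} (size {old_size} -> {new_size})")
    return lines


# Placeholder object IDs; real kopia IDs look different.
old_snapshot = {"CentOS 8.vbox": (14437, "id-1"), "CentOS.vdi": (4350541824, "id-2")}
new_snapshot = {"CentOS 8.vbox": (14437, "id-1"), "CentOS.vdi": (4350541824, "id-3")}

for line in diff_entries(old_snapshot, new_snapshot):
    print(line)
```

Under this model a diff alone cannot tell whether the content changed or only the way it was stored, which is why the next step is to compare the files themselves.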

------------------------------------------------------

kopia policy show --global
Policy for (global):

Retention:
  Annual snapshots:    3           (defined for this target)
  Monthly snapshots:  24           (defined for this target)
  Weekly snapshots:    4           (defined for this target)
  Daily snapshots:     7           (defined for this target)
  Hourly snapshots:   48           (defined for this target)
  Latest snapshots:   10           (defined for this target)

Files policy:
  Ignore cache directories:        true       (default)
  No ignore rules.
  Read ignore rules from files:
    .kopiaignore                   (defined for this target)

Error handling policy:
  Ignore file read errors:       false       (defined for this target)
  Ignore directory read errors:  false       (defined for this target)

Scheduled snapshots:
  None

Compression:
  Compressor: "zstd-best-compression" (defined for this target)
  Compress files regardless of extensions.
  Compress files of all sizes.

Can you do:

kopia show k339a57452d72695f454b6e6036fc0bf3/CentOS.vdi

and

kopia show k6d60c18ff394e0446497cde3b1728e41/CentOS.vdi

Also, if you don’t mind joining slack.kopia.io, it would make this whole debugging process much easier.

The sizes and MD5 hashes of both outputs match.

This must be a bug.
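The size-and-MD5 comparison described above can be reproduced with a short script, for example on the two files produced by redirecting the output of the kopia show commands. This is a minimal sketch: file_md5, same_content and any example paths are hypothetical names, and MD5 serves here only as an equality check, not for security.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB pieces."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(path_a, path_b):
    """True if both files have the same size and the same MD5 digest."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # cheap check first; no need to hash
    return file_md5(path_a) == file_md5(path_b)
```

For example, same_content("old-CentOS.vdi", "new-CentOS.vdi") returns True only when both the sizes and the digests agree.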

Yes, I suspect a splitter bug. Can you run the kopia show commands and paste the output here?

We analyzed the data on Slack. The majority of the differences can be attributed to the fact that zstd compression is in use, and zstd does not always guarantee byte-for-byte identical output for the same input. Assuming hostA and hostB had different CPU characteristics (number of cores, etc.), that explains most of the difference.
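The explanation above only leads to re-uploads if the deduplication key depends on the compressed bytes rather than on the original data; the sketch below assumes exactly that. It uses Python's zlib at two different levels as a stand-in for two zstd runs that produce different output. Unlike the zstd behavior described above, these zlib outputs are deterministic and differ only because the settings differ. content_id is a hypothetical helper, not kopia's real ID scheme.

```python
import hashlib
import zlib


def content_id(stored_bytes):
    """Hypothetical dedup key: SHA-256 of the bytes as stored (compressed)."""
    return hashlib.sha256(stored_bytes).hexdigest()


# The same logical data exists on both hosts.
data = b"The quick brown fox jumps over the lazy dog.\n" * 20000

# Stand-in for two compressor runs whose outputs differ. zlib levels 1 and 9
# are used only to get different bytes; zlib itself is deterministic.
stored_a = zlib.compress(data, 1)
stored_b = zlib.compress(data, 9)

# Both decompress to exactly the original data...
assert zlib.decompress(stored_a) == data
assert zlib.decompress(stored_b) == data

# ...yet their keys differ, so the second copy would not deduplicate
# against the first and would be uploaded again.
print(content_id(stored_a) == content_id(stored_b))  # False
```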

There is, however, also a difference in the splitter output, which we’re still trying to get to the bottom of: all split points except the first 3 are the same, but it appears that the first 6291640 bytes were split differently: (1685450+2132853+2473337) vs (2097152+2097152+2097336).
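The two splits can be checked with a few lines of arithmetic: both sequences sum to 6291640, and their running totals (the cut points) first coincide at that offset, which is where the remaining split points start to line up. The first two chunks of the second run are exactly 2 MiB (2097152 bytes); whether that reflects a size limit in kopia's splitter is not established in this thread.

```python
from itertools import accumulate

# Chunk lengths for the first 6291640 bytes, copied from the analysis above.
split_a = [1685450, 2132853, 2473337]
split_b = [2097152, 2097152, 2097336]

# Cut points are the running totals of the chunk lengths.
cuts_a = list(accumulate(split_a))  # [1685450, 3818303, 6291640]
cuts_b = list(accumulate(split_b))  # [2097152, 4194304, 6291640]

# Both runs cover the same span and share only the final cut point.
assert sum(split_a) == sum(split_b) == 6291640
print(sorted(set(cuts_a) & set(cuts_b)))  # [6291640]

# The first two chunks of the second run are exactly 2 MiB each.
print(split_b[0] == split_b[1] == 2 * 1024 * 1024)  # True
```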

HostA and hostB are the same machine in this case.
I had just reformatted my PC to fix some common issues in Windows.

If a file in Windows is fragmented, could that be the reason for the splitting issue?
For example, before I formatted my PC, the Centos8.vdi file was not fragmented when I first ran the backup.
After I formatted and ran kopia restore, Centos8.vdi became fragmented.
When I backed it up again, the splitting worked differently because the file was fragmented on disk. :shushing_face:

What do you mean by “fragmented”? I am pretty sure that kopia doesn’t “see” file fragmentation. It simply opens a file and ingests all of its (logical) blocks; even if a file were fragmented on disk, this would not be visible to kopia.

Just my 2c, of course.
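A quick way to see this: an ordinary read returns the bytes in logical order no matter how, or in what order, they were written. The sketch writes the same 1 MiB of data once front to back and once in reverse block order using seek(). Writing out of order is only a rough stand-in for fragmentation; it may or may not fragment the file on a real filesystem, and that is not checked here. sha256_of is a hypothetical helper name.

```python
import hashlib
import os
import tempfile

BLOCK = 64 * 1024
data = os.urandom(16 * BLOCK)  # 1 MiB of test data


def sha256_of(path):
    """Hash a file the way a backup tool reads it: one sequential read."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


tmp = tempfile.mkdtemp()
seq_path = os.path.join(tmp, "sequential.bin")
rev_path = os.path.join(tmp, "reversed.bin")

# File 1: written front to back in a single call.
with open(seq_path, "wb") as f:
    f.write(data)

# File 2: the same 64 KiB blocks, written last-to-first using seek().
# Seeking past the end leaves a zero-filled gap that later writes overwrite.
with open(rev_path, "wb") as f:
    for off in reversed(range(0, len(data), BLOCK)):
        f.seek(off)
        f.write(data[off:off + BLOCK])

# A normal read returns the same logical byte sequence either way.
print(sha256_of(seq_path) == sha256_of(rev_path))  # True
```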

So, which compressor should be used for a mixed-CPU setup, where different CPU models probably all save into the same repo? Here are my benchmark results:

root@poseidon:~# kopia benchmark compression
Benchmarking compressor 'gzip-best-speed' (100 x 1048576 bytes)
Benchmarking compressor 's2-parallel-4' (100 x 1048576 bytes)
Benchmarking compressor 'zstd-fastest' (100 x 1048576 bytes)
Benchmarking compressor 'zstd-better-compression' (100 x 1048576 bytes)
Benchmarking compressor 'gzip' (100 x 1048576 bytes)
Benchmarking compressor 's2-better' (100 x 1048576 bytes)
Benchmarking compressor 'zstd-best-compression' (100 x 1048576 bytes)
Benchmarking compressor 'pgzip-best-speed' (100 x 1048576 bytes)
Benchmarking compressor 'pgzip-best-compression' (100 x 1048576 bytes)
Benchmarking compressor 's2-default' (100 x 1048576 bytes)
Benchmarking compressor 'gzip-best-compression' (100 x 1048576 bytes)
Benchmarking compressor 'pgzip' (100 x 1048576 bytes)
Benchmarking compressor 's2-parallel-8' (100 x 1048576 bytes)
Benchmarking compressor 'zstd' (100 x 1048576 bytes)
     Compression                    Compressed Size Throughput
-----------------------------------------------------------------
  0. s2-parallel-8                  35              2 GiB / second
  1. s2-default                     35              1.9 GiB / second
  2. s2-parallel-4                  35              1.9 GiB / second
  3. zstd                           225             1.6 GiB / second
  4. zstd-fastest                   287             1.4 GiB / second
  5. s2-better                      38              1.2 GiB / second
  6. zstd-better-compression        225             1.2 GiB / second
  7. zstd-best-compression          225             1.1 GiB / second
  8. pgzip-best-speed               3689            1 GiB / second
  9. gzip-best-speed                1318            1 GiB / second
 10. pgzip                          3688            1 GiB / second
 11. gzip                           1059            231.7 MiB / second
 12. gzip-best-compression          1059            228.4 MiB / second
 13. pgzip-best-compression         1066            227.6 MiB / second

And why does s2-parallel-8 have a compressed size of only 35? That seems somewhat spooky to me. Or maybe the test dataset is inadequate.
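The suspicion about the dataset is easy to illustrate: a compressor's output size depends heavily on the input, so a benchmark on trivially compressible data says little about real workloads. The sketch uses Python's zlib rather than the s2, zstd or gzip implementations kopia benchmarks, and compares 1 MiB of zeros with 1 MiB of random bytes. What data kopia's benchmark uses by default is not stated in this thread.

```python
import os
import zlib

SIZE = 1048576  # the 1048576-byte blocks used in the benchmark above

samples = {
    "zeros": bytes(SIZE),        # extremely repetitive: compresses to almost nothing
    "random": os.urandom(SIZE),  # no redundancy: cannot be compressed
}

for name, block in samples.items():
    compressed = zlib.compress(block, 6)
    ratio = len(compressed) / len(block)
    print(f"{name:>6}: {len(compressed):>8} bytes ({ratio:.2%} of input)")
```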


Good catch, buddy!

If you use a large text file (100 megs or so) with some compressibility, such as an SQL dump, you’ll get more representative results.

Like this: kopia benchmark compression --data-file=testdump.sql --repeat=1
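If no real SQL dump is at hand, a synthetic one can serve as a rough stand-in, although real data will give more representative numbers. The sketch writes INSERT statements that mix repetitive text with random fields; the write_fake_dump helper, the table name and the columns are all made up for illustration.

```python
import random
import string


def write_fake_dump(path, target_bytes, seed=0):
    """Write roughly target_bytes of SQL-like INSERT statements to path.

    Hypothetical helper: the schema is invented purely for benchmarking.
    Returns the number of bytes written.
    """
    rng = random.Random(seed)  # fixed seed gives the same file every run
    written = 0
    row_id = 0
    # newline="\n" prevents CRLF translation on Windows, so the byte count is exact.
    with open(path, "w", encoding="ascii", newline="\n") as f:
        while written < target_bytes:
            row_id += 1
            # Random 12-letter name adds entropy so the file is not trivially compressible.
            name = "".join(rng.choices(string.ascii_lowercase, k=12))
            amount = rng.randint(0, 100000) / 100
            line = (f"INSERT INTO orders (id, customer, amount, status) "
                    f"VALUES ({row_id}, '{name}', {amount:.2f}, 'shipped');\n")
            f.write(line)
            written += len(line)  # ASCII only, so characters == bytes
    return written
```

For example, write_fake_dump("testdump.sql", 100 * 1024 * 1024) produces a file of about 100 MiB that can be passed to --data-file in the command above.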