Large file deduplication not working - KVM

Hi there,

I successfully set up a Kopia repository that backs up several of my servers. This works great and I am delighted to see deduplication working.

  2024-05-16 06:00:05 CEST kcbc37abc92a777ab5b892f076a634a7a 9.2 GB drwxr-xr-x  files:207298 dirs:29221 new-data:6 GB new-files:146101 new-dirs:24909 compression:0% (latest-4,daily-4)
  2024-05-17 06:00:16 CEST k3bef9ff4968372c89a194ae6aa3f5a88 9.2 GB drwxr-xr-x  files:217613 dirs:29227 new-data:660.1 MB new-files:19761 new-dirs:2491 compression:0% (latest-3,hourly-3,daily-3)
  2024-05-18 06:00:06 CEST k0b7009e35cd32eebf33b6460885cd9fe 9.3 GB drwxr-xr-x  files:218032 dirs:29241 new-data:24.1 MB new-files:704 new-dirs:199 compression:0% (latest-2,hourly-2,daily-2)
  2024-05-19 06:00:06 CEST kfc4b0ee3c31dee9f7557b99174f3c803 9.3 GB drwxr-xr-x  files:218109 dirs:29241 new-data:22.7 MB new-files:289 new-dirs:65 compression:0% (latest-1,hourly-1,daily-1,weekly-1,monthly-1,annual-1)

However, I am also backing up KVM VMs running on the Proxmox hypervisor, with no compression enabled and no encryption inside the virtual machines.

Here, when I back up the files, I get no deduplication at all. Data inside the virtual machines is not changing much (some logs and database writes, that is it…)
Note that the first backup was very different from the others, as it contained legacy (now deleted) backups.

  2024-05-16 10:02:25 CEST k1a353f981bad64ea3ef43dbd6500a56e 248 GB drwxr-xr-x  files:116 dirs:1 new-data:247.9 GB new-files:98 new-dirs:1 compression:0.0% (latest-4,daily-4) pins:1stSnapAndOldVms
  2024-05-17 07:17:53 CEST k33d5c04549dab3fe22e81a65afef57c0 178.1 GB drwxr-xr-x  files:39 dirs:1 new-data:178.1 GB new-files:26 new-dirs:1 compression:0.0% (latest-3,hourly-3,daily-3)
  2024-05-18 04:00:30 CEST k6683d8c9748507c5b9a0430626eb5dc4 178.8 GB drwxr-xr-x  files:39 dirs:1 new-data:178.8 GB new-files:26 new-dirs:1 compression:0.0% (latest-2,hourly-2,daily-2)
  2024-05-19 04:00:30 CEST k1fe81be093e9e5f458a5f1490c4c5676 181.9 GB drwxr-xr-x  files:39 dirs:1 new-data:181.9 GB new-files:26 new-dirs:1 compression:0.0% (latest-1,hourly-1,daily-1,weekly-1,monthly-1,annual-1)

The repository is mostly on default settings…

Could someone experienced with rolling hash deduplication / KVM help me to achieve deduplication on this repository?


You are not backing up raw VM disk images (they would deduplicate very well) but files created by PVE backup - and those clearly cannot be deduplicated.

I think PVE uses the VMA format for its backups. In a nutshell, it stores data in 65536-byte internal “chunks”, written out of order, each with its own header that contains, for example, a header number. This means the slightest change in the VM most likely generates a completely different file, and changes at such fine granularity make deduplication impossible. Kopia's algorithm looks for much larger unchanged parts (MB size).
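To illustrate the effect, here is a small Python sketch. It is a toy model, not the real VMA layout and not Kopia's code: the cluster size is scaled down to 1 KiB, the header fields are made up, and the chunker is a simple Gear rolling hash standing in for Kopia's buzhash splitter. It compares how much of a second "backup" is new data when the disk is stored raw versus packed into small, shuffled, header-prefixed clusters.

```python
import hashlib
import random

CLUSTER = 1024            # toy cluster size (the format discussed above uses 65536 bytes)
CUT_MASK = (1 << 13) - 1  # cut when the low 13 hash bits are zero: ~8 KiB average chunks
MASK64 = (1 << 64) - 1
GEAR = [random.Random(0).getrandbits(64)]  # seed entry, table filled below
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(64) for _ in range(256)]  # per-byte random table


def chunk(data):
    """Content-defined chunking with a Gear rolling hash (stand-in for buzhash).

    Returns a list of (sha256 digest, length) pairs, one per chunk."""
    pieces, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & MASK64
        if h & CUT_MASK == 0:  # boundary depends only on the last few bytes
            piece = data[start:i + 1]
            pieces.append((hashlib.sha256(piece).hexdigest(), len(piece)))
            start = i + 1
    if start < len(data):
        piece = data[start:]
        pieces.append((hashlib.sha256(piece).hexdigest(), len(piece)))
    return pieces


def new_data_fraction(old, new):
    """Fraction of bytes in `new` that sit in chunks not already seen in `old`."""
    known = {digest for digest, _ in chunk(old)}
    fresh = sum(length for digest, length in chunk(new) if digest not in known)
    return fresh / len(new)


def random_bytes(rng, n):
    return rng.getrandbits(8 * n).to_bytes(n, "little")


def toy_container(disk, rng):
    """Hypothetical VMA-like layout: clusters in arbitrary order, each prefixed
    by an 8-byte header (sequence number + cluster index)."""
    order = list(range(len(disk) // CLUSTER))
    rng.shuffle(order)
    out = bytearray()
    for seq, idx in enumerate(order):
        out += seq.to_bytes(4, "little") + idx.to_bytes(4, "little")
        out += disk[idx * CLUSTER:(idx + 1) * CLUSTER]
    return bytes(out)


rng = random.Random(42)
day1 = random_bytes(rng, 512 * CLUSTER)   # 512 KiB toy "disk"
day2 = bytearray(day1)
for idx in (10, 200, 400):                # only three clusters change overnight
    day2[idx * CLUSTER:(idx + 1) * CLUSTER] = random_bytes(rng, CLUSTER)
day2 = bytes(day2)

raw_new = new_data_fraction(day1, day2)
packed_new = new_data_fraction(toy_container(day1, rng), toy_container(day2, rng))
print(f"raw image:     {raw_new:.0%} of the second backup is new data")
print(f"toy container: {packed_new:.0%} of the second backup is new data")
```

In the raw image, only the chunks overlapping the changed clusters are new. In the container, almost every chunk spans several clusters whose neighbours and headers differ between runs, so nearly everything looks new to the chunker.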

Kopia by default uses the DYNAMIC-4M-BUZHASH algorithm, which looks for 4MB chunks. You could change it to DYNAMIC-1M-BUZHASH, but it won’t help you anyway. These are typical values used by other cloud backup software (not just Kopia). Making chunks much smaller (KB size) would require much more overhead for managing repo metadata - for example, changing the chunk size to 1KB would require roughly 1000 times more RAM than 1MB chunks. Overall it is a trade-off designed to be realistic for normal use cases.
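As a rough back-of-the-envelope illustration of that trade-off, the sketch below counts how many chunks a snapshot of about 180 GB would produce at different average chunk sizes. The per-entry cost `BYTES_PER_ENTRY` is a made-up placeholder, not a figure from Kopia; only the ratio between the rows matters.

```python
BYTES_PER_ENTRY = 100  # hypothetical bookkeeping cost per chunk, NOT a Kopia figure


def index_entries(total_bytes, avg_chunk_bytes):
    """Number of chunks (and thus index entries) needed, rounded up."""
    return (total_bytes + avg_chunk_bytes - 1) // avg_chunk_bytes


snapshot = 180 * 10**9  # roughly the size of the VM snapshots in the listing above
for label, size in (("4 MiB", 4 * 2**20), ("1 MiB", 2**20), ("1 KiB", 2**10)):
    n = index_entries(snapshot, size)
    index_mib = n * BYTES_PER_ENTRY / 2**20
    print(f"{label:>6} chunks: {n:>12,} entries, ~{index_mib:,.1f} MiB of index")
```

Going from 1 MiB to 1 KiB chunks multiplies the number of entries by 1024, which is where the "about 1000 times more" estimate comes from.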

You have to either stop using the PVE backup format and create your own backups that store raw VM disk images, or use software that is aware of the underlying VM disk structure - like PBS (Proxmox Backup Server).

Wow, thank you very much for this exhaustive answer. Very helpful to understand how this works under the hood.
Will work on that and give feedback. Yeah Proxmox backup is the easiest way to do that, but I would like to rely on one solution…


Well, in this case I’d surely go with the tool that is best suited for the job - and here that is definitely PBS. I am using Kopia for everything else, but not for Proxmox backups.

Hi there, a little feedback. I tried numerous ways of doing this without success, including backing up the ZFS snapshot created by Proxmox.

The next step was to take a backup with virsh, but Proxmox does not seem to be compatible with virsh.
Due to time constraints, I finally installed Proxmox Backup Server. Shame on me.


Naah, don’t worry - PBS is an awesome tool for backing up your VMs and LXCs! My PBS achieves a dedupe rate in the 20x range. There’s never any shame in choosing the right tool for the job.