Low read performance per distinct hashing process

Soo… here we go for the 2nd snapshot on that multi-TB KVM dump volume:

root@pandora:/mnt/pve/vmBackup
  2020-09-15 16:14:24 CEST kee5e98d8ea2b7a6d31848fe58143c0a1 7.3 TB drwxr-xr-x files:392 dirs:23 (latest-1,annual-1,monthly-1,weekly-1,daily-1)
root@poseidon:~# kopia snapshot /mnt/pve/vmBackup
Snapshotting root@pandora:/mnt/pve/vmBackup ...
 * 0 hashing, 264 hashed (5.1 TB), 128 cached (2.2 TB), 0 uploaded (0 B), 0 errors 100.0%
Created snapshot with root ke516479e884be8fc980e216497a3b196 and ID 149ff13834eb5dddb182c43d57c2ac27 in 3h24m27s

root@poseidon:~# kopia snapshot list
root@pandora:/mnt/pve/vmBackup
  2020-09-15 16:14:24 CEST kee5e98d8ea2b7a6d31848fe58143c0a1 7.3 TB drwxr-xr-x files:392 dirs:23 (latest-2,daily-2)
  2020-09-18 07:31:35 CEST ke516479e884be8fc980e216497a3b196 7.3 TB drwxr-xr-x files:392 dirs:23 (latest-1,annual-1,monthly-1,weekly-1,daily-1,hourly-1)

A runtime of 3h43m vs. 5h08m for the first full snapshot is absolutely great!

Can you share the layout of those large files? I’m curious whether they are in a single directory or spread across multiple, and how those directories are nested.

Currently, backing up directories that only have 1-2 very large files is not optimal, because splitting a single file does not use multiple CPU cores, but there are ideas for fixing that in the future.
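
For what it’s worth, here is a rough Go sketch of what “using multiple cores for one large file” could look like conceptually: hashing fixed-size spans of a single big file on several goroutines instead of streaming it through one worker. This is not Kopia’s actual code; the file path, the 64 MiB span size, and the worker cap of 8 are made-up placeholders for illustration only.

```go
// Conceptual sketch only, NOT Kopia's implementation: hash fixed-size spans
// of one large file concurrently instead of on a single worker.
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
	"sync"
)

const spanSize = 1 << 26 // 64 MiB per span (arbitrary for this example)

func main() {
	f, err := os.Open("/mnt/pve/vmBackup/some-dump.qcow2") // hypothetical file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	fi, _ := f.Stat()
	numSpans := (fi.Size() + spanSize - 1) / spanSize

	sums := make([][32]byte, numSpans)
	var wg sync.WaitGroup
	sem := make(chan struct{}, 8) // cap concurrency at 8 goroutines

	for i := int64(0); i < numSpans; i++ {
		wg.Add(1)
		sem <- struct{}{}
		go func(i int64) {
			defer wg.Done()
			defer func() { <-sem }()
			// A section reader lets each goroutine hash its own span via
			// ReadAt, without sharing the file offset.
			sec := io.NewSectionReader(f, i*spanSize, spanSize)
			h := sha256.New()
			io.Copy(h, sec)
			copy(sums[i][:], h.Sum(nil))
		}(i)
	}
	wg.Wait()
	fmt.Printf("hashed %d spans of %s\n", numSpans, fi.Name())
}
```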

Basically, those are QEMU dump files from our KVM cluster, so rather large files ranging from 12 GB up to 300 GB. The rest is not even noteworthy… some logs and supplemental files. 99% are those KVM dumps, though.

From what I have observed, performance only drops significantly once fewer than 8 files are processed in parallel. I am really looking forward to the time when more efficient processing of large files is introduced, but as it is now, it’s already quite fast.
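
If anyone wants to reproduce that observation outside of Kopia, here is a small Go sketch (again, not Kopia code) that hashes every regular file under a directory with a fixed-size worker pool; running it with different worker counts shows how aggregate read throughput behaves once fewer large files are in flight at the same time. The directory path and worker count are placeholders, assumed values for the example.

```go
// Hedged sketch: measure aggregate read+hash throughput for a directory
// using N parallel workers, to compare e.g. 4 vs 8 vs 16 workers.
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"sync"
	"time"
)

func main() {
	dir := "/mnt/pve/vmBackup" // directory from this thread; adjust as needed
	workers := 8               // vary this to see where throughput drops off

	paths := make(chan string)
	var total int64
	var mu sync.Mutex
	var wg sync.WaitGroup

	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				f, err := os.Open(p)
				if err != nil {
					continue
				}
				n, _ := io.Copy(sha256.New(), f) // read + hash, discard the sum
				f.Close()
				mu.Lock()
				total += n
				mu.Unlock()
			}
		}()
	}

	filepath.Walk(dir, func(p string, info os.FileInfo, err error) error {
		if err == nil && info.Mode().IsRegular() {
			paths <- p
		}
		return nil
	})
	close(paths)
	wg.Wait()

	secs := time.Since(start).Seconds()
	fmt.Printf("%d workers: %.1f GB in %.0fs (%.0f MB/s)\n",
		workers, float64(total)/1e9, secs, float64(total)/1e6/secs)
}
```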
