Can’t wait to try that out - when will we be able to test that?
You can build from the PR branch if you’re really curious, but given that this probably introduces some new bugs, I would recommend waiting (which is why I test on a clean repository).
The results for the splitter benchmark are ranked by execution time, starting with the best (lowest execution time) at the top. The difference between some of them is minimal, so within some error margin they are equivalent, and different runs on the same inputs and machine may produce slightly different ranks.
The columns are:
- Rank / order
- Splitter name
- Total execution time (lower is better)
- Number of resulting chunks for a fixed input size
- Distribution of chunk sizes (minimum, maximum, and everything in between)
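For reference, the table comes from kopia’s built-in benchmark, which you can run yourself (subcommand name per kopia’s CLI; see kopia benchmark --help):
$ kopia benchmark splitter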
I hope this helps.
Correct.
- These are repository-wide parameters (splitter, hashing and encryption) that are set at creation time.
- The solution is to migrate the repository as you point out.
If these parameters were to be changed dynamically, it would involve rewriting all the contents of the repository, which is equivalent to a migration.
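For illustration, this is how those parameters get pinned at creation time. The flag and algorithm names below are taken from kopia’s documentation as I remember it, so double-check against kopia repository create --help:
$ kopia repository create filesystem --path=/backup/repo \
    --object-splitter=DYNAMIC-4M-BUZHASH \
    --block-hash=BLAKE2B-256-128 \
    --encryption=AES256-GCM-HMAC-SHA256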
The 35 MB/s throughput cap due to the splitter is surprising; it suggests that the CPU may be the bottleneck here. The expectation is that I/O (mostly reads in this case) would be the bottleneck, but that depends on the available CPU, memory, and I/O resources.
@budy May I ask what your setup is? Also, are you running kopia in “server” mode?
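In the meantime, kopia’s built-in benchmark can show the raw hashing/encryption throughput a single core can sustain (subcommand name as in current builds; verify with kopia benchmark --help):
$ kopia benchmark crypto
If the numbers there are close to the 35 MB/s you’re seeing, the hashing/splitting pipeline is indeed CPU-bound on that machine.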
Hi @julio,
I am testing kopia in a client-server setup, where the clients are Supermicro servers with these two CPUs: an E5-2670 v2 @ 2.50GHz (10 cores) and an E5-2660 v3 @ 2.60GHz (10 cores), the latter in a newer server node. Those are the clients that run kopia. The storage node, which is another server, runs kopia in server mode.
When running on the older Xeon I am achieving approx. 35 MB/s, and on the newer Xeon 47 MB/s, when hashing/splitting a single large file.
See https://github.com/kopia/kopia/pull/606 for a PR that improves the single-core throughput of splitters. In my tests I’m seeing a throughput increase of about 40% for BUZHASH.
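To give a rough idea of what the splitter does per input byte (and why it can become the CPU bottleneck), here’s a minimal, self-contained Go sketch of a buzhash-style rolling splitter. The table seeding, window size, and mask are made up for illustration; this is not Kopia’s actual implementation, which among other things also enforces minimum and maximum chunk sizes.

package main

import (
	"fmt"
	"math/rand"
)

// windowSize is the length of the sliding window the rolling hash covers
// (illustrative value, not what kopia uses).
const windowSize = 30

// table maps each possible byte to a fixed pseudo-random 32-bit constant.
var table [256]uint32

func init() {
	// Seed the table with a simple xorshift generator; any fixed
	// pseudo-random assignment works for this sketch.
	x := uint32(2463534242)
	for i := range table {
		x ^= x << 13
		x ^= x >> 17
		x ^= x << 5
		table[i] = x
	}
}

func rotl(v uint32, n uint) uint32 { return v<<n | v>>(32-n) }

// cutPoints returns content-defined chunk boundaries: positions where the
// rolling hash of the last windowSize bytes has all its masked low bits
// zero. Per byte this is a table lookup, two rotates and two XORs - cheap,
// but it touches every byte, so throughput scales with single-core speed.
func cutPoints(data []byte, mask uint32) []int {
	var cuts []int
	var h uint32
	for i := range data {
		h = rotl(h, 1) ^ table[data[i]]
		if i >= windowSize {
			// Remove the byte that just left the window; its table
			// entry has been rotated windowSize times since it entered.
			h ^= rotl(table[data[i-windowSize]], windowSize)
			if h&mask == 0 {
				cuts = append(cuts, i+1)
			}
		}
	}
	return cuts
}

func main() {
	data := make([]byte, 1<<20) // 1 MiB of pseudo-random input
	rand.Read(data)
	// A mask with 16 low bits set yields a cut every ~64 KiB on average.
	cuts := cutPoints(data, (1<<16)-1)
	fmt.Printf("found %d cut points in 1 MiB\n", len(cuts))
}

Since every input byte flows through an inner loop like this, even small per-byte savings translate into large end-to-end throughput gains, which is presumably where the ~40% improvement comes from.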
BTW, this has been merged and is available in unstable via the RPM and APT repositories.
Oh great! Now, I will give it a spin and report back. Thanks!
Yesss… this looks much better already…
Snapshotting root@pandora:/mnt/pve/vmBackup ...
\ 40 hashing, 117 hashed (1.1 TB), 0 cached (0 B), 0 uploaded (0 B), 0 errors
! Saving a checkpoint...
| 40 hashing, 55 hashed (1.5 TB), 117 cached (237.8 GB), 0 uploaded (0 B), 0 errors
! Saving a checkpoint...
| 40 hashing, 43 hashed (1.6 TB), 172 cached (645.5 GB), 0 uploaded (0 B), 0 errors
! Saving a checkpoint...
| 40 hashing, 27 hashed (1.6 TB), 215 cached (883.3 GB), 0 uploaded (0 B), 0 errors
! Saving a checkpoint...
/ 40 hashing, 2 hashed (1.5 TB), 242 cached (1 TB), 0 uploaded (0 B), 0 errors
The overall throughput also increased, although I don’t have any exact numbers for a single process; I will look into that after the first snapshot has completed.
Well… here we are again… just after I went away things started to stall…
Snapshotting root@pandora:/mnt/pve/vmBackup ...
\ 40 hashing, 117 hashed (1.1 TB), 0 cached (0 B), 0 uploaded (0 B), 0 errors
! Saving a checkpoint...
| 40 hashing, 55 hashed (1.5 TB), 117 cached (237.8 GB), 0 uploaded (0 B), 0 errors
! Saving a checkpoint...
| 40 hashing, 43 hashed (1.6 TB), 172 cached (645.5 GB), 0 uploaded (0 B), 0 errors
! Saving a checkpoint...
| 40 hashing, 27 hashed (1.6 TB), 215 cached (883.3 GB), 0 uploaded (0 B), 0 errors
! Saving a checkpoint...
\ 40 hashing, 2 hashed (1.7 TB), 242 cached (1 TB), 0 uploaded (0 B), 0 errors
! Saving a checkpoint...
\ 40 hashing, 0 hashed (1.7 TB), 244 cached (1 TB), 0 uploaded (0 B), 0 errors
! Saving a checkpoint...
/ 24 hashing, 16 hashed (1.7 TB), 244 cached (1 TB), 0 uploaded (0 B), 0 errors
! Saving a checkpoint...
/ 40 hashing, 0 hashed (1.7 TB), 244 cached (1 TB), 0 uploaded (0 B), 0 errors
! Saving a checkpoint...
- 40 hashing, 0 hashed (1.7 TB), 244 cached (1 TB), 0 uploaded (0 B), 0 errors
! Saving a checkpoint...
- 40 hashing, 0 hashed (1.7 TB), 244 cached (1 TB), 0 uploaded (0 B), 0 errors
! Saving a checkpoint...
| 40 hashing, 0 hashed (494.3 GB), 244 cached (1 TB), 0 uploaded (0 B), 0 errors
If you want to try building Kopia from jkowalski/kopia using the checkpointing-fix branch, you should see this fixed. The PR is not merged yet, but I’d appreciate folks taking it for a spin.
$ git clone https://github.com/jkowalski/kopia
$ cd kopia
$ git checkout checkpointing-fix
$ go install
$ ~/go/bin/kopia snapshot create ...
(BTW, this branch also includes the splitter performance fix that has now been merged.)
Argh… silly me… I totally ignored that the checkpointing fix has not yet been committed… However, a quick check snapshotting a single file revealed more than a 40% increase. As far as I could tell, the performance increase was more in the range of 2x, probably even more.
It’s merged now. Can you update Kopia and give it another try?
The fixes are now available in v0.7.0-rc1 release. Can you let me know whether the performance has improved for you?
Eh… I am on a short break until Thursday and I won’t have access to my gear before then. Sorry…
Okay, I couldn’t resist and updated remotely.
Preparing to unpack …/kopia_20200913.0.184705_amd64.deb …
Unpacking kopia (20200913.0.184705) over (20200911.0.201201) …
Setting up kopia (20200913.0.184705) …
Then I created a snapshot, and now it seems that Kopia doesn’t perform any checkpoints at all, although I didn’t specify any checkpoint interval.
It does checkpoint internally; I removed the display since it wasn’t adding much value.
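If you want to confirm or tune it, checkpoints run on a 45-minute default interval, and snapshot create accepts a flag for it (flag name as I recall it; please verify with kopia snapshot create --help):
$ kopia snapshot create /mnt/pve/vmBackup --checkpoint-interval=15m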
Thought so. Looks good to me!
Works great now:
root@poseidon:~# kopia snapshot /mnt/pve/vmBackup
Snapshotting root@pandora:/mnt/pve/vmBackup ...
* 0 hashing, 332 hashed (7.2 TB), 60 cached (32.3 GB), 0 uploaded (0 B), 0 errors 73.4%
Created snapshot with root kee5e98d8ea2b7a6d31848fe58143c0a1 and ID f3b2abd5b4ab3b2037d1c9fdd3fafd64 in 5h8m7s
root@poseidon:~# kopia snapshot list
root@pandora:/mnt/pve/vmBackup
2020-09-15 16:14:24 CEST kee5e98d8ea2b7a6d31848fe58143c0a1 7.3 TB drwxr-xr-x files:392 dirs:23 (latest-1,annual-1,monthly-1,weekly-1,daily-1,hourly-1)