Can someone explain the differences between the available splitters, or tell me where I can find an explanation?
And how should I interpret these results?
Maybe you can find some information in this thread:
The best one seems to be the one with the lowest time.
Not sure why you see 0ms on all FIXED splitters, though.
AFAIU the splitters differ in the algorithm used and in the chunk size (1M, 2M, 4M and 8M indicate the size).
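For the FIXED splitters the arithmetic is simple: every chunk except the last has exactly the configured size, so the chunk count is the input size divided by the chunk size, rounded up. Here is a minimal Python sketch of that idea. `split_fixed` is a hypothetical helper, not Kopia's code, and it assumes "4M" means 4 MiB. For the DYNAMIC splitters the number is presumably a target average rather than an exact size.

```python
from __future__ import annotations

MiB = 1024 * 1024


def split_fixed(data: bytes, chunk_size: int) -> list[bytes]:
    """Hypothetical fixed-size splitter: consecutive chunk_size slices, the last may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


data = bytes(10 * MiB)  # 10 MiB of zeros as a stand-in input
for size in (1, 2, 4, 8):  # FIXED-1M ... FIXED-8M, assuming M = MiB
    chunks = split_fixed(data, size * MiB)
    print(f"FIXED-{size}M: {len(chunks)} chunks, last one {len(chunks[-1]) // MiB} MiB")
```

No hashing is involved, only slicing, which fits the near-zero times reported for FIXED.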
I’m unaware of any official documentation for the splitter functions in Kopia, but these links may provide some basic understanding of how it works.
https://restic.net/blog/2015-09-12/restic-foundation1-cdc/
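The key ingredient behind content-defined chunking is a rolling hash: the hash of a sliding window can be updated in constant time per byte by removing the outgoing byte and adding the incoming one. The restic post describes Rabin fingerprints (polynomials over GF(2)). The sketch below swaps in the simpler Rabin-Karp style polynomial hash modulo a prime, which shows the same rolling idea. The base, modulus and window size are arbitrary illustrative choices, not Kopia's parameters.

```python
BASE = 257
MOD = (1 << 61) - 1  # a large Mersenne prime
WINDOW = 64          # hypothetical window size in bytes


def full_hash(window: bytes) -> int:
    """Polynomial hash computed from scratch: sum of b_k * BASE^(n-1-k) mod MOD."""
    h = 0
    for b in window:
        h = (h * BASE + b) % MOD
    return h


def rolling_hashes(data: bytes, window: int = WINDOW):
    """Yield (start, hash) for every window-length slice, updating in O(1) per byte."""
    if len(data) < window:
        return
    top = pow(BASE, window - 1, MOD)  # weight of the byte about to leave the window
    h = full_hash(data[:window])
    yield 0, h
    for i in range(window, len(data)):
        h = (h - data[i - window] * top) % MOD  # drop the oldest byte
        h = (h * BASE + data[i]) % MOD          # shift and append the newest byte
        yield i - window + 1, h
```

A chunker built on this would cut wherever the hash matches some bit pattern, for example `h % 2**20 == 0` for roughly 1 MiB average chunks (again an illustrative rule, not Kopia's).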
My guess is that the columns are:
algorithm name, compute time, number of chunks, minimum chunk size, …, maximum chunk size
FIXED is super quick because, for a given input, it is trivial to determine where to split it: the chunk size is, well, fixed. The other splitters rely on a rolling hash, which is more expensive to compute (and Rabin-Karp tends to be slower than Buzhash).
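To make the cost difference concrete, here is a minimal sketch of a Buzhash (cyclic polynomial) content-defined chunker in Python. Every byte costs a table lookup, two rotations and a few XORs, while a fixed splitter only slices. It also shows why the extra work pays off for deduplication: inserting a few bytes at the front shifts every fixed-size chunk, but the content-defined boundaries realign after the first chunk. The window, min/max sizes, mask and random table are hypothetical values scaled down to KiB for a quick demo; Kopia's actual parameters may differ.

```python
from __future__ import annotations

import random

# Hypothetical parameters, scaled down from the MiB range so the demo runs fast.
WINDOW = 32            # bytes covered by the rolling hash
MIN_SIZE = 2 * 1024    # never cut a chunk shorter than this
MAX_SIZE = 16 * 1024   # always cut once a chunk reaches this size
MASK = (1 << 12) - 1   # cut when the low 12 bits are zero (about 1 in 4096 positions)

_rng = random.Random(42)
TABLE = [_rng.getrandbits(32) for _ in range(256)]  # one random 32-bit value per byte value


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def buzhash_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking with a Buzhash rolling hash."""
    chunks: list[bytes] = []
    start, h = 0, 0
    for i, b in enumerate(data):
        # Per-byte work that a fixed splitter never does: rotate, mix in the new byte...
        h = rotl32(h, 1) ^ TABLE[b]
        # ...and cancel the byte that just fell out of the window (rotated WINDOW times).
        if i - start >= WINDOW:
            h ^= rotl32(TABLE[data[i - WINDOW]], WINDOW)
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0  # restart the hash for the next chunk
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


if __name__ == "__main__":
    rng = random.Random(0)
    n = 256 * 1024
    original = rng.getrandbits(8 * n).to_bytes(n, "little")
    shifted = b"XYZ" + original  # same data with 3 bytes inserted at the front
    a, b = buzhash_chunks(original), buzhash_chunks(shifted)
    print(f"buzhash: {len(set(a) & set(b))} of {len(a)} chunks unchanged after the insert")
    fa = [original[i:i + 4096] for i in range(0, n, 4096)]
    fb = [shifted[i:i + 4096] for i in range(0, len(shifted), 4096)]
    print(f"fixed 4 KiB: {len(set(fa) & set(fb))} of {len(fa)} chunks unchanged")
```

The cut rule only looks at the last `WINDOW` bytes, so once both streams cut at the same content position they stay in sync for the rest of the file.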
I did some experiments with Veeam backup files (.vbk and .vib) as input. Regarding the resulting size, I did not find a big difference between Buzhash and Rabin-Karp when the block size is the same.
Here is part of my results:
| splitter | du -d0 (GiB / % of input) | files (find \| wc) | time real / usr / sys |
|---|---|---|---|
| DYNAMIC-4M-BUZHASH | 1188 / 88% | 116290 | 307 / 153 / 20 |
| DYNAMIC-1M-BUZHASH | 968 / 72% | 97057 | 323 / 158 / 20 |
| DYNAMIC-1M-RABINKARP | 975 / 72% | 97541 | 305 / 179 / 20 |
| restic | 975 / 72% | 231521 | 382 / 399 / 42 |
The input was 41 VIBs (367 GiB) and 2 VBKs (487 GiB + 492 GiB).
Total size: 1346 GiB
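The percentages in the du column are the repository size relative to the 1346 GiB input, so lower means better deduplication. A quick Python check of that reading, assuming the du figures are in GiB (the only unit for which the percentages work out):

```python
# Sizes in GiB, copied from the input description and the table.
inputs = {"VIBs": 367, "VBK 1": 487, "VBK 2": 492}
total = sum(inputs.values())  # 1346 GiB

repo_sizes = {
    "DYNAMIC-4M-BUZHASH": 1188,
    "DYNAMIC-1M-BUZHASH": 968,
    "DYNAMIC-1M-RABINKARP": 975,
    "restic": 975,
}
for name, size in repo_sizes.items():
    print(f"{name}: {size} GiB = {size / total:.0%} of input, saves {total - size} GiB")
```

Read this way, the 1M splitters save roughly 220 GiB more than DYNAMIC-4M-BUZHASH on this data set.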
I was especially interested in the deduplication of the two full backup files (VBK).
In this run, the data was already compressed but not encrypted.