Difference between the available splitters

Can someone explain, or tell me where I can find an explanation of, the differences between the available splitters?

And how can I interpret these results?

Maybe you can find some information in this thread:

Best seems to be the one with the lowest timespan.
Not sure why you see 0ms on all FIXED splitters though.

AFAIU the difference is the algorithm used and the size of the chunks (1M, 2M, 4M and 8M indicating the size).

I’m unaware of any official documentation for the splitter functions in Kopia, but these links may provide some basic understanding of how it works.


My guess is:
algorithm name, compute time, number of chunks, minimum chunk size, …, maximum chunk size

FIXED is super quick because, for any input, it is trivial to determine where to split: the output size is, well, fixed. The other splitters rely on a rolling hash, which is harder to compute (and Rabin/Karp tends to be slower than buzhash).
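To make the contrast concrete, here is a minimal Python sketch. It is not Kopia's code: `fixed_split` and `rolling_split` are hypothetical helpers, and the rolling "hash" is a plain sum over a small window, a toy stand-in for buzhash or Rabin/Karp. The window, mask and size limits are made-up values, far smaller than the 1M to 8M targets discussed above.

```python
import random


def fixed_split(data: bytes, size: int) -> list[bytes]:
    """Cut every `size` bytes; boundaries depend only on the offset."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def rolling_split(data: bytes, window: int = 16, mask: int = 0xFF,
                  min_size: int = 64, max_size: int = 1024) -> list[bytes]:
    """Cut where a rolling sum over the last `window` bytes hits a bit pattern.

    Toy rolling hash (assumption for illustration), not buzhash or Rabin/Karp.
    """
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        length = i - start + 1
        hit = length >= min_size and (rolling & mask) == mask
        if hit or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def shared(a: list[bytes], b: list[bytes]) -> int:
    """Number of distinct chunks that appear in both lists."""
    return len(set(a) & set(b))


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(20000))
shifted = b"X" + original  # same data with one byte inserted at the front

fixed_shared = shared(fixed_split(original, 256), fixed_split(shifted, 256))
rolling_shared = shared(rolling_split(original), rolling_split(shifted))
print("chunks shared after a 1-byte shift: fixed =", fixed_shared,
      "rolling =", rolling_shared)
```

The fixed splitter does almost no work per byte, but a single inserted byte moves every later boundary. The rolling splitter pays for a hash update on every byte, and in exchange its boundaries follow the content, so they tend to line up again after the insertion. That is the trade-off between the FIXED and DYNAMIC families.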

I did some experiments with Veeam backup files as input (.vbk and .vib). Regarding the resulting size, I did not find a big difference between Buzhash and Rabin/Karp if the block size is the same.

Here is part of my results:

splitter               du -d0       find | wc   time real/usr/sys
DYNAMIC-4M-BUZHASH     1188 / 88%   116290      307 / 153 / 20
DYNAMIC-1M-BUZHASH      968 / 72%    97057      323 / 158 / 20
DYNAMIC-1M-RABINKARP    975 / 72%    97541      305 / 179 / 20
restic                  975 / 72%   231521      382 / 399 / 42

Input here was 41 VIBs (367 GiB) and 2 VBKs: 487 GiB + 492 GiB
Total size: 1346 GiB
I was especially interested in the deduplication of the two full backup files (VBK).
In this run, the data was already compressed but not encrypted.
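As a quick sanity check on the table, the percentages can be reproduced if one assumes that the `du -d0` figures are in GiB and that the percentage is the repository size divided by the 1346 GiB total input. Both are assumptions, not stated in the post, and `stored_percent` is a hypothetical helper name.

```python
TOTAL_INPUT_GIB = 367 + 487 + 492  # 41 VIBs plus the two VBKs = 1346 GiB

# Repository sizes from the table above (unit assumed to be GiB).
repo_size_gib = {
    "DYNAMIC-4M-BUZHASH": 1188,
    "DYNAMIC-1M-BUZHASH": 968,
    "DYNAMIC-1M-RABINKARP": 975,
    "restic": 975,
}


def stored_percent(size_gib: int, total_gib: int = TOTAL_INPUT_GIB) -> int:
    """Repository size as a rounded percentage of the input size."""
    return round(100 * size_gib / total_gib)


for name, size in repo_size_gib.items():
    print(f"{name}: {size} GiB = {stored_percent(size)}% of input")
```

Under those assumptions the rounded values match the 88% and 72% figures in the table.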