Kopia gzip compression

I was a bit surprised when I ran a compression benchmark on a very compressible log file (16 MB) on an older i7 with 2 cores and 4 threads:

     Compression                Compressed   Throughput   Allocs   Usage
  0. s2-default                 1.3 MiB      2.1 GiB/s    944      18.5 MiB
  1. s2-parallel-4              1.3 MiB      2.1 GiB/s    940      18.5 MiB
  2. s2-parallel-8              1.3 MiB      1.9 GiB/s    993      29.6 MiB
  3. s2-better                  1.2 MiB      1.5 GiB/s    937      18.4 MiB
  4. pgzip-best-speed           1.7 MiB      644.1 MiB/s  1119     35.9 MiB
  5. zstd-fastest               757.8 KiB    575.1 MiB/s  5538     11.1 MiB
  6. zstd                       673.3 KiB    529.2 MiB/s  2807     20.1 MiB
  7. pgzip                      1.1 MiB      380.8 MiB/s  1126     36 MiB
  8. deflate-best-speed         1.7 MiB      352.2 MiB/s  33       5.6 MiB
  9. gzip-best-speed            1.7 MiB      218 MiB/s    39       5.9 MiB
 10. deflate-default            1.1 MiB      216.3 MiB/s  32       3.5 MiB
 11. zstd-better-compression    535.5 KiB    185.1 MiB/s  2938     38.6 MiB
 12. gzip                       1 MiB        84.3 MiB/s   37       3.2 MiB
 13. pgzip-best-compression     0.9 MiB      60.3 MiB/s   1154     38.1 MiB
 14. deflate-best-compression   0.9 MiB      23.5 MiB/s   33       3.5 MiB
 15. gzip-best-compression      0.9 MiB      19.3 MiB/s   36       3.2 MiB

pgzip is 4.5 times faster than gzip, but I only have 2 cores. I’d expect a factor of maybe 2.5-3 (hyper-threading would help a bit). So my guess is that the pgzip implementation is also faster per core. Wouldn’t it be better to replace gzip with pgzip with a concurrency of 1?
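To make that arithmetic explicit, here is a quick sketch that computes the pgzip/gzip throughput ratio for each level, using only the numbers from the table above. The `speedup` helper is just a name made up for this illustration; it is not part of Kopia or pgzip.

```python
# Throughput in MiB/s, copied from the benchmark table above.
throughput = {
    "gzip": 84.3,
    "pgzip": 380.8,
    "gzip-best-speed": 218.0,
    "pgzip-best-speed": 644.1,
    "gzip-best-compression": 19.3,
    "pgzip-best-compression": 60.3,
}


def speedup(level: str = "") -> float:
    """Return pgzip throughput divided by gzip throughput for one level.

    An empty level means the default compression level.
    """
    suffix = f"-{level}" if level else ""
    return throughput["pgzip" + suffix] / throughput["gzip" + suffix]


if __name__ == "__main__":
    for level in ("", "best-speed", "best-compression"):
        print(f"{level or 'default':>16}: {speedup(level):.2f}x")
```

By these numbers only the default level reaches about 4.5x; best-speed and best-compression come out at roughly 3x, which is closer to what 2 cores plus hyper-threading would suggest. The table alone cannot tell how much of the gap comes from parallelism and how much from a faster per-core implementation.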

Is there actually any case where someone would want gzip at all?
For speed, s2 is faster by a huge margin, and zstd beats default gzip on both compressed size and throughput, even in zstd-better-compression mode, as seen above. The only reason to use gzip would be low memory usage, but that is mostly moot these days.
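A small sketch to check this against the table: for each gzip level, list the zstd levels that produce a smaller output and also run faster. The figures are copied from the table above; the `dominators` helper is a hypothetical name used only for this illustration, and memory usage is deliberately left out of the comparison.

```python
KIB_PER_MIB = 1024

# name: (compressed size in KiB, throughput in MiB/s), from the table above.
results = {
    "gzip-best-speed": (1.7 * KIB_PER_MIB, 218.0),
    "gzip": (1.0 * KIB_PER_MIB, 84.3),
    "gzip-best-compression": (0.9 * KIB_PER_MIB, 19.3),
    "zstd-fastest": (757.8, 575.1),
    "zstd": (673.3, 529.2),
    "zstd-better-compression": (535.5, 185.1),
}


def dominators(name: str) -> list[str]:
    """Return the zstd levels that are both smaller and faster than `name`."""
    size, speed = results[name]
    return [
        other
        for other, (other_size, other_speed) in results.items()
        if other.startswith("zstd") and other_size < size and other_speed > speed
    ]


if __name__ == "__main__":
    for name in ("gzip-best-speed", "gzip", "gzip-best-compression"):
        print(f"{name}: beaten on size and speed by {dominators(name)}")
```

On this one log file every gzip level is beaten on both axes by at least one zstd level. The only case where a zstd level loses on speed is zstd-better-compression against gzip-best-speed (185.1 vs 218 MiB/s). Other data may behave differently.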

Good points. I ended up using zstd and I’m happy with the results.

Low memory usage can still matter when running Kopia on a single-board computer with little RAM (mine has 2 GB), but in my case it failed even without compression enabled.

Yeah, I don’t think the buffers used by compression will be the limiting factor in those cases. The number of files in the filesystem, the Go runtime footprint and so on matter more.