I was a bit surprised when I ran a compression benchmark on a very compressible log file (16 MB) on an older i7 with 2 cores and 4 threads:
    Compression               Compressed   Throughput    Allocs  Usage
-------------------------------------------------------------------------
0.  s2-default                1.3 MiB      2.1 GiB/s     944     18.5 MiB
1.  s2-parallel-4             1.3 MiB      2.1 GiB/s     940     18.5 MiB
2.  s2-parallel-8             1.3 MiB      1.9 GiB/s     993     29.6 MiB
3.  s2-better                 1.2 MiB      1.5 GiB/s     937     18.4 MiB
4.  pgzip-best-speed          1.7 MiB      644.1 MiB/s   1119    35.9 MiB
5.  zstd-fastest              757.8 KiB    575.1 MiB/s   5538    11.1 MiB
6.  zstd                      673.3 KiB    529.2 MiB/s   2807    20.1 MiB
7.  pgzip                     1.1 MiB      380.8 MiB/s   1126    36 MiB
8.  deflate-best-speed        1.7 MiB      352.2 MiB/s   33      5.6 MiB
9.  gzip-best-speed           1.7 MiB      218 MiB/s     39      5.9 MiB
10. deflate-default           1.1 MiB      216.3 MiB/s   32      3.5 MiB
11. zstd-better-compression   535.5 KiB    185.1 MiB/s   2938    38.6 MiB
12. gzip                      1 MiB        84.3 MiB/s    37      3.2 MiB
13. pgzip-best-compression    0.9 MiB      60.3 MiB/s    1154    38.1 MiB
14. deflate-best-compression  0.9 MiB      23.5 MiB/s    33      3.5 MiB
15. gzip-best-compression     0.9 MiB      19.3 MiB/s    36      3.2 MiB
pgzip is about 4.5 times faster than gzip (380.8 MiB/s vs. 84.3 MiB/s), but I only have 2 cores. I'd expect a factor of 2.5 to 3 at most, since Hyper-Threading would only help a bit. So my guess is that the pgzip implementation is also faster per core. Wouldn't it be better to replace gzip with pgzip with a concurrency of 1?