Is anybody using the LZ4 compression algorithm?

The LZ4 algorithm has been deprecated for quite a while, and we're considering removing it from Kopia 0.17.0 to reduce the number of dependencies and tighten our OSS supply chain posture, given the recent backdoor in the XZ compression library and growing concerns about the security of the OSS ecosystem as a whole. We're not expecting a significant number of folks to use LZ4, because its performance is relatively poor compared to the others.

Please let us know if you're using LZ4 at scale where removal would cause trouble. If we end up doing it, affected folks would need to use kopia 0.16.1 to migrate their repo to a different compression format before upgrading to v0.17.


AFAIK xz isn't related to lz4, and the xz backdoor itself didn't make it into most popular production operating systems. lz4 is still the default compression in ZFS 2.1.x, and while it isn't the most performant, it is still one of the most lightweight. Also, as far as I know, Go's lz4 isn't related at all to the recently discovered xz attack, since its implementation is pure Go. Anyway, the xz attack wasn't directly related to compression but to infrastructure.

Removing protocols/algorithms breaks backward compatibility, while keeping them doesn't add much overhead. lz4 is not outdated legacy, and killing it just because of an unrelated event is, IMO, overkill. It is already implemented, so why kill it? I know of two industrial sites where kopia uses lz4 because it gives the best benchmark results on their limited hardware, and I doubt they would be happy if they couldn't restore their archives.

If it isn't broken and isn't vulnerable, I can't see any reason to kill lz4.

Just my 2 cents

You’re right. I wasn’t trying to imply this particular algorithm or implementation has security issues, just trying to see whether we need to continue supporting it.

If you look at performance, it's not really that fast, it allocates a lot of memory compared to the others, and it's been deprecated for almost 2 years.

     Compression                Compressed   Throughput   Allocs   Memory Usage
  0. zstd-fastest               18.4 MB      6.8 GB/s     812      90.1 MB
  1. zstd-better-compression    16.4 MB      5.7 GB/s     780      158.8 MB
  2. zstd-best-compression      16.2 MB      5.7 GB/s     791      159.2 MB (deprecated)
  3. zstd                       18.1 MB      4.6 GB/s     729      157.9 MB
  4. deflate-best-compression   18.4 MB      3.7 GB/s     864      650.6 KB
  5. deflate-default            19.7 MB      3.5 GB/s     944      644.2 KB
  6. deflate-best-speed         20.6 MB      3.4 GB/s     496      641.4 KB
  7. pgzip-best-compression     18.4 MB      3 GB/s       3000     67.9 MB
  8. pgzip                      19.7 MB      3 GB/s       3332     67.9 MB
  9. pgzip-best-speed           20.6 MB      2.8 GB/s     2861     67.9 MB
 10. s2-default                 35.1 MB      1.8 GB/s     122      4.3 GB
 11. gzip-best-speed            21.6 MB      1.7 GB/s     137390   4.3 GB
 12. s2-better                  33.7 MB      1.5 GB/s     125      4.3 GB
 13. lz4                        33.9 MB      1.4 GB/s     108      4.4 GB (deprecated)
 14. s2-parallel-4              35.1 MB      1.2 GB/s     110      4.3 GB
 15. s2-parallel-8              35.1 MB      1.1 GB/s     126      4.3 GB
 16. gzip-best-compression      18.5 MB      1 GB/s       149055   4.3 GB
 17. gzip                       19.3 MB      683.2 MB/s   142871   4.3 GB
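
For readers wondering how the "Compressed" and "Throughput" columns are derived, here is a minimal sketch of the idea: compress a buffer once, record the output size, and divide input bytes by elapsed time. It is not Kopia's benchmark code; it uses only Go's standard-library compress/flate (lz4, zstd and s2 are not in the standard library), the function and type names are hypothetical, the input is synthetic, and the Allocs/Memory columns are not measured.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"time"
)

// result holds two of the metrics shown in the table: compressed size
// and throughput (input bytes processed per second). Hypothetical type.
type result struct {
	Name       string
	Compressed int
	Throughput float64 // bytes per second
}

// benchFlate compresses data once at the given flate level and reports the
// output size and throughput. Single-shot timing is noisy; a real benchmark
// would repeat the run and average.
func benchFlate(name string, level int, data []byte) (result, error) {
	var buf bytes.Buffer
	start := time.Now()
	w, err := flate.NewWriter(&buf, level)
	if err != nil {
		return result{}, err
	}
	if _, err := w.Write(data); err != nil {
		return result{}, err
	}
	// Close flushes the remaining compressed data into buf.
	if err := w.Close(); err != nil {
		return result{}, err
	}
	elapsed := time.Since(start).Seconds()
	tp := 0.0
	if elapsed > 0 {
		tp = float64(len(data)) / elapsed
	}
	return result{Name: name, Compressed: buf.Len(), Throughput: tp}, nil
}

func main() {
	// Synthetic, highly repetitive input; use representative data in practice.
	data := bytes.Repeat([]byte("kopia backup block with some repeated content\n"), 20000)

	levels := []struct {
		name  string
		level int
	}{
		{"deflate-best-speed", flate.BestSpeed},
		{"deflate-default", flate.DefaultCompression},
		{"deflate-best-compression", flate.BestCompression},
	}
	for _, l := range levels {
		r, err := benchFlate(l.name, l.level, data)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-26s %10d bytes  %8.1f MB/s\n", r.Name, r.Compressed, r.Throughput/1e6)
	}
}
```

Results on synthetic data like this say little about real backups; as the second table in this thread shows, rankings shift a lot with the input.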

BTW, another option would be to keep it as an opt-in feature when building Kopia. This way, folks who rely on it would have the choice of including it, but the official binaries would not.

If it were me, I would support it as long as ZFS does.

I know. As for me, in most cases I use s2-default, but take a look (I ssh-ed into a pretty old internal email server and tried it on one email with attachments):

     Compression                Compressed   Throughput   Allocs   Usage
  0. s2-parallel-4              106.9 KB     626.3 MB/s   2644     2.4 MB, --compression=s2-parallel-4
  1. s2-default                 106.9 KB     623.2 MB/s   2677     2.4 MB, --compression=s2-default
  2. s2-parallel-8              106.9 KB     620.4 MB/s   2586     2.4 MB, --compression=s2-parallel-8
  3. s2-better                  100.9 KB     379.4 MB/s   2607     2.4 MB, --compression=s2-better
  4. zstd-fastest               80.7 KB      321.1 MB/s   7423     10.2 MB, --compression=zstd-fastest
  5. zstd                       77.6 KB      223.6 MB/s   4800     20.3 MB, --compression=zstd
  6. deflate-best-speed         110.8 KB     217.5 MB/s   29       1.1 MB, --compression=deflate-best-speed
  7. lz4                        111.8 KB     182.7 MB/s   1929     3.6 GB (deprecated), --compression=lz4
  8. zstd-better-compression    72.2 KB      154.3 MB/s   4866     23 MB, --compression=zstd-better-compression
  9. gzip-best-speed            114.1 KB     150.1 MB/s   35       1.5 MB, --compression=gzip-best-speed
 10. deflate-default            86.3 KB      139.5 MB/s   29       1.3 MB, --compression=deflate-default
 11. pgzip-best-speed           110.8 KB     120.9 MB/s   10536    1.1 GB, --compression=pgzip-best-speed
 12. pgzip                      86.3 KB      96.8 MB/s    10187    1.1 GB, --compression=pgzip
 13. gzip                       82.5 KB      67 MB/s      33       1.1 MB, --compression=gzip
 14. gzip-best-compression      82.3 KB      58.6 MB/s    33       1.1 MB, --compression=gzip-best-compression
 15. deflate-best-compression   82.1 KB      56.1 MB/s    30       1.4 MB, --compression=deflate-best-compression
 16. pgzip-best-compression     82.1 KB      46.1 MB/s    10470    1.2 GB, --compression=pgzip-best-compression
 17. zstd-best-compression      68.5 KB      35.2 MB/s    4829     54.6 MB (deprecated), --compression=zstd-best-compression

lz4 definitely isn't great, but it isn't the worst either. I don't use lz4 myself, but my point is: if it isn't broken, vulnerable, or abandoned (the last change in lz4 was just 5 months ago), then why get rid of it? It doesn't take up much space and doesn't require special handling to support, unlike the bunch of cloud backends.
IMO, in the case of backup software, “the more, the better”, as long as it doesn't create problems.

At least it would be much better than just dropping support for no reason and making someone rebuild their backups from scratch.

I wish such an official feature existed to get rid of unused clouds (like Azure, GCP, AWS…), since those make kopia really fat, while most of the time only one off-premises cloud is in use.

I stand with @iBackup's comments. It is not a good idea to remove it. I'm not sure where the information about LZ4 being deprecated comes from. It is still in use, and not only as the default in ZFS.

To make sure it’s clear what I mean:

the algorithm is deprecated within Kopia, not anywhere else, and that has to do with the particular implementation we're using, not the format itself.

Kopia does not currently prevent you from using lz4 (we only hide it by default when you run the benchmark); we just recommend that folks don't use it for new repos.
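
The "deprecated but still usable" status described above can be modeled as a flag on a registry entry: deprecated compressors stay resolvable (so existing data remains readable) but are skipped in the default listing. This is a minimal sketch of that idea only; compressorInfo, register, listForBenchmark, lookup and warnIfDeprecated are hypothetical names, not Kopia's actual code or API.

```go
package main

import (
	"fmt"
	"sort"
)

// compressorInfo is a hypothetical registry entry; Kopia's real types differ.
type compressorInfo struct {
	Name       string
	Deprecated bool
}

// registry keeps every compressor registered, deprecated or not, so that
// data written with any of them can still be read.
var registry = map[string]compressorInfo{}

func register(name string, deprecated bool) {
	registry[name] = compressorInfo{Name: name, Deprecated: deprecated}
}

// listForBenchmark returns sorted names, hiding deprecated entries unless
// includeDeprecated is set (mirroring "hidden by default in the benchmark").
func listForBenchmark(includeDeprecated bool) []string {
	var names []string
	for name, info := range registry {
		if info.Deprecated && !includeDeprecated {
			continue
		}
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// lookup resolves any registered name, including deprecated ones.
func lookup(name string) (compressorInfo, bool) {
	info, ok := registry[name]
	return info, ok
}

// warnIfDeprecated returns a hint when a deprecated compressor is chosen
// for new data, and an empty string otherwise.
func warnIfDeprecated(name string) string {
	if info, ok := lookup(name); ok && info.Deprecated {
		return name + " is deprecated; prefer another compressor for new repositories"
	}
	return ""
}

func main() {
	register("s2-default", false)
	register("zstd", false)
	register("zstd-best-compression", true)
	register("lz4", true)

	fmt.Println(listForBenchmark(false)) // [s2-default zstd]
	fmt.Println(listForBenchmark(true))  // [lz4 s2-default zstd zstd-best-compression]
	fmt.Println(warnIfDeprecated("lz4"))
}
```

The design choice here is that deprecation only affects discovery and recommendations, never the ability to decode, which is what keeps old repositories restorable.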

I would not mind if it goes. s2 is clearly the better speed-oriented compressor, and zstd covers the cases where ratio is important.

I’d like to point out that the mentioned problems (high memory allocation and low throughput) are likely due to the pure-Go implementation.

The original lz4 (made by the same author as zstd) remains the highest-throughput and lowest-memory compressor on the planet (to my knowledge); see e.g. the official README.

As shown there, lz4 compresses 1.5x as fast and decompresses 4x as fast as zstd -1.

zstd does not replace lz4; they are different compressors for different tasks.

I use lz4 over zstd on all fast storage systems, and it makes sense to use lz4 as the default on-the-fly compressor for e.g. ZFS or Ceph, because it has such low CPU and memory overhead that there is almost no situation in which LZ4 does NOT make sense.

That said, I would not use the pure-Go lz4 in Kopia if its implementation is slower than zstd. But I would likely use lz4 in Kopia if it were as fast as the original lz4.

I downloaded the Silesia Corpus files and ran my own test. LZ4 and Zstandard, both at compression level -1, were about equally fast (0.4 vs. 0.46 seconds), but Zstandard's compression ratio was a lot better (97 MB vs. 71 MB of compressed output). zstd with --fast=3 was even faster than lz4 (0.38 seconds) while still achieving a better compression ratio (91 MB).
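
To put those Silesia numbers in relative terms, here is the arithmetic, assuming (as the order of the figures suggests) that 97 MB and 0.40 s are LZ4's results and 71 MB and 0.46 s are Zstandard's at level -1:

```latex
\begin{aligned}
\text{zstd -1 vs.\ LZ4, size:} &\quad \frac{97 - 71}{97} = \frac{26}{97} \approx 0.268 \;\Rightarrow\; \text{about 27\% smaller output} \\
\text{zstd -1 vs.\ LZ4, time:} &\quad \frac{0.46}{0.40} = 1.15 \;\Rightarrow\; \text{about 15\% more time} \\
\text{zstd \texttt{-{}-fast=3} vs.\ LZ4, size:} &\quad \frac{97 - 91}{97} = \frac{6}{97} \approx 0.062 \;\Rightarrow\; \text{about 6\% smaller output} \\
\text{zstd \texttt{-{}-fast=3} vs.\ LZ4, time:} &\quad \frac{0.38}{0.40} = 0.95 \;\Rightarrow\; \text{about 5\% less time}
\end{aligned}
```

These are single-run timings from one machine, so small time differences are within what run-to-run noise could explain.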

The results in the zstd benchmark show numbers similar to the lz4 benchmark, but I can't confirm this with my tests. Maybe it's because of the lzbench tool they are using? Have you ever tested this yourself?

LZ4 doesn't make sense if you care about compression ratio, but it is awesome if you need fast decompression.