Deduplication and compression

Hello everybody,

I was a little confused to find that the compression setting is not specified for the whole repository, but can instead be set per path.

I’m wondering now if deduplication still works when backing up paths with compression and paths without compression, when both contain the same files.

Kind regards!

Compression can (in a way) be enabled for the whole repository by setting it in the global policy, like so:

kopia policy set --compression=zstd --global

By default, all snapshots inherit from the global policy, so compression will be enabled for all backup items. You can of course override it for specific backup items.

As for deduplication and compression: a change to the repository format introduced in v0.9 means that data is compressed after hashing, so in theory deduplication shouldn’t be affected by compression at all. It shouldn’t matter whether compression was enabled or which algorithm was used.
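
To illustrate why hashing before compressing keeps deduplication intact, here is a minimal sketch of a content-addressed store. It is not Kopia's actual implementation: the ContentStore class is hypothetical, SHA-256 stands in for whatever hash the repository is configured with, and zlib stands in for zstd so that the example runs with only the Python standard library.

```python
import hashlib
import zlib


class ContentStore:
    """Toy content-addressed store (hypothetical, not Kopia's code).

    The key is the hash of the UNCOMPRESSED data, so the compression
    setting has no influence on the content ID.
    """

    def __init__(self):
        # content ID -> (is_compressed, stored bytes)
        self.blobs = {}

    def put(self, data: bytes, compress: bool) -> str:
        # Hash first, on the raw bytes.
        content_id = hashlib.sha256(data).hexdigest()
        if content_id in self.blobs:
            # Already stored: deduplicated, nothing new is written.
            return content_id
        # Compress only after the ID has been computed.
        payload = zlib.compress(data) if compress else data
        self.blobs[content_id] = (compress, payload)
        return content_id

    def get(self, content_id: str) -> bytes:
        compressed, payload = self.blobs[content_id]
        return zlib.decompress(payload) if compressed else payload


store = ContentStore()
data = b"same file contents " * 1000

id_a = store.put(data, compress=True)   # path backed up with compression
id_b = store.put(data, compress=False)  # same file, path without compression

print(id_a == id_b, len(store.blobs))  # True 1
```

In this sketch the second write is a no-op, and the stored copy keeps whatever compression the first write used. If the hash were taken over the compressed bytes instead, the two writes would produce different IDs and the data would be stored twice.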

Thanks for your reply @stpr!

The linked pull request seems to exactly address my questions!

I wonder, though, what is meant by

Also since compression will be done after hashing, it has to be done server-side, thus the bandwidth usage and CPU utilization between kopia client to kopia server may change.

I was planning to use an S3 repository and (to my knowledge) no kopia server.

Kopia has a ‘repository server’ feature targeted at multi-user scenarios, where multiple users share a single repository and the server provides some security features to prevent users from seeing each other’s contents. In such a setup, users do not access the underlying files directly, but go through the server program, which takes care of authentication. This is not typically relevant for single-user backups, like the one you are perhaps using, where you have direct access to the underlying storage.