Compression questions

I have a few questions about Kopia compression that I cannot find answered in the documentation:

  1. I enabled the compression policy a few days after my first snapshot. Did Kopia delete all the uncompressed data and re-upload it compressed? Or does it wait for a file to change before uploading the new version in compressed form?
  2. How can I check how much I am saving with compression? I tried with a new folder (to avoid any issues related to question 1), and when creating the snapshot it says “10 hashed (1.4 GB), 8 uploaded (90.6 MB), estimated 1.4 GB”. I assume this means that the whole 1.4 GB snapshot was compressed to about 90 MB (amazing!), but when listing snapshots it still says that the snapshot is 1.4 GB. Is there a way to get more information about this for a given snapshot?
  3. What is the unit for the min/max file size compression policy? Bytes? What are the tradeoffs of lowering the minimum? I imagine 2-3 megabytes could be an appropriate value, but I don’t have much to go on. What would be some use cases for the max file size?

Thanks in advance for your time and for the attention, and pardon me for the newbie questions.

I don’t think Kopia will change any blob after you enable compression. When you think about it, that makes sense, since multiple clients could use the same repo and some of them might use compression and some not. It is better to set up the repo and set all the policies you want first.

So, I have checked the documentation and couldn’t find any way to force Kopia to replace every blob with a compressed one. If you’re connecting to a local repo, you can set the global policy to perform compression before creating your first snapshot.

However, if you’re connecting to a Kopia server, things are a bit different: the local Kopia instance is not able to read the global policy, so you have to start a snapshot and immediately stop it again. Then you can set up the policy for that target to perform, e.g., compression. There is a GitHub issue open for this bug.

But what happens if I

  1. Create uncompressed snapshots and retain 4 versions.
  2. Enable compression and expire the 3 older versions, so there is still 1 uncompressed version.
  3. Keep creating new versions naturally.

Will the blobs in the new versions be compressed, i.e. is compression decided on a per-blob basis?

At least, this is what I’d expect to happen. Since Kopia is not a persistent process (Kopia server excluded, of course), it will read the policy afresh each time it performs a snapshot. However, I’d also expect Kopia in server mode to check the policy when performing a snapshot.

I have two Kopia repos: one on an NVMe USB3 drive and one served by a Kopia server on my network. Since the local Kopia instance running against a local filesystem repo honors the global policy, I am pretty sure that the configured zstd compression is in place, since it shrank my 768G down to 466G.
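As a quick sanity check on the figures quoted in this thread, the saving can be worked out by hand. The snippet below is only a sketch of that arithmetic: the helper name `savings` is made up for illustration, and 1.4 GB is taken as 1400 MB. Note that the repo size and the "uploaded" figure may both reflect deduplication as well as compression, so these numbers are at best an upper bound on what compression alone achieved.

```python
def savings(original: float, stored: float) -> float:
    """Return the fraction of space saved (0.25 means 25% smaller)."""
    if original <= 0:
        raise ValueError("original size must be positive")
    return 1.0 - stored / original


# Figures quoted in this thread; units only need to match within each pair.
local_repo = savings(768, 466)     # 768G of source data, 466G in the repo
new_folder = savings(1400, 90.6)   # 1.4 GB hashed vs 90.6 MB uploaded (assumes 1 GB = 1000 MB)

print(f"local repo: {local_repo:.1%} smaller than the source")
print(f"new folder: {new_folder:.1%} of the hashed data was not uploaded")
```

For the local repo this works out to a bit over a third of the space saved, which is plausible for zstd on mixed data.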

When I configured my remote repo, I first started a regular snapshot and immediately canceled it, so that I could set a policy for that target. Afterwards I had Kopia perform a snapshot with zstd enabled, which resulted in almost the same size on my remote storage. It would have been much larger if compression had not been performed.

Ok, thanks. If I understand correctly, as long as a file is not modified, Kopia will not compress it, right?
Do you have any insight about question 2 and 3?

I don’t know what you mean by “uncompressed snapshot”. Sure, you can change your compression settings, but Kopia will of course not re-upload any data that is already in the repo and in active use. A new snapshot will only upload changed data and will “link” the existing blobs in. However, these blobs are still uncompressed, and they will stay that way.
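To illustrate the behaviour described above, here is a toy model of a content-addressed store. It is not Kopia’s actual implementation: the `ToyRepo` class, its `compress` flag (standing in for the compression policy) and the use of zlib are all assumptions made for the sketch. It only shows why data that is already in the repo stays uncompressed while new or changed data picks up the current policy.

```python
import hashlib
import zlib


class ToyRepo:
    """Toy content-addressed store (hypothetical, not Kopia's real design)."""

    def __init__(self):
        self.blobs = {}        # content hash -> (is_compressed, stored bytes)
        self.compress = False  # stands in for the compression policy

    def put(self, data: bytes) -> str:
        # The key is derived from the uncompressed content.
        key = hashlib.sha256(data).hexdigest()
        if key in self.blobs:
            return key  # already stored: linked as-is, never rewritten
        if self.compress:
            self.blobs[key] = (True, zlib.compress(data))
        else:
            self.blobs[key] = (False, data)
        return key

    def snapshot(self, files: dict) -> dict:
        # A snapshot is just a mapping from file name to content key.
        return {name: self.put(content) for name, content in files.items()}


repo = ToyRepo()
files = {"a.txt": b"hello " * 1000, "b.txt": b"world " * 1000}
snap1 = repo.snapshot(files)          # taken before compression is enabled

repo.compress = True                  # policy changed afterwards
files["b.txt"] = b"changed " * 1000   # only b.txt is modified
snap2 = repo.snapshot(files)

for name, key in snap2.items():
    is_compressed, stored = repo.blobs[key]
    state = "compressed" if is_compressed else "uncompressed"
    print(name, state, len(stored), "bytes stored")
```

In this model the unchanged `a.txt` keeps pointing at its original uncompressed entry, while the modified `b.txt` is stored compressed.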

As long as you don’t have a massive turnover of data each day, you will likely never replace these uncompressed blobs. If we’re talking about just a few GB, then it’s surely much easier to set up your repo again, perform a very small snapshot so that you can create your compression policy, and then perform a regular initial snapshot of all the data.