Is Kopia retrieving data periodically?

Hey, thank you for all the great work on Kopia.

I’m using Kopia to back up large amounts of data to S3, roughly 11TB a month at the moment. AWS provides data transfer into S3 for free; however, data transfer out is not free.

Currently we’re not doing many restores, so there isn’t much need for data to be transferred out of AWS. However, the AWS bill for outbound data transfer is quite large: we’re transferring roughly as much data out as we’re transferring in. I have looked through the Kopia codebase and can’t find anything that clearly shows data being retrieved periodically. Is there anything I’m missing? Is Kopia doing a periodic refresh of data stored on S3, or something similar?

Apologies if I’m completely barking up the wrong tree, just trying to figure it out.

Cheers

Kopia will generally only download index and metadata blobs (directory listings), and those are cached, so they should only need to be downloaded once per connection, unless you’re discarding the cache somehow.
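If you want to verify that the cache is actually being kept between runs, you can inspect it locally. A small sketch using standard kopia subcommands (cache clear is shown commented out because running it would force exactly the kind of re-download being discussed):

```shell
# Show where the local cache lives and its current usage/limits.
kopia cache info

# Clearing the cache forces index/metadata blobs to be re-downloaded
# from S3 (i.e. paid egress) -- shown for illustration only.
# kopia cache clear
```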

Kopia will also periodically list index blobs to determine if there are new blobs to be downloaded.

Can you describe how you’re running Kopia? (Server mode? Containers? How many computers? Are they always connected, or are they connecting and disconnecting a lot?)

Thank you for your response @jkowalski.

The cache is limited to 1GB to save space, and we’re running Kopia on Ubuntu servers that are always connected. They may reboot sometimes, but not frequently. We’re talking 1800+ servers at the moment. We do manually delete old snapshots using the kopia snapshot delete --delete command; I wondered if deleting a snapshot needs to download anything. Additionally, we’re running the maintenance command; does that download any data?
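For reference, the deletion and maintenance commands we run look roughly like this (snapshot ID elided; these are the standard kopia subcommands):

```shell
# Delete an old snapshot by ID; the --delete flag confirms the deletion.
kopia snapshot delete <snapshot-id> --delete

# Periodic repository maintenance.
kopia maintenance run --full
```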

You can see what’s going on by using --log-level=debug and looking for STORAGE entries. Let me know if you find a pattern that cannot be explained.
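For example, on a typical Linux install you could run any kopia operation with debug logging and filter the output (the snapshot list subcommand here is just a placeholder for whatever operation you want to observe):

```shell
# Run an operation with debug logging and show only storage-layer calls.
kopia --log-level=debug snapshot list 2>&1 | grep STORAGE
```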

Thank you for your suggestion @jkowalski

I think our issue is that we’re limiting the cache size to 1GB, so Kopia keeps re-downloading index and metadata blobs. Do you have a suggested cache size for a given amount of data being backed up? For example, how much cache per 1GB of data stored? I understand there are many variables that could impact this, so if not, no worries at all.

Kopia separates data (file contents) from metadata, and operations such as deleting a snapshot only touch metadata.

The metadata cache stores q blobs, which hold manifests (snapshots, policies) and directory listings. You can find out how much space those blobs need with: kopia blob stats --prefix=q

To adjust, use something like this:

$ kopia cache set --metadata-cache-size-mb=5000