Does Kopia make a mess on one's cloud?

Hi all

I have seven Kopia ‘policies’. The backups resulting from those policies are stored encrypted within a ‘B2’ ‘bucket’ (B2 being a cloud provider). A look at the relevant B2 bucket, via B2’s website, reveals dozens of items that look like backups. (The items at issue have nonsensical, i.e. encrypted, names. Other files look like Kopia housekeeping files and have readable names. There are only a few of those files.) Possibly some of those backup files are left over from snapshot policies that I have since abandoned. (How can I tell?) I suspect, though, that Kopia is storing everything (every incremental backup for each ‘policy’) in the root of my ‘bucket’. That is messy.

Can someone shed any light? That is: (1) does Kopia indeed store incremental backups for every policy all in the same directory? If it does, (2) why does it do that? Also: (3) if some of the mess that I see comes from backups of abandoned policies, how do I identify and delete those backups?

Thanks.

This particular storage format is what allows Kopia to create encrypted, deduplicated snapshots with certain security properties (like not revealing file names, file sizes, etc.).

The file names have a very important meaning to Kopia. For example, parts of the file names are file hashes, which ensure integrity; other parts identify sessions or help us resolve conflicts when multiple writes occur at the same time, and so on.
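To illustrate why such names carry meaning, here is a minimal Python sketch of content-based naming. It assumes a made-up scheme: `blob_name`, `verify` and the `blob-<hash>-s<session>` layout are hypothetical and are not Kopia's actual format. The name embeds a hash of the stored bytes plus a per-writer session suffix, so corruption can be detected by comparing the data with its own name, and concurrent writers do not overwrite each other.

```python
import hashlib
import uuid

# Hypothetical naming scheme for illustration only (not Kopia's real format):
#   blob-<first 32 hex chars of SHA-256 of the data>-s<writer session id>
def blob_name(data: bytes, session_id: str) -> str:
    digest = hashlib.sha256(data).hexdigest()[:32]
    return f"blob-{digest}-s{session_id}"

def verify(name: str, data: bytes) -> bool:
    # The hash embedded in the name must match a fresh hash of the bytes read back.
    embedded = name[len("blob-"):].split("-s", 1)[0]
    return hashlib.sha256(data).hexdigest()[:32] == embedded

# Each writer picks its own random session id, so two machines uploading the
# same data at the same time still produce different names instead of
# overwriting each other's blobs.
session = uuid.uuid4().hex[:8]
payload = b"already-encrypted pack contents"
name = blob_name(payload, session)
print(verify(name, payload))         # True
print(verify(name, payload + b"!"))  # False: the bytes no longer match the name
```

The names look random to a human, but each part is doing a job; renaming or "tidying" such files by hand would break exactly these checks.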

This may look messy, but I don’t see how it is a problem, since cloud storage buckets (S3, GCS, Azure Blob, B2) are generally not meant to be consumed by humans but by applications and systems that use them to exchange data.

You can probably find backup tools that produce simple, flat file structures with clean names (there must be plenty of those), but I’m sure they will lose some of the properties that Kopia provides.

In Kopia you manage backup repositories not by looking at files in a bucket but through the tools Kopia itself provides (the snapshot and manifest commands, a plethora of low-level tools, and the UI). Periodic maintenance will generally take care of removing data that you no longer need to store in the repository.
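On question (3): rather than deleting blobs by hand, you delete the snapshots of an abandoned policy with Kopia's snapshot commands and let maintenance reclaim the space. Conceptually this works like a mark-and-sweep garbage collector. The Python sketch below shows only that idea under simplifying assumptions (the `Snapshot` class and `garbage_collect` function are hypothetical, and real maintenance is more involved and more cautious): once a snapshot is gone, any blob that no remaining snapshot references becomes eligible for deletion, while blobs shared with kept snapshots survive.

```python
from dataclasses import dataclass, field

@dataclass
class Snapshot:
    source: str                                         # path the (hypothetical) policy covered
    blob_ids: list[str] = field(default_factory=list)  # blobs this snapshot needs

def garbage_collect(blobs: dict[str, bytes], snapshots: list[Snapshot]) -> list[str]:
    # Mark: every blob referenced by at least one remaining snapshot is live.
    live = {bid for snap in snapshots for bid in snap.blob_ids}
    # Sweep: delete whatever nothing refers to any more.
    dead = [bid for bid in blobs if bid not in live]
    for bid in dead:
        del blobs[bid]
    return dead

blobs = {"a": b"...", "b": b"...", "c": b"..."}
kept = Snapshot("/home/me/documents", ["a", "b"])
abandoned = Snapshot("/home/me/old-project", ["b", "c"])

# Deleting the abandoned snapshot does not delete any blobs by itself...
snapshots = [kept]
# ...the next maintenance pass removes only what is now unreferenced.
removed = garbage_collect(blobs, snapshots)
print(removed)        # ['c']  ("b" survives because "kept" still uses it)
print(sorted(blobs))  # ['a', 'b']
```

This is also why you cannot tell from the bucket listing which blobs "belong" to an abandoned policy: a single blob can be shared by snapshots from several policies.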

Yes, it’s a complex system with certain properties that may not look pretty, but that’s a trade-off we’re making to provide those cool features, which in many other applications require managing special servers. Instead, Kopia can organize a multi-user repository using only files in a bucket, and that’s possible in part because those files are sort of ugly.


Hmm. Just why is it that Kopia cannot create one directory per snapshot?

For the reasons provided above. All the cool features of Kopia.

Maybe try thinking about it the other way around. Why are you trying to look at your snapshots on your storage instead of using Kopia? What benefit does that give you?

Also, what is it that you are trying to achieve? What would be the point of having one snapshot per folder? (How would you use deduplication like this? You would end up with partial data mixed and matched all over the place, which is a worse mess in my opinion.)

Finally, if this is really what you want, a great solution for that is rclone. You will lose almost all of the tremendous features provided by Kopia, such as deduplication and the ability to snapshot, but you will have your individual folders. You can still use light encryption, and I think rclone has experimental compression.

Just why is it that Kopia cannot create one directory per snapshot?

Because portions of data from a file in directory A may duplicate data in a file in directory B, and Kopia keeps only one copy of that data (to save space). This is called deduplication. A flat structure makes the most sense with this approach.
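To make the deduplication argument concrete, here is a minimal Python sketch. It assumes fixed-size 4-byte chunks purely for readability (real backup tools use much larger and often content-defined chunks), and `store_file` / `restore_file` are hypothetical names. Two files from different directories share two chunks, and each shared chunk is stored once in a single flat store keyed by its hash; a one-directory-per-snapshot layout would have to either duplicate those chunks or point from one directory into another.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, only so the example is easy to follow

def store_file(store: dict[str, bytes], data: bytes) -> list[str]:
    """Split data into fixed-size chunks, keep each distinct chunk once, return chunk IDs."""
    ids = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        cid = hashlib.sha256(chunk).hexdigest()
        store.setdefault(cid, chunk)  # a chunk already in the store is not written again
        ids.append(cid)
    return ids

def restore_file(store: dict[str, bytes], ids: list[str]) -> bytes:
    # Reassemble a file from its ordered list of chunk IDs.
    return b"".join(store[cid] for cid in ids)

store: dict[str, bytes] = {}
file_a = store_file(store, b"AAAABBBBCCCC")  # think of this as dirA/report.txt
file_b = store_file(store, b"BBBBCCCCDDDD")  # and this as dirB/report-copy.txt

print(len(store))                                      # 4 distinct chunks stored, not 6
print(restore_file(store, file_b) == b"BBBBCCCCDDDD")  # True
```

In this sketch the store itself has no idea which directory a chunk "belongs" to; that mapping lives only in the per-file ID lists, which is why a flat layout is the natural fit.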


The reason for having a flat file structure in cloud buckets is simply that cloud storage was not designed for hierarchical storage, so handling folders would not be efficient. You can optionally use a bucket prefix (in effect, a folder for the entire repository) if you want to store something else in the bucket besides the Kopia repo, but I never do that. I run one repo per bucket.
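To illustrate the point about folders, here is a toy Python model of a bucket. It assumes a simplified object store (`FlatBucket` and its methods are hypothetical, loosely mirroring the prefix/delimiter style of listing that S3-style APIs offer): keys live in one flat namespace, "folders" exist only as shared key prefixes computed at listing time, and a repository prefix such as `kopia/` simply keeps the repository's keys apart from everything else in the bucket.

```python
class FlatBucket:
    """Toy object store: one flat namespace of keys, no real directories."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def list_keys(self, prefix: str = "") -> list[str]:
        # Listing is a filter over all keys, not a walk down a directory tree.
        return sorted(k for k in self._objects if k.startswith(prefix))

    def list_folders(self, prefix: str = "", delimiter: str = "/") -> list[str]:
        # "Folders" are derived on the fly by cutting keys at the next delimiter.
        folders = set()
        for key in self.list_keys(prefix):
            rest = key[len(prefix):]
            if delimiter in rest:
                folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        return sorted(folders)

bucket = FlatBucket()
bucket.put("kopia/blob-0001", b"...")       # hypothetical repository blobs
bucket.put("kopia/blob-0002", b"...")       # kept under an optional prefix
bucket.put("photos/2023/img1.jpg", b"...")  # unrelated data in the same bucket

print(bucket.list_folders())           # ['kopia/', 'photos/']
print(bucket.list_keys("kopia/"))      # ['kopia/blob-0001', 'kopia/blob-0002']
print(bucket.list_folders("photos/"))  # ['photos/2023/']
```

Because every "folder" view is recomputed from the full key list, a deep hierarchy buys nothing on such storage; the prefix is only useful for keeping the repository separate from other data.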
