Backing up several "sections" of my computer: multiple repositories vs single repository?

Hi everyone,

I’ve been meaning to start using Kopia for quite some time now and a lingering question for me has always been regarding how many repositories I should create, given I want to back up several “sections” of my computer.

By “section” I mean I organise my data in a few large directories, like “projects”, “documents”, “music”, “photos”, etc.

So far I’ve been using Crashplan and I have a backup set for each of these, which allows different settings like file filters, times to run the backup process, destination (cloud, local drive…) and a few more.

At first I was thinking I would replicate this with Kopia by having multiple repositories, one per each “section”, on the same destination (e.g. B2, or local external drive) - essentially mimicking the backup sets I had with Crashplan.

But after reading a bit more, I think I now realise Kopia’s repositories are not meant for this; they are simply a destination for snapshots. From that perspective, I would have one repository per destination, e.g. one repository in B2 and another on a local external drive, and inside each of these repositories I could create separate snapshots for the various “sections” I want to back up.

Which approach makes more sense for Kopia, single or multiple repositories per “section”? Would it make individual snapshots faster if the repository contains less data (thus being a reason to split the various “sections” into separate repositories)?

Use just a single repository to benefit from deduplication

Yes, that’s how it should be. Think about Kopia’s repositories in the same way as git repositories, which can track multiple directories.

Use single repo.

If you use different repositories for different folders, you lose the benefit of deduplication. And no, it shouldn’t be faster with multiple repos.

About the deduplication, I understand that but there shouldn’t be any duplication whatsoever since each repository would only contain a single directory (a “section”) from my computer, and no files should be duplicated between those directories. In other words, there’s no reason for a file that should be in the “Music” directory to also be present in the “Projects” or “Documents” directory.

This is not an argument in favour of using multiple repositories, I’m just trying to understand if I’m missing something.

The other advantage I see is that having a single repository per destination makes it much easier to manage, as opposed to having 8 of them.

You’re thinking about repositories as folders, which is wrong. Kopia sees content, not files or directories: it splits content into chunks and saves them in the repository. Unrelated files can still share chunks (Music vs Projects, where a project may include media files, and many files have the same heads/tails). Think of a repository as a bus that can carry people, animals or even soulless objects like stones. It really doesn’t matter what you put into the repository; it’s just chunks of nameless data to carry. You can put data from multiple computers into the same repository, and from Kopia’s point of view it’s still chunks of data. Kopia knows which chunks belong to which file (and which computer it came from) and gives it back on request in the same layout as on your computer, where you split “sections” by folders.
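To make the "chunks, not folders" idea concrete, here is a toy sketch in Python. It is not Kopia's code: the ChunkStore class, the 4-byte fixed chunk size and the file names are all made up for illustration, and Kopia's real splitter and on-disk format are different. It only shows why two "sections" in one repository can end up sharing stored chunks.

```python
import hashlib

CHUNK_SIZE = 4  # assumption: tiny fixed-size chunks, purely for illustration


class ChunkStore:
    """Toy content-addressed store: chunks are keyed by the hash of their bytes."""

    def __init__(self):
        self.chunks = {}     # chunk hash -> chunk bytes (each stored once)
        self.snapshots = {}  # (section, file path) -> ordered list of chunk hashes

    def put_file(self, section, path, data):
        ids = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            cid = hashlib.sha256(chunk).hexdigest()
            # A chunk that is already present is not stored a second time.
            self.chunks.setdefault(cid, chunk)
            ids.append(cid)
        self.snapshots[(section, path)] = ids

    def get_file(self, section, path):
        # Reassemble the file from its chunk list.
        return b"".join(self.chunks[c] for c in self.snapshots[(section, path)])


store = ChunkStore()
song = b"RIFFaaaabbbbcccc"
store.put_file("music", "song.wav", song)
store.put_file("projects", "video/soundtrack.wav", song)  # same bytes, other section
store.put_file("projects", "notes.txt", b"RIFFnote")      # shares only the first chunk

unique_chunks = len(store.chunks)
total_refs = sum(len(ids) for ids in store.snapshots.values())
print(unique_chunks, "chunks stored for", total_refs, "chunk references")
```

With separate stores per section, the song's chunks would be stored once in each store; in a single store they are stored once overall.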

Multiple repositories make sense either for geographically distributing copies of the same repository or for using different backends to increase the redundancy of a single repository (for example, a local repository copied with the sync-to command to an external, offsite backup such as Amazon S3, Backblaze, or simply a remote computer over SFTP). In all cases, a single repository can hold data for a whole organization with multiple computers, benefiting from deduplication across all sources.

I’m new to Kopia and having the same dilemma. I’m less concerned about deduplication and more about repository performance and stability. I have something like 3TB of data to back up in a few hundred thousand files. Other backup software I’ve used didn’t handle that size very gracefully, and I’m wondering what to expect from Kopia in that regard. It will obviously be a lot easier to have just a single repository, but I’m concerned it will choke on all the data or that I’ll lose all of my backups if something goes wrong. In contrast, if I split into multiple repositories then each will be smaller, and if one gets irrecoverably corrupted then at least the others won’t be affected.
I’d be happy to hear the more veteran users’ views on this.

Kopia should have no performance issues with backing up all your files to the same repo. In fact, this is part of the reason sharding was added to Kopia: to maintain performance for very large repos.
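For anyone unfamiliar with the term, here is a rough Python sketch of the general idea, assuming "sharding" here means spreading blob files across nested subdirectories by ID prefix so that no single directory grows huge. The sharded_path helper and the (1, 3) prefix lengths are hypothetical and not taken from Kopia's configuration.

```python
import hashlib
from pathlib import PurePosixPath

SHARD_LENGTHS = (1, 3)  # hypothetical prefix lengths, chosen only for this example


def sharded_path(blob_id, shard_lengths=SHARD_LENGTHS):
    """Split a blob ID into nested directory names followed by the remaining name."""
    parts = []
    rest = blob_id
    for n in shard_lengths:
        if len(rest) <= n:
            break  # ID too short to shard any further
        parts.append(rest[:n])
        rest = rest[n:]
    return PurePosixPath(*parts, rest)


blob_id = hashlib.sha256(b"example blob").hexdigest()
path = sharded_path(blob_id)
print(path)  # two directory levels, then the remainder of the ID as the file name
```

Because the IDs are hashes, files spread fairly evenly over the subdirectories, which keeps directory listings small even when the repository holds a very large number of blobs.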

For reference, my ~800GB repo (~100,000 source files) has zero issues.

Now if you are splitting your source files into separate repos to provide some resiliency to corruption, that’s a different story.

Personally, though, if you want resiliency to corruption, why not just have all your source files backed up to multiple different repos? That gives much better resiliency than splitting your source files into separate repos.

Duplicating everything is certainly a better option but it is also more expensive. That said, there doesn’t seem to be an easy way to manage multiple repositories. If I understand correctly then kopiaui only supports one repository, so working with multiple ones will either require launching multiple servers or handling the scheduling of snapshots myself. It also appears that kopia server doesn’t support running as a service on Windows, further complicating things. Do I have it right?

Okay, I figured out how to connect multiple repositories to the kopiaui, which solves most of the management problems so I think I’ll go with that approach. Thank you.
