Split and then merge repository?

We Australians have some strange internet offerings. Specifically, our upload speeds are far lower than our download speeds. In my case, I have a 1 Gbps down / 50 Mbps up plan.

So uploading multiple TB is a painful proposition.

I’ve uploaded a large amount to a remote repository, and would like to add significantly more. I’m considering writing the new data to a local HDD and shipping the drive to the storage provider to sync.

My question is this: I know it is possible to sync a whole repository, but how do you ship a subset of a repository and then sync the whole thing once it has been copied over?

We Americans who are stuck with cable internet have the same issue with terrible upload speeds. It is legit frustrating to pay for massive download speeds that we never use while the upload speeds stay abysmal.

It may help if you clarify: are you trying to upload from one source to one destination repo, or are you trying to upload from one source to one destination repo and then sync that repo to other repos?

For what it is worth, my understanding of Kopia is that it easily supports multiple sources writing to the same repo because of its deduplication; it does not care where the blobs come from, only that they are there. What you are suggesting is similar in spirit, so I do not think you will have any issue. After your cloud provider has uploaded the data from your HDD, run Kopia on your source; Kopia will check whether blobs already exist for the files in your source and, if not, upload them. If the blobs are already there, Kopia should not upload them again, because of deduplication.

Make sure to use the exact same type of encryption and encryption password when doing this, otherwise Kopia will not be able to see the blobs as the same.

If you go down this route, make sure to run kopia snapshot verify --verify-files-percent=100 --file-parallelism=100 --parallel=100 at least once after Kopia is done with your snapshots. This will confirm that your backed-up files are valid. You can change the parallelism values to whatever suits your cloud provider and setup.
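As a rough sketch of that verification step, the wrapper below only prints the command unless DRY_RUN=0 is set. The PARALLEL default of 8 is my own illustrative value, not a recommendation from this thread; the flags are the ones quoted above.

```shell
#!/bin/sh
# Sketch: full verification pass after the seeded snapshots are in place.
# DRY_RUN=1 (the default here) only prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
PARALLEL="${PARALLEL:-8}"   # assumed value; tune to your provider's limits

verify_all() {
  # --verify-files-percent=100 asks Kopia to read back every file, not a sample.
  cmd="kopia snapshot verify --verify-files-percent=100 --file-parallelism=$PARALLEL --parallel=$PARALLEL"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    $cmd   # intentionally unquoted so the string splits into arguments
  fi
}

verify_all
```

Run it once with the default to see the command, then again with DRY_RUN=0 while connected to the repository.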

What kind of storage is that? Cloud-based (s3/gcs/azure/b2) or file-based (filesystem, sftp, webdav, rclone)?

What I think would work is the following sequence (I’m assuming it’s a cloud repository):

  1. Sync your current cloud repository to a filesystem repository.
  2. Disconnect client from the cloud repository and connect to the filesystem repository instead.
  3. Take a snapshot
  4. Disconnect from the filesystem repository
  5. Copy filesystem repository to HDD and ship to cloud provider
  6. Open repository in the cloud and sync data to the cloud repository - this should be incremental assuming timestamps are preserved along the way.
  7. On the client reconnect to the cloud repository

It is important that between steps 2 and 7 there are no writes to the cloud repository; otherwise the sync in step 6 will mess it up, because sync-to does not really support merging repositories.
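The sequence above could be sketched roughly as follows. This is not a tested procedure: the paths and bucket name are hypothetical placeholders, B2 credentials are assumed to come from the B2_KEY_ID and B2_KEY environment variables, the repository password is assumed to be supplied by prompt or environment, and where exactly the step 6 upload runs depends on what your provider offers. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the seven-step seeding sequence. Nothing runs unless DRY_RUN=0.
set -e                                             # stop at the first failing step
DRY_RUN="${DRY_RUN:-1}"                            # 1 = only print commands
LOCAL_REPO="${LOCAL_REPO:-/mnt/seed/kopia-repo}"   # hypothetical path on the HDD
BUCKET="${BUCKET:-my-kopia-bucket}"                # hypothetical B2 bucket name
SRC="${SRC:-/data/home-videos}"                    # hypothetical source directory

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Steps 1-4: on the client, starting while connected to the cloud repository.
prepare_seed() {
  run kopia repository sync-to filesystem --path="$LOCAL_REPO"   # step 1
  run kopia repository disconnect                                # step 2
  run kopia repository connect filesystem --path="$LOCAL_REPO"
  run kopia snapshot create "$SRC"                               # step 3
  run kopia repository disconnect                                # step 4
}

# Step 5 is physical: ship the drive.

# Step 6: wherever the shipped copy is mounted with a fast uplink.
upload_seed() {
  run kopia repository connect filesystem --path="$LOCAL_REPO"
  run kopia repository sync-to b2 --bucket="$BUCKET"
  run kopia repository disconnect
}

# Step 7: back on the client.
reconnect_cloud() {
  run kopia repository connect b2 --bucket="$BUCKET"
}

case "${1:-}" in
  prepare)   prepare_seed ;;
  upload)    upload_seed ;;
  reconnect) reconnect_cloud ;;
  *) echo "usage: $0 prepare|upload|reconnect" >&2 ;;
esac
```

Splitting it into three phases mirrors the constraint above: nothing should write to the cloud repository between the prepare phase and the reconnect phase.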


Thanks very much to you both for your responses.

I appreciate what I’m trying to do is probably very niche and Kopia won’t have been designed for my specific requirement, but I will give a little more detail.

My repository is on B2, and on this I have several snapshot sets. One for documents, another for photos, filesystem backups, SQL backups, Docker backups, etc. I’d like to add another snapshot for home videos which will be very large.

I am not looking to ship the other snapshots on the HDD, only the home movies component.

Based on the advice from @jkowalski, it sounds like I need to sync all snapshot sets to a filesystem repo, and suspend backups between that sync and when the HDD arrives at the cloud provider and has been synced. Is that correct?

Wouldn’t it be simpler to use separate repositories: one “for documents, another for photos, filesystem backups, SQL backups, Docker backups, etc.” and another for videos? In that case you can keep backing up the non-video content as usual, create a second, local repository for videos only on the external hard drive, send it to the cloud provider, and then reconnect to it for further backups.

All you have to do in this case is use separate config files when you run kopia so it can distinguish between the repositories: one setup for the regular backups and another for video.
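A minimal sketch of the two-config setup, assuming kopia's global --config-file flag and made-up file locations (the first path is what I believe is the Linux default, the second is purely hypothetical). As written it only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch: two thin wrappers, one per repository, differing only in config file.
DRY_RUN="${DRY_RUN:-1}"                                         # 1 = only print commands
MAIN_CFG="${MAIN_CFG:-$HOME/.config/kopia/repository.config}"   # assumed default location
VIDEO_CFG="${VIDEO_CFG:-$HOME/.config/kopia/videos.config}"     # hypothetical second config

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Existing B2 repository: documents, photos, SQL, Docker, and so on.
kopia_main()   { run kopia --config-file="$MAIN_CFG" "$@"; }

# Separate repository that starts life on the external drive.
kopia_videos() { run kopia --config-file="$VIDEO_CFG" "$@"; }

# Example usage (paths and bucket name are placeholders):
#   kopia_videos repository create filesystem --path=/mnt/seed/videos-repo
#   kopia_videos snapshot create /data/home-videos
#   kopia_main snapshot create /data/documents     # regular backups continue
#   kopia_videos repository disconnect             # before shipping the drive
#   kopia_videos repository connect b2 --bucket=my-video-bucket   # after it is loaded
```

Because the two repositories never share a config file, regular backups to the existing repository are unaffected while the video drive is in transit.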

Wouldn’t it be simpler to use separate repositories

Good point. I had gone so far down one line of thinking that I missed something so obvious, simple and pragmatic.

Thanks for the advice.