I’m currently in the process of re-designing my “backup architecture” and I was hoping to obtain some clarification around this question before I proceed. I think I know the answer, but confirmation would be helpful!
- Let “DESKTOP” be my desktop computer.
- Let “LAPTOP” be my laptop computer.
- Let “LOCAL COPY” be a locally-networked GlusterFS triple-replicated cluster.
- Let “OFFSITE COPY” be Backblaze B2 (or similar, but that’s what I’m currently using).
In short, here’s the desired use case:
- DESKTOP runs frequent backups, backing up locally to its own filesystem.
- LAPTOP runs frequent backups, backing up locally to its own filesystem.
- BOTH DESKTOP and LAPTOP use `sync-to` to redundantly copy to the LOCAL COPY “remote” repo.
- BOTH DESKTOP and LAPTOP use `sync-to` to redundantly copy to the OFFSITE COPY remote repo.
So, in a nutshell, what I’m asking in plain English is: can distinct repositories (in this case, DESKTOP and LAPTOP each have their own local repositories) `sync-to` the same “remote” repository/destination?
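Assuming the tool here is Kopia (the `sync-to` name matches its `kopia repository sync-to` subcommand — an assumption, since the post doesn’t name the tool), the desired flow might be sketched roughly like this; the mount point, bucket name, and credential variables are all made up:

```shell
# Hypothetical sketch -- run on DESKTOP and likewise on LAPTOP, each
# connected to its own local repository.

# Redundantly copy to the LOCAL COPY "remote" repo (a shared destination,
# which is exactly the open question above)...
kopia repository sync-to filesystem --path=/mnt/gluster/shared-repo

# ...and to the OFFSITE COPY remote repo on B2.
kopia repository sync-to b2 --bucket=my-backup-bucket \
  --key-id="$B2_KEY_ID" --key="$B2_KEY"
```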
For context, here’s my current setup:
- Both DESKTOP and LAPTOP run snapshots of `$HOME` using the same repo hosted on LOCAL COPY (network FS).
- LOCAL COPY has `sync-to` invoked on it once nightly, and outputs to B2 (OFFSITE COPY).
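If that nightly push is driven by cron on the LOCAL COPY host, it could look something like this (hypothetical schedule, bucket name, and log path, again assuming a Kopia-style `sync-to`):

```shell
# crontab entry on the LOCAL COPY host: mirror the shared repo to B2 at 03:00.
0 3 * * * kopia repository sync-to b2 --bucket=my-backup-bucket >>/var/log/repo-sync.log 2>&1
```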
Whereas I currently keep most of my personal data under `$HOME`, I want to start partitioning it out and getting more granular, for (at least) the following reasons:
- Importance/priority: I care about some of the data more than other parts.
- Backup frequency: to the above point, there are parts of my data that I want to back up more frequently (e.g. “hot” vs. “cold”).
- Size: some of my data (e.g. GoPro video) is far larger and would be better suited for different media (e.g. NVMe vs spinning HDD).
- Bandwidth: long story short, in the short term, I’m capped at 5 Mbps upload (Comcast, grrr), so this imposes significant restrictions on seeding backups over the WAN. A fresh backup of a ~500 GB working set takes 1-2 weeks to finish.
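The 1-2 week figure checks out as a back-of-envelope calculation:

```shell
# Time to seed a ~500 GB working set over a 5 Mbps uplink (ignoring overhead).
bytes=$((500 * 1000 * 1000 * 1000))   # ~500 GB
bits=$((bytes * 8))
rate=$((5 * 1000 * 1000))             # 5 Mbps upload cap
days=$((bits / rate / 86400))         # seconds -> whole days
echo "~${days} days"                  # prints "~9 days"; with protocol overhead
                                      # and retries, 1-2 weeks is realistic
```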
Why distinct repos for each of DESKTOP and LAPTOP?
- Currently, my triple-replicated GlusterFS cluster (LOCAL COPY) is a single point of failure. If one of my nodes dies and it goes into read-only mode, I lose the ability to back up for some period of time… and more importantly, with supply chain shortages right now, I might not even be able to find suitable replacements for a reasonable price (using Odroid SBCs that are no longer available). I had a scare recently that really spooked me in this regard.
- I want the ability to continue to back up while completely offline, namely for LAPTOP.
- While LAPTOP can mount LOCAL COPY over the WAN via a WireGuard VPN, there may be situations where it can only back up to OFFSITE COPY (B2) for one reason or another (e.g. a firewall issue). If/when that’s the case, I don’t want to halt my other backups.
When it comes to multiple devices, there’s a “core” set of data (~30GB) that I want to sync between them, 80-90%+ of which will be redundant (where the primary differences are dotfiles and installation-specific data).
- That’s to say, it’s largely the same, but there’s also some differences (where content-addressed storage comes into play nicely).
- As for the remainder (~500-600GB presently), it doesn’t necessarily need to travel with me everywhere I go (at least “locally” – I can make it available via a network share or similar), nor does it need to be backed up as frequently.
- However, currently, I’m backing up 100% of my data everywhere, every time, and I’m finding myself wanting more granularity/flexibility.
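The content-addressing point can be illustrated with plain `sha256sum` (a toy example; the file names are invented): identical bytes produce identical digests, so the shared core set is stored once no matter how many machines hold a copy.

```shell
# Two machines holding the same file contents...
printf 'shared core data\n' > desktop_copy
printf 'shared core data\n' > laptop_copy
# ...and one machine-specific dotfile.
printf 'laptop-only dotfile\n' > laptop_dotfile

# Identical contents -> identical content address (stored once);
# the dotfile hashes differently (stored as a separate object).
sha256sum desktop_copy laptop_copy laptop_dotfile
```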
I sincerely appreciate those who’ve taken the time to read this far and/or chime in; hopefully this all makes sense! Happy to clarify anything that I may have done a poor job of explaining as far as my use case goes!