I’m considering using Kopia for my personal data stored on a laptop. It seems very promising.
The initial upload will take months considering my upload bandwidth. Is there a safe way to start the upload, stop it, turn the computer off, turn it back on, and resume?
How are big files handled? Do they need to be uploaded in one shot, or can they be partially uploaded and completed later on? How do you recommend I proceed?
It would probably be best if you first performed a local backup using Kopia and then ran a sync-to operation on the local repository, which uploads it to some online storage. The benefit is that blobs in the local repo will be 20 MB max., which will effectively chunk your big files and prepare them better for uploading.
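Roughly, that split setup could look something like the sketch below (the paths, bucket name and credential variables are placeholders, and exact flags can differ between Kopia versions, so double-check with `kopia repository sync-to --help`):

```sh
# Create (or connect to) a local filesystem repository, e.g. on an external disk.
kopia repository create filesystem --path /mnt/backupdisk/kopia-repo

# Snapshot your data into the local repo (fast, no WAN involved).
kopia snapshot create /home/user

# Mirror the local repository blobs to Backblaze B2 in the background.
# This can be interrupted and re-run; blobs that were already uploaded are skipped.
kopia repository sync-to b2 --bucket my-kopia-bucket \
    --key-id "$B2_KEY_ID" --key "$B2_KEY"
```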
I have a hard time figuring out what to do to combine
“The initial upload will take months considering my upload bandwidth.”
with
“Is there a native way to limit the bandwidth?”
I do appreciate that you have a slow link and want to send a huge amount of data, but a bandwidth limit is going to work actively against the first backup ever finishing. Even if it takes only weeks to send, that would still mean the next run has to contain weeks of changes, which perhaps takes days; then days’ worth of changes will take hours, until you finally reach a situation where a more normal schedule of snapshots can be run frequently.
So either run the first backup from a place with more bandwidth, and/or use the other suggestion of backing up to something local, which can later shuffle it in the background (hopefully with some limit) to an online storage.
The first repo sync to my Wasabi S3 bucket lasted approx. 6 days at 10 Mbit/s (yes, you can throttle the upload). Subsequent updates to the synced repo run for 30 mins. each day. However, there’s no way I’d back up to the S3 bucket directly, since it’d be too slow. So always back up to a local repo and sync that to some online storage.
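If you’re on a reasonably recent Kopia, the throttling is configured per repository; something along these lines (the numbers are just examples, and availability depends on your version, so check `kopia repository throttle --help`):

```sh
# Limit uploads from this client to roughly 100 KB/s (value is bytes per second).
kopia repository throttle set --upload-bytes-per-second=100000

# Inspect the currently configured limits.
kopia repository throttle get
```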
Basically, I want to back up about 400 GB to Backblaze B2, including some 10 GB files, using a 100 KB/s upload bandwidth. The daily change rate will be low (less than 100 MB). The files are stored on a laptop, which I use during the day. I need to throttle the upload to keep a decent connection for everyday tasks. I see the backup as a background task that is not supposed to eat all CPU and bandwidth resources.
If I get you right, you’re suggesting to back up my data twice. I understand that it could be convenient for big files, which would be split before the upload, so that I would not have to resume from scratch every morning the upload of a big file interrupted the previous evening (edit: I suppose I could let the upload run at night for some days, or suspend the program with Ctrl+Z, suspend the laptop to RAM, and resume the next morning). But how can this be faster than syncing directly online? Isn’t the amount of data to sync the same, using the same bandwidth? (How do you throttle the bandwidth using Kopia?) I know there’s something that I don’t get.
Once the initial upload from the local repo to B2 has completed, I suppose I can back up my daily modifications directly to B2, without the local repo?
I’m very grateful for your taking the time to teach me how these things work.
Kopia has a time limit in which it expects to finish a snapshot. This timeout defaults to somewhere around 40 mins., I think. If a snapshot does not finish in that time, you will not get a complete snapshot from Kopia’s point of view, only an incomplete one.
Have you actually done the math and calculated how long it will take to upload 400 GB at 100 KB/s? Then there’s the update to the repo… Yesterday I performed a direct snapshot from work to my repo at home; that was over a 100 Mbit connection, and the snapshot took somewhere around 6700 s over the internet vs. 350 s on the local LAN. Now imagine how much longer it will take if you’re restricting the bandwidth to 100 KB/s.
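For reference, the back-of-the-envelope math, ignoring protocol and metadata overhead:

```sh
# 400 GB ≈ 400,000,000 KB; at 100 KB/s that is 4,000,000 seconds of continuous upload.
echo $(( 400 * 1000 * 1000 / 100 / 86400 ))   # prints 46, i.e. ≈ 46 days non-stop
```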
Kopia snapshots are not only about the actual bandwidth, but much more about latencies, which are orders of magnitude higher on WAN than on LAN connections. Even performing the same snapshot over LAN vs. Wifi makes a huge difference.
From a practical standpoint, your only choice will be a split setup, where you run a local repo and have Kopia sync that one to some remote storage…
In fact, it’s much more than 46 days, since I only turn my laptop on during the day.
I wish to back up personal data: photos, videos, old work… these will hardly ever change. It doesn’t matter to me if the backup lasts for half a year… as long as you say it can actually get completed! In the long run, I’ll only be adding files slowly, and maybe moving/renaming the pictures a bit. (Concerning this aspect, I read that Borg seems to be efficient, since it doesn’t need to re-upload a file that’s already backed up but has been moved/renamed.)
Well, then Kopia is probably not for your use case. Kopia needs a snapshot to finish, and turning your host off will undoubtedly prevent a snapshot from ever finishing.
It should all actually just work. It will just take a very long time. The incomplete-snapshot feature is all internal, precisely to save you time on re-uploading when the connection gets broken.
Cool. So you’re saying it’s not inefficient (with regard to the amount of data to upload) to kill/re-run the backup, even when in the middle of uploading a big file?
However, you will never get a completed snapshot this way. Or it might just be me, but as long as a single run takes more than Kopia’s default timeout, KopiaUI at least doesn’t show me one if it takes too long…
There’s no such thing as a default timeout anymore; we used to have one in the past, though. Snapshots are never restarted; instead they are checkpointed during the snapshot (including mid-file).
KopiaUI will not show incomplete snapshots, if I’m not mistaken, but the CLI has them, and when they complete you should see them in the UI. (There could be other bugs I’m not aware of that prevent this from working, but at least that’s the theory.)
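So, practically speaking, after an interruption (shutdown, dropped connection) you just run the same snapshot command again; anything already uploaded up to the last checkpoint is deduplicated and not sent a second time. A minimal sketch, with a placeholder path:

```sh
# Re-running the same snapshot command after an interruption picks up from the
# last checkpoint; blocks already present in the repository are not re-uploaded.
kopia snapshot create /home/user/photos
```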
Yeah, sorry, I meant the checkpoint timer, which I believe defaults to 40 mins. For each snapshot that takes longer than that, KopiaUI will display a size of 0 on the main screen.
Anyway, despite the technical possibility of running a snapshot over 46+ days (even longer, if you consider that this host will be put to sleep or shut down in the evening), I still don’t think you can call this a regular backup. I was just trying to make that clear.
Stumbled on this and am curious if @din ever finished his/her backup (and wanted to leave some additional thoughts for the next bandwidth-constrained Kopia user).
Also curious why it wasn’t recommended to use filtering, either by file type or by directory, to (1) ensure a completed snapshot and (2) prioritize the most important data first.
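For example (the paths and patterns below are placeholders; see `kopia policy set --help` for the exact ignore flags in your version):

```sh
# Option 1: snapshot the small, important data first; add the bulky stuff later.
kopia snapshot create /home/user/documents

# Option 2: temporarily exclude the huge files from the main source via policy.
kopia policy set /home/user --add-ignore "videos/"
kopia policy set /home/user --add-ignore "*.iso"

# Remove the exclusions once the initial upload has caught up.
kopia policy set /home/user --remove-ignore "videos/" --remove-ignore "*.iso"
```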
And what about compression? Why not benchmark the various compression algorithms to figure out which will be the best fit for you? Given no mention of CPU bottlenecks (only bandwidth), the best option might be whatever smashes your data down best. 10 GB files are often things like compressed media files, which don’t compress further very well, so you could always flag those as exceptions to avoid unnecessary CPU work; but then again, when you are that constrained for bandwidth, I’d probably just squeeze those blocks as tight as I can before sending them out. Someone with more under-the-hood knowledge should probably chime in here on why I’m coming to the wrong conclusions.
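Kopia has a built-in benchmark for this; a rough sketch, with placeholder paths and assuming the policy flags of recent versions (check `kopia policy set --help`):

```sh
# Compare compression algorithms against a representative sample of your data.
kopia benchmark compression --data-file /home/user/documents/sample.docx

# Apply the winner to the source, and skip recompressing media that won't shrink.
kopia policy set /home/user --compression zstd
kopia policy set /home/user --add-never-compress .mp4 --add-never-compress .jpg
```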
Also, I know this has been said elsewhere, but with that small a pipe, if you really care about that data and need it offsite, I’d bum a friend’s network, or a coffee shop, or library, …or basically anywhere to at least get the important stuff up, and then let the rest trickle up over time.
Maybe the old “sneaker-net” is a better fit? Have an offsite drive (or two that you keep in rotation) at your work or someone’s house, which you bring home to back up over a wire (e.g. USB 3, where even a slow spinner will be somewhere around 2000 times faster than your outbound pipe), but which then lives offsite.
And then there are the many strange combinations of all of the above that might be best, depending on your specific data situation (rate of data change over time, correlation between important files and large files, etc.).