Help using Amazon Glacier Deep Archive with Kopia

I would like to implement a 3-2-1 backup (sort of) using Amazon Glacier Deep Archive.

I know this is somewhat backwards, but my workflow is like:

Working directory is a hybrid of my laptop + Google Drive. Infrequently (maybe every 6 months or yearly) I’ll mount an external spinning disk, and I’ll mount Google Drive, and use Kopia to snapshot the Google Drive mount to the local storage disk I have.

I am concerned about drive failure after experiencing this in the past. As a result, I would like to take the snapshot and have all the blobs compacted. I’m guessing I would want to run some kind of forceful full maintenance on the repository.
I only keep one snapshot at a time, I mainly use this for compression/deduplication.
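For reference, the forced compaction step can be sketched like this (the repository path is illustrative; `kopia maintenance run --full` is the command that compacts pack files and drops unreferenced blobs):

```shell
# Connect to the local repository (path is illustrative)
kopia repository connect filesystem --path /mnt/backup/kopia-repo

# Force a full maintenance cycle: compacts small pack files and
# removes unreferenced blobs, shrinking the repo before sync-to
kopia maintenance run --full

# Inspect maintenance status and schedule
kopia maintenance info
```

Full maintenance normally runs on a schedule from the repository owner's machine, so running it manually right before the sync makes sense for an every-6-months workflow.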

Once I have my snapshot stored on my local disk, I’ll sync-to an S3 remote, then shovel that into Glacier Deep Archive.
From reading the Amazon docs, data transfer in and Delete operations are free (though note that PUT requests to Deep Archive are billed per thousand requests, and objects have a 180-day minimum storage duration).
When next year rolls around, I’ll drop all contents from Glacier and perform a new sync-to.
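That yearly cycle might be sketched like this. The bucket name and credentials are illustrative, and the lifecycle rule is one way to get objects into Deep Archive, since kopia itself only writes to the standard S3 tier:

```shell
# Mirror the local repository to S3; --delete removes remote
# blobs that no longer exist in the local repo
kopia repository sync-to s3 \
  --bucket my-archive-bucket \
  --access-key "$AWS_ACCESS_KEY_ID" \
  --secret-access-key "$AWS_SECRET_ACCESS_KEY" \
  --delete

# Bucket lifecycle rule: transition every object to DEEP_ARCHIVE
# as soon as possible after upload
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-archive-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "to-deep-archive",
      "Status": "Enabled",
      "Filter": {},
      "Transitions": [{"Days": 0, "StorageClass": "DEEP_ARCHIVE"}]
    }]
  }'
```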

Is this a reasonable way to use this? I like the peace of mind of having an additional copy in the cloud, but I don’t want to pay the relatively higher expense of keeping something in hot storage like Wasabi or B2. This is long term storage, and I expect to never need it, but I want to have it in case I do. If something happens to my Google Drive and my local disk with my snapshot, I could always recover my files from the deep archive.

Am I missing anything? Or doing something stupid? Will this be more expensive than I think it will be? (I’m seeing this cost as $1/TB/Mo)
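As a sanity check on that figure, at Deep Archive's published storage rate (about $0.00099 per GB-month at the time of writing) the arithmetic does come out to roughly $1/TB/month:

```shell
# Assumed rate: $0.00099 per GB-month for Deep Archive storage
awk 'BEGIN { printf "1 TB costs $%.2f per month\n", 1024 * 0.00099 }'
# prints: 1 TB costs $1.01 per month
```

Request charges and the 180-day minimum storage duration are on top of that, but for a keep-it-a-year workflow they should be small.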

Will a full maintenance compact the repo as much as possible before I use sync-to?
Since I would restore the entire snapshot and not just individual files in the event of a problem, should I just tar up the entire repo and rclone it up to S3? This would mean restoring a single file rather than hundreds of thousands of tiny ones.
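A sketch of that tar-then-upload variant (the paths, remote name, and bucket are illustrative). A single large archive also avoids per-request charges and restore requests for hundreds of thousands of small blobs:

```shell
# Pack the whole repository into one archive; kopia blobs are
# already compressed and encrypted, so gzip adds little
tar -cf kopia-repo.tar -C /mnt/backup kopia-repo

# Upload directly into Deep Archive via rclone's S3 backend
rclone copy --s3-storage-class DEEP_ARCHIVE \
  kopia-repo.tar s3remote:my-archive-bucket
```

The trade-off is that any change means re-uploading the entire archive, whereas a synced repository only re-uploads changed blobs.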

Need a little guidance here. I apologize in advance if any questions are ignorant.

As documented in Using Different Storage Classes | Kopia

Using Archive Storage With Delayed Access

Kopia does not currently support cloud storage that provides delayed access to your files – namely, archive storage such as Amazon Glacier Deep Archive. Do not try it; things will break.

If you really want to use archive storage with Kopia, consider using Google Cloud Storage’s Archive storage class – it is more expensive than Amazon Glacier Deep Archive but still very cheap to store ($0.0012 per GB at the time of this writing) and provides instant access to your files; but remember that, like other archive storage, costs are high for accessing files in Google Cloud Storage’s Archive storage class.

So if you really want to use Deep Archive, you'd have to upload the files there manually, or via some other process outside of Kopia.

How much data would you lose if you were to revert to a 6-month or a year old backup?
How much is the lost data worth to you vs paying for a consistent backup setup?

I find this cost calculator to be a pretty useful resource for estimating the cost of different providers. For example, based on my usage pattern, B2 is still the most cost-effective provider.

I read the documentation. I assumed it pertains to normal Kopia use, and not the kopia repository sync-to command, is that right?

If I need to manually upload the files, I could do that with rclone if kopia sync-to cannot handle S3 for some reason. I don’t know why that would be, but if it breaks then…

If I have any new crucial data, I was planning on backing it up to the local repo and then doing the sync-to (or an rclone sync, since Kopia breaks with archive storage) after dropping the previous backup. I really only need one snapshot at any given time. This is an ultimate failsafe in case both my cloud storage and my physical disk have a problem at the same time.

Thank you for the cost calculator, that was neat but didn’t quite provide the settings I needed. For example, I expect 0 egress for multi-year periods of time.

For my daily usage pattern, my regular backups (not what I'm describing above) are fine on the Storj free tier and the MEGA free tier. (Storj is a different kind of cloud, but it pays its node operators in cryptocurrency, which makes the project extremely fragile, imo.)

With the S3 Deep Glacier idea, this is for content that I could not restore or reproduce by any other means, if it were lost. It doesn’t fit in any free tier and I would like to store it at the lowest cost possible as a compressed/deduplicated copy of the whole thing somewhere I can trust.

The sync-to command should work just fine, as it doesn't have to read any of the repo contents; it just syncs the repo on a per-file basis and uploads/deletes files based on timestamp (someone please correct me if I'm wrong). The only thing sync-to does differently from rclone is that it flattens the repository's directory structure when uploading to cloud object storage (like S3 or B2).

Deep Archive may be cheap for ongoing storage, but it's extremely expensive to retrieve data: 1 TB will cost roughly $205 to retrieve and download (with a 14-day window to download).
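Rough per-GB arithmetic behind a bill like that, with assumed rates (standard retrieval from Deep Archive around $0.02/GB, internet egress around $0.09/GB; actual totals also include restore-request fees and vary by region and retrieval tier, which can push the number higher):

```shell
# Assumed rates -- check current AWS pricing for your region:
#   standard retrieval from Deep Archive: $0.02/GB
#   data transfer out to the internet:    $0.09/GB
awk 'BEGIN {
  gb = 1024
  printf "retrieval: $%.2f, egress: $%.2f, total: $%.2f\n",
         gb * 0.02, gb * 0.09, gb * (0.02 + 0.09)
}'
```

Either way, the per-GB egress and retrieval fees dominate the bill, which is fine for a never-expect-to-need-it failsafe.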