How can I find out which files to "request restore" from Amazon S3 Deep Archive

I want to use kopia for backing up to AWS S3 Deep Archive.
How do I find out the names of the files inside the bucket (starting with p) that I have to issue a “restore request” for on AWS if I want to restore a directory or file xy from a snapshot in a kopia repository?

If you have created one or more snapshots, you can do kopia snap list (if you are still connected to the repository).
Then do kopia ls -l <id>, where <id> is the id you want to restore from.
Now you see a list of the files and/or directories in the snapshot with their <id>s.
Then do kopia restore <id> <new directory name> or kopia restore <id> <new filename>, depending on what you want to restore.
If your file or directory is not in the root of the snapshot, you can do kopia ls -l <id> on the parent directory.
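
As a concrete sketch (the IDs and the target path below are placeholders):

  kopia snapshot list                          # note the root object ID of the snapshot
  kopia ls -l <snapshot-root-id>               # list entries together with their object IDs
  kopia restore <object-id> /tmp/restored      # restore one entry into a new directory or file
  kopia ls -l <directory-object-id>            # descend into a subdirectory if needed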

Be really careful with “Deep Archive”. It is pretty cheap for long-term archiving, but any read from it is expensive, and kopia, like any other such backup solution, must have access to the repository to manage incremental backups.

Thanks, kees, for looking at this.
But what kopia ls -l <id> shows are object IDs, not the file names inside the bucket.

Also thanks, iBackup, for looking out for my costs. My question arises for exactly that reason: if I want to restore only a part of my backup, I don’t want to pay for a “restore request” for all p files in the bucket just because I don’t know which ones are actually needed.

In general, I don’t know why users are essentially discouraged from using Deep Archive for the p files. It’s just a matter of calculation based on the price list:
S3 Standard Storage - 0.023 USD/GB/month
S3 Glacier Deep Archive Storage - 0.00099 USD/GB/month
S3 Glacier Deep Archive Data retrieval - 0.02 USD/GB
0.02 USD/GB / (0.023 USD/GB/month - 0.00099 USD/GB/month) ≈ 0.91 months

So if I retrieve my data less often than roughly every four weeks, Deep Archive is cheaper.

The very small differences in request costs (a few cents per 1,000 requests) don’t change the picture.
Did I miss something relevant? Otherwise that’s the break-even, and it’s roughly the same in every region. Kopia is a backup tool, not a sync or data transfer tool. I don’t know what use case would require restoring a backup more often than that; that’s not what I consider a good backup plan. My apartment doesn’t burn down more often than once a year. :wink:

But the kopia docs make me suspicious. Before I file an issue, really: did I miss something? Please check my reasoning and calculation for errors. Thanks!
Otherwise I’d suggest to the kopia developers that, instead of generally dissuading users from using Deep Archive, they show how to use it as cost-effectively as possible, i.e. also document how to AWS-“request restore” only the relevant parts of a deep archive. Or even add a kopia function to show those parts, or even to make the retrieval request for me if I say so.
Don’t get me wrong, I appreciate and find it completely necessary and correct to point out the disadvantages of Deep Archive. But the sentence “Glacier is not recommended for most users to use” is not justified, in my view.

Ok, I figured out a way:
kopia index inspect --all | grep <object-id from kopia ls -l>
shows, as the last hash in each row, the blob file name of the object. Still, I consider this a very inconvenient workaround. If someone has a better solution, I’d be happy to read it.
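
In other words (the IDs are placeholders carried over from the previous commands), the chain looks like this:

  kopia ls -l <snapshot-root-id>                   # gives the object ID of the file or directory
  kopia index inspect --all | grep <object-id>     # last column of the matching rows is the p blob name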

Another one: use kopia restore on an object and read the error message; it names the blobs that cannot be retrieved. But the documentation, and I, prefer restoring by mounting, and there the error message is less verbose and doesn’t name the blob.

Try running the strace utility on kopia to figure out how many reads it does during a backup.
kopia isn’t just pushing files to storage. To do deduplication and stay incremental, kopia needs to read from remote storage, and that is what I meant in my first post. Requests to read files from Deep Archive in Glacier can sometimes take hours, because the storage is tape-based.
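
If you want to measure that yourself, something along these lines should show the network activity kopia generates during a snapshot (the path is a placeholder):

  strace -f -e trace=network -o kopia-net.log kopia snapshot create /path/to/data
  grep -c connect kopia-net.log      # rough count of outgoing connections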

Try running the strace utility on kopia to figure out how many reads it does during a backup.

On p-files: exactly NONE.
Kopia doesn’t read p-files in normal operation, see here. It would break if it tried to while they are in Deep Archive. That’s why kopia totally “allows” putting p-files into Deep Archive, also see here.
On the one hand it is not necessary, and on the other hand it is impossible if they are in Deep Archive. It’s not a matter of “hours” to read them; it’s simply impossible without manually making a restore request on AWS first, which kopia doesn’t do. Sadly, not even when I want it to.
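
For completeness, the manual restore request and the status check look roughly like this with the AWS CLI (the bucket name and blob key are placeholders; Deep Archive only offers the Standard and Bulk retrieval tiers):

  # ask AWS to bring one p blob back for 7 days (Bulk tier; Deep Archive has no Expedited tier)
  aws s3api restore-object --bucket my-kopia-bucket --key <p-blob-name> \
      --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Bulk"}}'

  # check progress; once the Restore field shows ongoing-request="false", the blob is readable again
  aws s3api head-object --bucket my-kopia-bucket --key <p-blob-name>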

Maybe I need to be clearer: of course I follow the documentation and put only p-files into Deep Archive; I never meant any other files.
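
(For anyone reading along: one way to put only the p-files into Deep Archive is an S3 lifecycle rule that transitions only keys starting with p. A rough sketch, with a placeholder bucket name and assuming the repository sits at the root of the bucket:)

  aws s3api put-bucket-lifecycle-configuration --bucket my-kopia-bucket \
      --lifecycle-configuration '{"Rules":[{"ID":"p-blobs-to-deep-archive","Status":"Enabled","Filter":{"Prefix":"p"},"Transitions":[{"Days":0,"StorageClass":"DEEP_ARCHIVE"}]}]}'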

Maybe I need to be clearer: of course I follow the documentation and put only p-files into Deep Archive; I never meant any other files.

That puts it in a completely different light, for sure!

Well, then I believe you found a solution. I’m not sure though how to automate it, since restoration is usually a manual process where you have to know upfront what you need to restore.
Or do you mean that it is an inconvenient way to find the p-blocks?

Or do you mean that it is an inconvenient way to find the p-blocks?

Yes, as written in the first post: :wink:

How do I find out the names of the files inside the bucket (starting with p) that I have to issue a “restore request” for

I think this is quite stupid: kopia “knows” the relevant files and already says in the docs what to do with them (“request retrieval”), but makes it unnecessarily hard for a user to actually do so.
On top of that, it gives the bad advice not to use Deep Archive at all, although for most users this would be by far the cheaper option, with only a few disadvantages, which are only made bigger by kopia having no good support for this use case.
I think it should work like this:

  • kopia recommends Deep Archive for p-files, of course still describing the disadvantages in the docs (longer retrieval time and retrieval costs), maybe even mentioning the break-even of ~1 month.
  • kopia offers a flag for the restore command that automatically makes the appropriate retrieval request on AWS (a rough sketch of how this could be scripted today follows below this list).
  • kopia offers a means of automatic retrieval requests also for the mount way of restoring.
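
For illustration only, here is a rough sketch of how the second point could be scripted today with existing commands. Nothing here is a kopia feature: the bucket name is a placeholder, and it relies on the index-inspect workaround from above (i.e. it assumes the object ID from kopia ls -l shows up in the index output, with the pack blob name as the last column).

  #!/usr/bin/env bash
  # Hypothetical glue script: request AWS restoration of the p-blobs behind one object ID.
  OBJECT_ID="$1"              # e.g. taken from: kopia ls -l <snapshot-root-id>
  BUCKET="my-kopia-bucket"    # placeholder bucket name

  kopia index inspect --all | grep "$OBJECT_ID" | awk '{print $NF}' | sort -u |
  while read -r blob; do
    case "$blob" in
      p*)   # only the data packs live in Deep Archive
        echo "Requesting restore of $blob"
        aws s3api restore-object \
          --bucket "$BUCKET" \
          --key "$blob" \
          --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Bulk"}}'
        ;;
    esac
  done

Once the restore requests have completed (typically up to 12 or 48 hours for Deep Archive, depending on the tier), the normal kopia restore or kopia mount should work again.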

As it is, it looks as if the developers recommend against Deep Archive only because they haven’t yet implemented the things you would need for a straightforward way to restore from it. Yet the docs give cost reasons, which for the most part don’t hold up.

While the proposed changes sound fine if someone can take the time to implement them, I think there are some hidden assumptions here that may or may not be fruitful.

While Deep Archive is the cheapest, I think the idea from the developer side was that if you use it, you should be well aware of its limitations and use it only when it suits your use case. When that use case involves a lot of restores and retrievals, Deep Archive is not “best” anymore. As with many other parts of life, things don’t go from bad to best along one single axis, so that everyone could just slide their solution towards “best” and then forget about it. Rather, it involves a lot of factors: price is of course one, the cost of listing another, the speed of retrieval another, the complexity of the setup yet another, and so on.

If the current recommendation is “best to avoid Deep Archive”, then this might just reflect the point that while the price tag on that kind of storage looks fine, the other factors, as they stand now, make it less attractive; hence the recommendation against it, so that people don’t get nasty surprises.

If you DO know that you will very, very seldom do any kind of restore, then it could work out; but as soon as your usage does not fit that model, it is less optimal, and keeping the data in a less remote kind of storage is again the better choice. This probably goes for all kinds of backup programs used with S3 Deep Archive.

See the calculation above. Do you really consider less often than once a month “very, very seldom”?
As said, I have nothing against listing the other disadvantages (some of which could be mitigated by kopia easing the retrieval process), but for most users cost is not one of them. Or rather, for no user with a sensible backup plan and a good understanding of what kopia is for.

What matters is not what I consider very, very seldom, but what the person choosing Deep Archive over some other option considers it to be.

If my boss says we need to keep financial data for 7 or 10 years in the cheapest way possible, and only retrieve it in case we get a full-scale audit, then that restore would be worth a lot of hassle, would occur once or never, and Deep Archive would be a very good fit for that kind of data. For the daily stuff I use and need, I have to do restore tests to know my data is usable and safe, and hence I would make other choices. In my experience, this is the kind of choice every S3 user needs to make when selecting which tier to use, kopia or not.

But if someone makes the effort to turn this into code, all the better. I’m just trying to say that it’s not because the devs are evil or anything that the recommendation might be against the slowest but cheapest tier; it’s just that so few people have the actual use case where Deep Archive is a perfect fit.

Yes, and a general (bad) recommendation works exactly against this. Then one shouldn’t make a recommendation at all.

I don’t think so; repeating it doesn’t help. I think your example of 7 or 10 years shows that you didn’t follow the calculation. We are talking about anything over one month, where Deep Archive is already cheaper.

The use case I assume for most users is not broken by a few test restores. Do you really pull back most of your data more often than that? Then I think you are an unusual user, and you probably know what you’re doing and can easily override any recommendation. But that is probably not the typical case for a typical user, which is who I expect a general recommendation to be made for.

The doc says it is “very expensive to retrieve data, as they are designed for long-term archive”. I just want to say: as far as costs go, this so-called “long-term” starts at only one month, which is not long at all for a backup. That’s why I consider the documentation misleading here. Mentioning this break-even there would help a lot of users make the right decision for themselves.

Topic closed for me.

Yes, as written in the first post

Well, sorry that I’m not telepathic :slight_smile: and couldn’t guess that you were going to store only the p-files.

kopia recommends Deep Archive for p-files, of course still describing the disadvantages in the docs (longer retrieval time and retrieval costs), maybe even mentioning the break-even of ~1 month.

Don’t you think there would then also need to be a description of a rather odd procedure for splitting the archive, keeping the p blobs separate from the other blobs? “Where and how” to keep the rest of the backup files, as well as how to glue everything back together on restoration, can be very different from setup to setup, and IMHO such a workflow might go wrong.

kopia offers a means of automatic retrieval requests also for the mount way of restoring.

What you are asking for would take a decent amount of the devs’ personal time, so “offers me” kind of sounds as if they were obligated to you. Are they?

Personally, I wouldn’t go for a solution where the archive is split, with most of it stored in Glacier and the rest somewhere else… but hey, kopia already gives you the possibility to do it manually. It’s an open source project; if you think it is useful for you and might benefit somebody else, then why not fork the project, add the solution you proposed, and open a pull request, so that together we make it better and better? Or at least file a polite “feature request”, in case you don’t know Go…

If the only backup you are going to keep is in the cloud, then IMHO that is very wrong; you should still have a local, full, non-split backup.

There is a much simpler solution: you do a “normal” backup to local storage and then sync it with rclone, which supports Deep Archive, using rclone copy with the --immutable flag (the --immutable flag (!!!), and the copy command rather than sync, unless you want to pay according to the retention policy). This way you would still have a non-split backup locally, which you can restore fast (!!!), and you would still be able to restore the backup from Amazon in case of armageddon. And in the end you would still benefit from deduplication, since rclone won’t push identical files again.
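
A rough sketch of that workflow (the local repository path and the rclone remote name are placeholders, and the remote is assumed to be configured for the Deep Archive storage class):

  # 1. snapshot into a normal local filesystem repository
  kopia snapshot create /home/me/data

  # 2. push new blobs to the archive bucket; --immutable makes rclone fail
  #    instead of overwriting a blob that changed locally, and copy (unlike sync)
  #    never deletes anything on the remote
  rclone copy --immutable /backups/kopia-repo s3deep:my-kopia-bucket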