Recover repository from damaged hard drive

Hello everyone, I need help with my setup.
I have a Kopia instance running on my server, and it backs up to an S3 server (MinIO) hosted in a different house. The 500 GB of repository data were saved on a hard drive that is almost dead; it has several bad sectors. Unfortunately, I couldn’t get to that location for 4 months, so I kept doing backups to that repository. Today I managed to replace that hard disk, so I tried to copy everything from the almost-dead disk to a new one using rclone. Some files were corrupted, and the copy failed. After that, I mounted the new disk to the MinIO server and tried to connect with my Kopia instance.
Everything seems to be working, but if I run

kopia maintenance run --full

I get multiple “blob not found” errors, and the process ends with

Total bytes rewritten 1 GB  
Finished full maintenance.  
ERROR error rewriting contents in short packs: failed to rewrite 28 contents

I also tried with:

kopia index recover --parallel=10 --commit --advanced-commands=enabled

but the command gets stuck at 3% of the execution. The command
kopia snapshot fix invalid-files --commit
says that it fixes something, but the other commands are still failing. I also tried clearing the cache - nothing changed.

Given this information, do you think that I can trust the current repository? I don’t mind deleting the repository and creating a new one, since I (think I) have all the original files. I prefer to recover the previous repository because I hope that some snapshots could save me from partial data loss that I haven’t noticed yet.
What do you suggest?

For full disclosure: I’ve not yet run a snapshot that I’d trust myself, as I’m still in the evaluation phase, and I’ve never used S3 as a backend. I’m sure it’s just a matter of time before someone with long-term experience comes across your post. In the meantime, and regardless of the backup method/software used: no, I wouldn’t trust it. If the old device didn’t remap the drive’s bad sectors, and subsequent snapshot uploads to it didn’t validate the pool, the clone operation may have just ended up copying the corrupted blocks to the new storage device.

I had considered a scenario like the one you’re now experiencing. I intend to ensure the endpoint filesystem at least uses some form of filesystem checksumming, combined with Kopia’s ECC (which is stated as experimental).

Again, before others who may have more to say on the matter chime in: you might (and I do emphasize might) have a basis for removing the failed blobs in a targeted fashion. See the attached link to a thread. I’d then re-download and verify the pool to fully confirm 1:1 source/endpoint consistency. I suggest this since you’ve mentioned you’re not against just a ‘nuke & reset’ of the endpoint’s repo/pool.

If you’re willing to go that far anyway, you might as well try a surgical strike.

Good luck.

Thank you, your answer is much appreciated. I tried the verify process from the guide you posted, but unfortunately had no success, because it keeps looking for the missing blobs.
I know some blobs are missing; I hope there’s a way to skip them.

I’d try to manually remove them, in the hope of clearing the known invalid references to those blobs in the repo. See the link above to the ‘1 invalid checksum […]’ thread.

Beyond that I don’t think I’m able to offer any other ideas.

Do you still have the drive that you fetched from the MinIO server? If yes, I’d suggest that you perform a low-level copy of that drive. I don’t know about rclone’s features, but if it doesn’t perform a block-level copy, I’d use something else, like dd on Linux.
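A minimal sketch of such a block-level copy, assuming Linux. In practice SRC and DST would be device nodes such as /dev/sdb and /dev/sdc; here regular files stand in for them so the commands are safe to run as-is. (For a drive with bad sectors, GNU ddrescue is often a gentler alternative, but plain dd with the options below will also push past read errors.)

```shell
# Stand-in paths; replace with real device nodes, e.g. /dev/sdb and /dev/sdc.
SRC=old-disk.img
DST=new-disk.img

# Create a stand-in "drive" image for demonstration
# (the real source drive obviously already exists).
dd if=/dev/urandom of="$SRC" bs=1M count=4 status=none

# conv=noerror,sync keeps dd going past read errors and pads failed
# blocks with zeros, so one bad sector doesn't abort the whole copy.
dd if="$SRC" of="$DST" bs=64K conv=noerror,sync status=none

# Verify the two images match byte for byte.
cmp "$SRC" "$DST" && echo "copy verified"
```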

Then it may be worth giving Steve Gibson’s SpinRite a try with that drive. SpinRite is not freeware, but it has surely recovered a lot of otherwise faulty drives. You can find it here: [Home of Gibson Research Corporation](https://www.grc.com)

Going forward, I’d “convert” the S3-type repo to a filesystem-based one. You will need to rename the files which hold the repo’s metadata. Those carry a .f suffix, which they don’t have when used in a filesystem repo. Just stripping that suffix should be enough. Then have a go at the converted repo with a local Kopia instance and see if you can connect to it.
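The renaming step could be sketched like this, assuming the repo files have been copied into a local directory. The path and file names below are stand-ins for demonstration; point REPO at your actual copy and check the result before connecting Kopia to it.

```shell
# Stand-in repo directory; replace with the path to your copied repo.
REPO=./repo-copy
mkdir -p "$REPO"

# Stand-in metadata files, just so this sketch runs end to end
# (a real repo copy already contains its .f files).
touch "$REPO/kopia.repository.f" "$REPO/kopia.blobcfg.f"

# Strip the trailing ".f" from every matching file.
for f in "$REPO"/*.f; do
  [ -e "$f" ] || continue   # nothing to do if no .f files exist
  mv -- "$f" "${f%.f}"      # ${f%.f} drops the trailing .f
done

ls "$REPO"
```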

Now you can run a kopia snapshot verify and a kopia content verify. I’d probably start with kopia snapshot verify and see what errors come up. You should now also be able to run kopia repository repair filesystem, since you’ve converted your S3 repo to a filesystem-based one.

If you can successfully repair your repo, you could use kopia repo sync-to s3 to sync the repo in your S3 bucket with the repaired local one.