Is one bit flip enough to crash a Kopia repo? - How to recover from bit rot?

horo · March 14, 2021, 4:21pm

A friend and I are trying out a bunch of backup tools (namely Duplicacy, Restic and Kopia) to find out which one meets our needs best (for personal use).

One test is to simulate the catastrophic scenario of bit rot / data rot / data degradation by flipping a bit in the data section of one of the repository's blob files. On the positive side, Kopia performed really well, is really easy to use, and is able to recover all other files that are not affected by the corruption.

Unfortunately, we are not able to “repair” the repo into a state where at least new snapshots no longer rely on the corrupted blob but instead create a new blob from the still-existing source data.

Is it possible to tell Kopia to detect and rewrite / replace corrupted or deleted blobs, when the source files are still available?

Besides repository repair, we also tried deleting the blob and the associated content object using the blob delete and content remove commands. We then deleted the cache and created another snapshot, hoping that it would create a new blob for the picture that was corrupted in the backup. Unfortunately, this newly created backup still refers to the corrupted/deleted blob and fails to restore the image as well.

Thanks in advance

jkowalski · March 14, 2021, 4:31pm

There are some missing features here, so it’s not optimal UX yet, but today you can remove the damaged content from the index (using kopia content rm <x>). This will actually corrupt the repository a bit more temporarily, but you should then be able to create a snapshot of the file with --force-hash=100 to reupload the original data, which will fix it. This works because Kopia uses content-addressable storage, so the same data will result in the same content IDs (within the same repository).
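
To illustrate why re-uploading identical data repairs the old snapshots, here is a minimal toy sketch of content addressing. It is not Kopia's actual hashing scheme: the `toy_content_id` function, the key strings, and the use of plain SHA-256 over key plus data are all assumptions made for illustration, and it assumes `sha256sum` from GNU coreutils is available.

```shell
#!/bin/sh
# Toy illustration only: NOT Kopia's real hashing or ID format.
# toy_content_id REPO_KEY DATA
#   prints the first 32 hex characters of SHA-256(REPO_KEY + DATA)
toy_content_id() {
    printf '%s%s' "$1" "$2" | sha256sum | cut -c1-32
}

# The same data in the same (hypothetical) repository maps to the same ID,
# so a fresh upload lands under the ID that older snapshots already reference.
id_before=$(toy_content_id "repo-A-secret" "picture bytes")
id_after=$(toy_content_id "repo-A-secret" "picture bytes")

# A different repository key gives a different ID for the same data.
id_other=$(toy_content_id "repo-B-secret" "picture bytes")

[ "$id_before" = "$id_after" ] && echo "same repo, same data: same ID"
[ "$id_before" != "$id_other" ] && echo "different repo: different ID"
```

The point of the sketch is only that the ID is a deterministic function of the data and a per-repository secret, which is why existing references become valid again once the content is re-uploaded.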

There’s probably a more streamlined experience here that could be built.

Longer term we’ll probably have some redundancy codes to recover from bitrot automatically.
In the short term, using managed cloud storage or a filesystem which handles bitrot like ZFS/BTRFS and performing regular scrubs is recommended.

jkowalski · March 14, 2021, 4:32pm

(I think the --force-hash=100 was the critical bit here, without it Kopia will simply assume that content already exists based on modification times of the file).
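
As a rough mental model of that behaviour, the decision can be sketched like this. The `would_rehash` helper is hypothetical and is not Kopia code; it only models the two cases discussed here (the default of no forced hashing, and forcing 100% of files) and treats the modification time as the only signal, which is a simplification.

```shell
#!/bin/sh
# Hypothetical helper for illustration, not part of Kopia.
# would_rehash RECORDED_MTIME CURRENT_MTIME FORCE_PERCENT
#   prints "yes" if the file would be read and hashed again, "no" otherwise
would_rehash() {
    # Forcing 100% of files means the timestamp shortcut is never taken.
    if [ "$3" -eq 100 ]; then
        echo yes
        return
    fi
    # Otherwise an unchanged modification time means the file is assumed
    # to be already stored, so nothing is re-uploaded.
    if [ "$1" != "$2" ]; then
        echo yes
    else
        echo no
    fi
}

would_rehash 1615743421 1615743421 0     # unchanged file, no forcing
would_rehash 1615743421 1615743421 100   # unchanged file, forced
```

In the bit-rot scenario the source file is untouched, so its timestamp matches and only the forced case reads the file again.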

horo · March 14, 2021, 6:50pm

Thank you for the quick reply!

That worked great! In case someone else comes along here, this is what we did in detail:

kopia snapshot verify --all-sources --verify-files-percent=100

gave us the following output:

Found 4 objects, verifying 1, completed 0 objects.
failed on root@osboxes:/home/osboxes/Desktop/source@2021-03-14 13:37:01 EDT/horoo.jpg: error reading object Z385d923526dea28c53a3a23ad9179eb1: unexpected content error: invalid checksum at pb914b2361932491cb17acf7f82d22050 offset 754 length 107799: decrypt: cipher: message authentication failed
kopia: error: encountered 1 errors, try --help

which revealed the content ID: Z385d923526dea28c53a3a23ad9179eb1. We then needed to remove the Z at the beginning and use it to remove the content object:

KOPIA_ADVANCED_COMMANDS=enabled kopia content rm 385d923526dea28c53a3a23ad9179eb1
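
If there are many errors, pulling the ID out of each message by hand gets tedious. Below is a small sketch of how the extraction could be scripted. The `extract_content_id` function is our own hypothetical helper, and it assumes the error line has exactly the shape shown above ("error reading object <ID>:") and that the ID carries a single leading Z as in our case.

```shell
#!/bin/sh
# Hypothetical helper: extract the object ID from a verify error line
# and drop a leading "Z", as we did by hand above.
extract_content_id() {
    printf '%s\n' "$1" \
        | sed -n 's/.*error reading object \([A-Za-z0-9]*\):.*/\1/p' \
        | sed 's/^Z//'
}

# Shortened copy of the error line from our verify run.
line='failed on horoo.jpg: error reading object Z385d923526dea28c53a3a23ad9179eb1: unexpected content error'

extract_content_id "$line"
# prints 385d923526dea28c53a3a23ad9179eb1
```

The printed ID is what we passed to kopia content rm in the command above.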

After that, we created a new snapshot using the --force-hash=100 option:

kopia snapshot create /path/to/source/ --force-hash=100

This recreated the corrupted file and even the first backup could be restored again.

horo · March 14, 2021, 6:50pm

Now there’s another issue: if the original file has been deleted, it is not possible to repair the repo using the steps above. The only thing we could think of to get rid of the error messages was deleting all snapshots that refer to the corrupted data.

As snapshot verify only reveals the date of the latest affected snapshot, one would have to delete and verify over and over again until all broken snapshots are removed. For a large repo this would be extremely time-consuming.

Is it possible to list all snapshots that use a specific content object?

Or is it possible to somehow “hide” the error for future invocations of snapshot verify?

jkowalski · March 14, 2021, 9:56pm

That’s great to hear.

We have this idea to be able to remove references to some objects permanently, but it’s not implemented yet:

PRs as usual are welcome.
