Trusting your backup

TL;DR – I back up my local files… how do I know when/if my local files have gone bad?

I’m new to Kopia. Just today, while testing Kopia, I found a one-year-old JPG that was corrupted on my server. I was able to retrieve it from an existing backup, but it made me realize that backup solutions do not protect against corruption of the source files. None of them, not even Kopia.

I’m a Windows guy and I’ve used NTFS since the beginning. I’ve used Windows ReFS for a number of years and I simply cannot trust it. The only local file system I trust is ZFS.

Beyond your local file system, how do I know my backup source is clean? Today, I’m thinking, I need a ZFS-like scrub.

I plan to run a periodic (maybe monthly) compare of each file on my local server against my backup, to confirm the files actually match, either byte for byte or preferably by hash.
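
Roughly what I have in mind, as a minimal Python sketch: it assumes the backup has already been restored (or mounted) to a second local path, the SOURCE and BACKUP paths are just placeholders, and a mismatch only tells me the two copies differ, not which side is the bad one.

```python
#!/usr/bin/env python3
"""Compare every file in a source tree against a restored/mounted backup tree by SHA-256."""
import hashlib
import sys
from pathlib import Path

# Placeholder paths -- adjust to your own layout.
SOURCE = Path(r"D:\data")             # live copy on the local server
BACKUP = Path(r"E:\restored-backup")  # backup restored (or mounted) to a local path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def main() -> int:
    problems = 0
    for src_file in SOURCE.rglob("*"):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(SOURCE)
        bak_file = BACKUP / rel
        if not bak_file.is_file():
            print(f"MISSING IN BACKUP: {rel}")
            problems += 1
        elif sha256_of(src_file) != sha256_of(bak_file):
            print(f"HASH MISMATCH:     {rel}")
            problems += 1
    print(f"Done. {problems} problem(s) found.")
    return 1 if problems else 0

if __name__ == "__main__":
    sys.exit(main())
```

This re-reads every file on both sides, which is why monthly feels about right for me.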

What is the point if the files I back up slowly bitrot and I don’t even realize it? All backup solutions simply back up whatever they find.

It’s late, so this is not exactly a well-flowing thought, but I hope it gets my point across.

So at this point, I’ll now need to restore my various backups to confirm my local files are not corrupted. I didn’t see this one coming.


Today, I’m thinking, I need a ZFS-like scrub.

You answered your own question. ZFS doesn’t trust anything, so only a scrub can tell you whether something flipped over.

I plan to run a periodic (maybe monthly)

It depends on the volume; for example, scrubbing 126 TB takes 2-3 days. A scrub literally re-reads the whole array, so once a month is a reasonable cadence, but it really depends on your requirements and how important the content is.

To confirm the files actually match, either byte for byte or preferably by hash.

That’s what a ZFS scrub does: it compares the checksums stored in metadata against the content, so there’s no need to run your own hashing. If a scrub finds corruption, it will try to repair it and report it.

What is the point if the files I back up slowly bitrot and I don’t even realize it?

That’s a pretty good point, but it isn’t really about backup; it’s about the integrity of the original content.
If your primary content (the data you are backing up) lives on a filesystem that cares about integrity (such as ZFS), then the more often you run a scrub, the more confident you can be that bitrot isn’t creeping in.

If you meant bitrot in the backup itself, then this parasite doesn’t get along with solutions that try to economize on drive space through deduplication, since there is no duplicated content to fall back on. That is the point of having multiple, physically distributed repositories that you don’t keep in constant sync, but synchronize with a delay relative to the primary content: if one backup gets corrupted, you can still restore files from a delayed repository.

Another solution, posted in a neighboring thread, is the par2 utility, which is designed specifically to guard against bitrot. It is best suited to rarely changed, archive-like files, since any change to a file requires you to rebuild its par2 recovery files.

Regardless, you obviously have to be sure the original content isn’t broken in the first place. So a scrub on ZFS, or your own tool that verifies file hashes if the filesystem isn’t integrity-aware, is the only way to be sure your files are OK. A minimal sketch of such a tool is below.
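
Just to illustrate the idea: a Python sketch of a poor man’s scrub for a filesystem without built-in checksums (NTFS, for example). Everything here is an assumption on my side: ROOT and MANIFEST are placeholder paths, and it treats “content changed while size and mtime stayed the same” as the bitrot signature, so files you have legitimately edited are simply skipped until you recreate the manifest.

```python
#!/usr/bin/env python3
"""Poor man's scrub for filesystems without built-in checksums (e.g. NTFS).

"create" writes a manifest of SHA-256 hashes; "verify" re-hashes the tree and
flags files whose content changed while size and mtime stayed the same, which
is the classic bitrot signature. Paths and the manifest name are placeholders.
"""
import hashlib
import json
import sys
from pathlib import Path

ROOT = Path(r"D:\archive")                    # tree to watch -- placeholder
MANIFEST = Path(r"D:\archive-manifest.json")  # placeholder

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def snapshot() -> dict:
    """Map relative path -> [size, mtime_ns, sha256] for every file under ROOT."""
    entries = {}
    for p in ROOT.rglob("*"):
        if p.is_file():
            st = p.stat()
            entries[str(p.relative_to(ROOT))] = [st.st_size, st.st_mtime_ns, sha256_of(p)]
    return entries

def create() -> None:
    MANIFEST.write_text(json.dumps(snapshot(), indent=2))
    print(f"Wrote manifest for {ROOT}")

def verify() -> int:
    old = json.loads(MANIFEST.read_text())
    suspect = 0
    for rel, (size, mtime_ns, digest) in old.items():
        p = ROOT / rel
        if not p.is_file():
            print(f"MISSING: {rel}")
            suspect += 1
            continue
        st = p.stat()
        if st.st_size == size and st.st_mtime_ns == mtime_ns and sha256_of(p) != digest:
            # Same size, same timestamp, different content: likely silent corruption.
            print(f"POSSIBLE BITROT: {rel}")
            suspect += 1
    print(f"Done. {suspect} missing or suspect file(s).")
    return 1 if suspect else 0

if __name__ == "__main__":
    if len(sys.argv) == 2 and sys.argv[1] == "create":
        create()
    elif len(sys.argv) == 2 and sys.argv[1] == "verify":
        sys.exit(verify())
    else:
        print("usage: scrub.py create|verify")
```

Run `create` once while you know the files are good, `verify` on whatever schedule suits you, and recreate the manifest after intentional changes.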

Also consider SnapRAID (at least for parity and integrity checks), CloudBerry, GoodSync.

For me, I like seeing my data backed up and knowing I can get to it. I just wish checksumming between my local storage and cloud backup were better and more elegant.