I hear you (it’s a tough balance, for sure), and thanks for the conversation. But my personal threshold for considering something “verified” already accepts the risks inherent in all data storage and retrieval, and that’s no less true of other backup solutions.
You can take your argument to its conclusion and say that even one second after a specific file has been verified, it is no longer guaranteed to still be in that verified state. And I understand that Kopia alone can’t provide verification without reading back the data. That’s exactly why I want it to record when each piece of data was last verified: so I can define my own risk tolerance. Right now, I can either (A) choose how many times to roll my 600,000-sided dice, hoping that over multiple sessions it eventually lands on every side, or (B) land on all 600,000 sides in one sitting (with no bathroom breaks!).
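To make (A) and (B) concrete, this is roughly what I mean in CLI terms (going from the flag names as I understand them, so treat it as a sketch, not gospel):

```
# (A) roll the dice: read back a random 1% of files on each run; nothing records
# which files were already checked, so coverage across runs is pure probability
kopia snapshot verify --verify-files-percent=1

# (B) one sitting: read back 100% of files in a single pass (full egress cost)
kopia snapshot verify --verify-files-percent=100
```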
I’m asking for a completely different paradigm, one that would let me define the level of risk I’m willing to take on based on:
- the medium the repo is stored on
- how much and how often data changes on the repo
- how much of the data currently on the repo has been verified, and how long ago that verification happened
My Kopia backups vary quite a bit. Some take snapshots every 20 minutes, others once a month. I would love the option to have Kopia verify (and auto-heal) data that hasn’t been verified in “x” days, which I would set according to each scenario.
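Purely as a hypothetical sketch of what I’m picturing (none of these flags exist in Kopia today; the names are made up):

```
# HYPOTHETICAL - not a real Kopia flag. The idea: only read back data whose
# recorded "last verified" timestamp is older than the threshold, and update
# that record for everything that gets re-verified on this run.

# busy repo, snapshots every 20 minutes: re-verify anything older than 7 days
kopia snapshot verify --min-verified-age=7d     # hypothetical flag

# archival repo, monthly snapshots: re-verify anything older than 180 days
kopia snapshot verify --min-verified-age=180d   # hypothetical flag
```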
If I’m backing up to a ZFS pool I control with data that changes infrequently, I might not feel the need to constantly verify remotely via Kopia since there’s a pretty high chance that if Kopia said the data was good a month ago, it’s still good.
Going back to Snapraid, on my 60TB array, which is mostly for archival and holds data which does not change often, my threshold is 180 days – any block that hasn’t been scrubbed in 180 days gets scrubbed again.
Also, the scenario you describe, where adding 500M to a previously verified 30G repo requires the remaining 29.5G to be re-verified, is a stricter level of verification than what I’m suggesting and than what is currently offered. Our concepts could possibly be combined in interesting ways (e.g. verification presented to the user at the snapshot level, so a `--min-age=x` flag would represent the days since each individual snapshot was last verified, while still being tracked at the repo data level to minimize egress, since there is so much overlap in the underlying repo data that makes up each snapshot). This is complete theory on my part, though, and really for @jkowalski to debunk.
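To make that combination concrete, here’s a hypothetical flow (invented flag, placeholder snapshot IDs, purely illustrative):

```
# HYPOTHETICAL - nothing below exists in Kopia today.
#
# Snapshots S1 and S2 share most of their underlying contents thanks to dedup.
# Verifying S1 reads back its contents and stamps each content ID with
# "last verified = now" at the repo data level.
kopia snapshot verify --min-age=30d <snapshot-id-of-S1>   # hypothetical usage

# When S2 is checked later, only the contents unique to S2 (or stamped more
# than 30 days ago) actually need to be read back; the shared contents were
# already refreshed by the S1 run, so egress stays low.
kopia snapshot verify --min-age=30d <snapshot-id-of-S2>   # hypothetical usage
```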
Zooming out, I get it. This is a small, relatively young open-source project, and improving this one (important) feature would only make a meaningful difference to what I’m guessing is a small subset of users. Kopia shouldn’t even be used in a professional environment anyway, and often couldn’t be because of this very issue, amongst others: try telling a client that you cannot provide proof of verification unless they want to pay insane egress fees. But I use it this way regardless.
Boiling it down,
- Is it possible to implement a method of recording a verification history of sorts, or otherwise improve the verify/repair toolset toward my goals?
- Is my understanding of Kopia’s current tools even correct?
- Why is it not recommended to run `snapshot fix`, and instead to just do `snapshot verify`? There’s something I must be missing, because if it finds an issue, why not just fix it?! (My current understanding of how the two fit together is sketched below.) I wouldn’t bother poking at this if the notification options were better. There’s some more learning I could do on my end to set up better monitoring outside of Kopia, but this is the only piece of my data management that doesn’t ping me in one way or another when there are issues, so I’m sensitive to this!
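For reference, here’s how I currently understand the two commands fitting together, based on my reading of the help text (happy to be corrected):

```
# report problems without changing anything; here it also reads back 2% of files
kopia snapshot verify --verify-files-percent=2

# "fix" by rewriting snapshot manifests to drop references to files that can no
# longer be read; as I understand it this is a dry run unless --commit is added
kopia snapshot fix invalid-files
kopia snapshot fix invalid-files --commit
```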