Checking for snapshot divergence

Scenario: I have a filesystem-based Kopia repository with a snapshot of an external HDD on it. I’ve recently moved the external HDD partition and I want to use Kopia to ensure the move happened without introducing corruption.

I’ve reviewed the topic “Data corruption checking - Flag clarification”, and I first tried kopia snapshot create /app/backup --force-hash 100, but it seems to upload fresh data to the repository rather than just rehashing and recognizing that the blobs already exist in the repo. I also tried kopia snapshot verify --verify-files-percent 100, but on a 2 TiB repo it takes more than 10 hours just to list the blobs and then gives an estimated time remaining of 1,500 hours.

I thought the forced hash on snapshot create was going to be exactly what I needed, allowing me to verify that the snapshot root was unchanged from the before-the-move snapshot and use snapshot diff to investigate any differences.
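To spell out the reasoning behind comparing snapshot roots, here is a minimal Python sketch of a hash tree over a directory. It is not Kopia's object ID format: the names file_digest and tree_digest are made up for illustration, it assumes the tree holds only regular files, directories, and symlinks, and it hashes only names and contents. Kopia's real directory objects may record more than that, so its root ID can change for reasons this sketch ignores.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """SHA-256 of a file's contents, read in 1 MiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def tree_digest(root):
    """Digest of a directory, built from each entry's kind, name, and digest.

    Any change to a file's contents changes that file's digest, which
    changes every parent directory's digest up to the root.
    """
    h = hashlib.sha256()
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.islink(path):
            kind = b"l"
            child = hashlib.sha256(os.fsencode(os.readlink(path))).hexdigest()
        elif os.path.isdir(path):
            kind = b"d"
            child = tree_digest(path)
        else:
            kind = b"f"
            child = file_digest(path)
        h.update(kind + b"\0" + os.fsencode(name) + b"\0" + child.encode() + b"\n")
    return h.hexdigest()


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    os.makedirs(os.path.join(src, "sub"))
    target = os.path.join(src, "sub", "f.bin")
    with open(target, "wb") as f:
        f.write(b"hello")
    before = tree_digest(src)
    unchanged = tree_digest(src)   # same data, hashed again
    with open(target, "wb") as f:
        f.write(b"hellp")          # simulate a single corrupted byte
    after = tree_digest(src)

print(before == unchanged, before == after)  # True False
```

The point is only that one root value stands in for the whole tree: if the roots match, every file hashed the same, and if they differ, a per-directory comparison narrows down where.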

Any guidance is very much appreciated.

Pinging this topic again, as I’m still very much hoping for guidance on how to get confidence in the integrity of my data. 🙂

Did the path to the moved data change? As long as the source path stays unchanged, kopia shouldn’t upload any data that is already in the repo.

As for the verify command, on my 800 GB repo it takes approx. 60 minutes on a 1 TB USB-C-attached NVMe volume.

The source path didn’t change, but data was being uploaded anyhow. I’ll see if I can reproduce the issue, but it’s good to know that the expected behavior is that re-hashing will occur but new content will not be uploaded if it’s already present in the repository.

To confirm, you can run kopia snapshot create /app/backup --force-hash 100 and kopia will report that no new data is uploaded?

Also, why would it matter if the file paths changed? As long as the content was identical, shouldn’t all the content de-dupe? To check my understanding of kopia’s behavior, is the following statement true?

Kopia will not upload any new content (save for indexes and other minor metadata) if the contents of a source directory don’t change, even if the names and locations of files are randomly rearranged.

Are you sure that the data was actually uploaded? That would indeed make no sense, since all the blobs should already be in the repository. Kopia may re-hash all the data if, for example, the timestamps differ, but it shouldn’t upload anything that is already in the repo. The quoted statement is definitely right: kopia won’t upload content that is already present. However, hashing and splitting are two major tasks that can consume a lot of processing power and thus take some time.
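To illustrate why renaming or moving files shouldn’t cause new uploads, here is a toy content-addressed store in Python. It is only a sketch of the general idea, not Kopia’s implementation: ToyStore and its methods are hypothetical names, and it uses tiny fixed-size chunks with SHA-256, whereas Kopia has its own splitter and hashing configuration.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, for illustration only


class ToyStore:
    """Hypothetical content-addressed store: blob key = SHA-256 of the chunk."""

    def __init__(self):
        self.blobs = {}
        self.uploaded = 0  # number of chunks actually written

    def put_file(self, data):
        """Split data into chunks, store unseen chunks, return the chunk IDs."""
        ids = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            cid = hashlib.sha256(chunk).hexdigest()
            if cid not in self.blobs:  # already present: nothing to upload
                self.blobs[cid] = chunk
                self.uploaded += 1
            ids.append(cid)
        return ids

    def snapshot(self, tree):
        """Store every file in a {path: bytes} mapping; return {path: chunk IDs}."""
        return {path: self.put_file(data) for path, data in tree.items()}


store = ToyStore()

# First snapshot: everything is new, so every chunk is uploaded.
before = {"a/one.bin": b"AAAABBBB", "b/two.bin": b"CCCCDDDD"}
store.snapshot(before)
first_upload = store.uploaded

# Second snapshot: same contents, but files renamed and moved around.
after = {"x/renamed.bin": b"CCCCDDDD", "y/z/other.bin": b"AAAABBBB"}
store.snapshot(after)
second_upload = store.uploaded - first_upload

print(first_upload, second_upload)  # 4 0
```

The chunks are looked up by their hash, not by file path, so the second snapshot still has to read and hash everything but finds every chunk already stored. Only the path-to-chunk mapping (the metadata) is new.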