Recognizing duplicate inodes?

Kopia generally avoids rehashing files if the attributes (name, modification time and size) match the last snapshot - it will simply assume that the file has not changed (this behavior can be overridden).

The logic today walks the current directory (local filesystem) and the previous snapshot (remote filesystem) in parallel and compares file attributes side by side. In fact, multiple previous snapshots are supported: they are all consulted in parallel, and if any of them matches, Kopia avoids hashing (this helps performance when a previous snapshot was incomplete).
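To make the comparison concrete, here is a minimal sketch of the idea in Python. It is not Kopia's actual code: the `Entry` type, the `find_cached` function and the `object_id` field are hypothetical names invented for illustration, and each previous snapshot directory is modeled as a plain name-to-entry mapping.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class Entry:
    """Hypothetical directory entry; not a Kopia type."""
    name: str
    mtime_ns: int
    size: int
    object_id: str = ""  # content hash recorded by an earlier snapshot, if any


def find_cached(current: Entry,
                previous_dirs: Sequence[Mapping[str, Entry]]) -> Optional[str]:
    """Return a previously recorded object ID if any earlier snapshot holds
    an entry with the same name, modification time and size; else None."""
    for prev in previous_dirs:
        candidate = prev.get(current.name)
        if (candidate is not None
                and candidate.mtime_ns == current.mtime_ns
                and candidate.size == current.size):
            # Attributes match: assume unchanged and reuse the old hash.
            return candidate.object_id
    # No match in any previous snapshot: the file has to be hashed.
    return None
```

Adding more "previous" references, as discussed below, would simply mean passing more mappings in `previous_dirs`.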

In principle we could support additional “previous” snapshot references, either specified on the command line or via some policy/rule. For example, I think the pattern where you have a parent directory with time-based subdirectories could be recognized to speed this up:

Consider:

/some/dir/2021-04-15
/some/dir/2021-04-16
/some/dir/2021-04-17
/some/dir/2021-04-18
/some/dir/2021-04-19
/some/dir/2021-04-20
/some/dir/2021-04-21

When snapshotting a directory, say /some/dir/2021-04-19, Kopia could detect the pattern and use the lexicographically previous entry or entries (which would be /some/dir/2021-04-18, /some/dir/2021-04-17, /some/dir/2021-04-16, …) as additional references, assuming they are similar at all. This should produce a big performance improvement.
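Picking the candidate directories could look roughly like the sketch below. Again, this is only an illustration under assumptions: `previous_siblings` and its `limit` parameter are hypothetical, nothing like this exists in Kopia today, and the sketch does not check whether the siblings are actually similar or whether snapshots of them exist.

```python
import posixpath
from typing import List, Sequence


def previous_siblings(target: str, sibling_names: Sequence[str],
                      limit: int = 3) -> List[str]:
    """Return up to `limit` sibling directories whose names sort
    lexicographically before the target's name, nearest first."""
    target = target.rstrip("/")
    parent = posixpath.dirname(target)
    name = posixpath.basename(target)
    # Keep only names that sort strictly before the target, closest first.
    earlier = sorted((s for s in sibling_names if s < name), reverse=True)
    return [posixpath.join(parent, s) for s in earlier[:limit]]


siblings = ["2021-04-15", "2021-04-16", "2021-04-17", "2021-04-18",
            "2021-04-19", "2021-04-20", "2021-04-21"]
print(previous_siblings("/some/dir/2021-04-19", siblings))
```

For the listing above this yields /some/dir/2021-04-18, /some/dir/2021-04-17 and /some/dir/2021-04-16, in that order. Plain string comparison only works here because the directory names are zero-padded ISO dates; other naming schemes would need a smarter ordering.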

Sounds like a good experiment to try. The user experience is the hardest part and very important to get right. I’ll be happy to discuss this further and, if somebody is interested in implementing it, to help with codebase orientation and code reviews.