Efficiently determining the change set for incremental backups

Hi,

I am looking into improving incremental snapshot performance by using filesystem capabilities that go beyond what POSIX offers.
In general, to find out whether any file within a directory has changed since the last snapshot, the entire hierarchy has to be scanned. That is because a file's modification time is not propagated to its parent directories.
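
To make that cost concrete, here is a minimal Python sketch of the plain mtime-based approach (`changed_since` is just an illustrative name). Because no directory's metadata says whether something below it changed, it must list every directory and stat every entry, even when only one file differs. It also cannot see deletions at all; those can only be found by comparing against the previous snapshot's listing.

```python
import os


def changed_since(root: str, cutoff: float) -> list[str]:
    """Return paths (relative to root) of non-directory entries with mtime > cutoff.

    A file's mtime is not propagated to its ancestor directories, so no
    subtree can be pruned: every directory is listed and every entry is
    stat()ed, however few files actually changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if st.st_mtime > cutoff:
                changed.append(os.path.relpath(path, root))
    return sorted(changed)
```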

By their nature, copy-on-write (COW) filesystems such as btrfs can detect changes between two states of the filesystem quite efficiently. I’d like to make use of this, but I did not find any direct way to feed such information into Kopia.
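
The reason this is cheap can be shown with a toy model: under copy-on-write, modifying anything rewrites every node on the path up to the root, so a subtree whose on-disk location is unchanged must be identical, and a diff can skip it without descending. The sketch below is a simplified, hypothetical model (`Node`, `block` and `diff` are illustrative names), not btrfs's real data structures or implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Toy copy-on-write tree node (hypothetical, not btrfs's on-disk format).

    `block` stands in for the node's on-disk location. Under copy-on-write,
    changing anything in a subtree rewrites every node on the path to the
    root, so an unchanged `block` means an identical subtree.
    `children` is None for files and a name -> Node dict for directories.
    """
    block: int
    children: dict[str, Node] | None = None


def diff(old: Node | None, new: Node | None, path: str = "") -> list[tuple[str, str]]:
    """Return (kind, path) pairs for entries that differ between two tree states."""
    if old is not None and new is not None and old.block == new.block:
        return []  # shared subtree: skip it without descending
    if old is None:
        return [("added", path)]
    if new is None:
        return [("removed", path)]
    if old.children is None or new.children is None:
        return [("modified", path)]  # file content changed (or file <-> dir swap)
    changes: list[tuple[str, str]] = []
    for name in sorted(old.children.keys() | new.children.keys()):
        changes += diff(old.children.get(name), new.children.get(name), f"{path}/{name}")
    return changes
```

Only directories on a changed path are listed; untouched subtrees are never entered, which is exactly what the mtime-based scan cannot do. Note that, unlike that scan, removals fall out of the comparison for free.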

What I looked into:

  • The ignore policies, which act as a blacklist. I did not find the opposite (something like include files). Even if it existed, I suspect its semantics would not be what I am looking for: only the included files would then be part of the snapshot, whereas I want to tell Kopia which data has changed.
  • The actions feature, which lets you define before/after handlers. These could certainly come in handy, but they cannot influence the scanning procedure itself.

As far as I can tell, this is not possible at the moment. What I have in mind is a way to specify a custom scanner that tells Kopia which files have changed since a given snapshot. This of course has to cover every kind of change, including files that were removed since the last snapshot. I understand this is not something you would normally want users to do, because it is easy to get wrong, so there certainly has to be a sane default. It may also not be feasible without adding some kind of plugin mechanism to the codebase.
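
To make the idea a bit more concrete, here is a rough sketch of what such a scanner could hand over and how it might be combined with the previous snapshot. Everything here is hypothetical: `ChangeSet`, `plan_incremental`, and modelling a snapshot as a path-to-content-ID map are assumptions for illustration, not Kopia's API. Unchanged entries reuse their previous content IDs, only added and modified files are read, and removals must be reported explicitly or deleted files would silently reappear in the new snapshot.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    """Hypothetical report from an external change detector (e.g. a btrfs-aware script)."""
    added: set[str] = field(default_factory=set)
    modified: set[str] = field(default_factory=set)
    removed: set[str] = field(default_factory=set)


def plan_incremental(
    previous: dict[str, str], changes: ChangeSet
) -> tuple[dict[str, str | None], list[str]]:
    """Combine the previous snapshot (path -> content ID) with a change set.

    Returns the new manifest, where entries that still need to be read and
    hashed map to None, plus the sorted list of paths to read.
    Raises ValueError if the change set is inconsistent with `previous`,
    so the caller can fall back to a full scan.
    """
    changed = changes.added | changes.modified
    if changes.removed & changed:
        raise ValueError("path reported as both removed and changed")
    if not (changes.removed | changes.modified).issubset(previous):
        raise ValueError("removed/modified path missing from previous snapshot")
    if changes.added.intersection(previous):
        raise ValueError("added path already present in previous snapshot")

    # Carry over everything that was not removed; unchanged files keep their IDs.
    manifest: dict[str, str | None] = {
        path: cid for path, cid in previous.items() if path not in changes.removed
    }
    to_read = sorted(changed)
    for path in to_read:
        manifest[path] = None  # placeholder until the file is read and hashed
    return manifest, to_read
```

Rejecting a change set that does not line up with the previous snapshot, rather than guessing, is one way to keep the sane default: fall back to a regular full scan.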

Anyhow, I’d really like to look into this to save time on large backup sources. If you snapshot regularly, the actual changes between snapshots are small, and when the filesystem already provides this information it is very compelling to use it. Do you have any suggestions on how to go about this?