I’d like to let you know about some big and exciting features for v0.9 that have recently been merged. I could really use your help testing them ahead of the official release.
If you’re interested, read on.
The new format is designed to support two big features:
- content-level compression, which rewires the snapshot pipeline to perform compression after hashing and leaves room for future improvements, such as error-correcting codes (ECC)
- epoch-based index management, which will support append-only operation (soon)
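To illustrate why hashing before compressing matters, here is a minimal Python sketch of a content-addressed store. It is a toy model, not Kopia's actual implementation: the `ContentStore` class, its methods, and the choice of SHA-256 and zlib are all assumptions made for illustration.

```python
import hashlib
import zlib


class ContentStore:
    """Toy content-addressed store (hypothetical, not Kopia's code)."""

    def __init__(self):
        self.blobs = {}          # content hash -> compressed bytes
        self.compress_calls = 0  # how many times we paid for compression

    def put(self, data: bytes) -> str:
        # Hash the uncompressed content first.
        content_id = hashlib.sha256(data).hexdigest()
        # Deduplicate: known content is skipped before any compression work.
        if content_id not in self.blobs:
            self.compress_calls += 1
            self.blobs[content_id] = zlib.compress(data)
        return content_id

    def get(self, content_id: str) -> bytes:
        return zlib.decompress(self.blobs[content_id])


store = ContentStore()
file_a = b"hello world\n" * 1000
file_b = b"other data\n" * 1000

# The first machine uploads two files; the second has one file in common.
ids_first = [store.put(file_a), store.put(file_b)]
ids_second = [store.put(file_a)]

print(store.compress_calls)  # 2: the duplicate was never compressed
```

Because the hash is computed on the uncompressed data, content that is already in the store is detected before any CPU time is spent compressing it.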
The code at HEAD has both features, but they are disabled by default, and all new repositories will default to the old format until the features have been proven stable.
I’m looking for folks who are willing to try the new format on their repositories, either by using an in-place upgrade or by creating brand-new repositories just for this purpose. I’m looking for coverage of all kinds of repositories, big and small, with different levels of concurrency, policies, storage backends, etc.
There is a non-trivial amount of risk because the code is new and has been tested only in a very limited set of circumstances, yet it’s critical to get as much real-world coverage as possible before declaring it stable for all users.
You will need to pass two new flags on repository creation. UI-only repository creation using the new format is not supported yet, but it will be added soon.
$ kopia repository create ... --index-version=2 --enable-index-epochs
To upgrade an existing repository to the latest format, use the following command (WARNING: this is a one-way operation, there’s no way to revert to the old format yet):
$ kopia repository set-params --upgrade
Snapshots of a brand-new machine or directory containing files that already exist elsewhere in the repository should be faster to create if compression is used, because deduplication will happen before compression.
The number of index blobs in the repository will increase (currently the number of index blobs is kept small during maintenance), because we will be retaining one index file per epoch (typically at most 4 epochs per day, usually far fewer), but we won’t be rewriting indexes as aggressively as today.
The total size of indexes may grow by 2x-3x, but they usually consume <0.1% of total repository size, so that’s not really significant. On the plus side, if the repository gets corrupted, there will be a higher chance of data recovery using the redundant data.
The format should be stable and compatible with the upcoming v0.9, but if major bugs are found we may need to nuke the repositories and start again.
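To put the index-size estimate in perspective, here is a rough calculation in Python. It uses only the figures quoted above (indexes under 0.1% of repository size, growing 2x-3x); the 1 TB repository and the function name are assumptions made for illustration.

```python
def index_size_bounds(repo_bytes: int, growth=(2, 3)):
    """Return (current, low, high) upper bounds on index size in bytes.

    Assumes indexes take at most 0.1% (1/1000) of the repository today.
    """
    current = repo_bytes // 1000
    return current, current * growth[0], current * growth[1]


ONE_TB = 10 ** 12  # hypothetical repository size
current, low, high = index_size_bounds(ONE_TB)

GB = 10 ** 9
print(f"today: under {current // GB} GB")
print(f"new format: under {low // GB}-{high // GB} GB")
```

Even at the high end, the indexes of a 1 TB repository would stay under roughly 3 GB, or 0.3% of the total.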
- any data loss or corruption
- inconsistent view of repository across clients
- unexpected behavior changes, such as things being dramatically faster with no explanation or dramatically slower
- much higher or unexpectedly lower memory or CPU usage
- unexpected slowdowns when running CLI commands
Because epoch-based indexes are time-based, testing this will require some calendar time to pass.
I’d like to test this until the end of August, which should be enough time to go through between 100 and 200 epochs for most repositories with significant traffic (there will be max 4 epochs per day, depending on activity).
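As a quick sanity check on that timeline, the snippet below works out the minimum calendar time needed to reach a given epoch count at the stated maximum of 4 epochs per day. The function name is hypothetical, and less active repositories will take longer than this lower bound.

```python
import math

MAX_EPOCHS_PER_DAY = 4  # upper bound stated in the post


def min_days_for(epochs: int, per_day: int = MAX_EPOCHS_PER_DAY) -> int:
    """Fewest calendar days needed to reach the given number of epochs."""
    return math.ceil(epochs / per_day)


for target in (100, 200):
    print(target, "epochs need at least", min_days_for(target), "days")
```

So 100 epochs take at least 25 days and 200 epochs at least 50 days, which is why the test window spans several weeks.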
In the meantime we’ll be focusing on polishing the features, adding more CLI commands for debugging and troubleshooting, improving logging, etc.
At the end of the test period (early September) we’ll collectively decide if the features are stable enough to be enabled by default in v0.9 for new repositories or if we’ll need to keep them in the test phase for longer.
The optimistic plan is:
- v0.9 supports the new repository format by default, the old format can still be enabled using flags
- v0.10 supports both old and new formats and will prompt legacy users to migrate to the new format
- v0.11 supports new repository format only, old index format not supported
If you’re interested in helping with testing, please like this post using the heart icon below and please discuss any issues here in this thread.