Bugs, Feature Reqs, v0.9.8; Migrate; Retention; Scheduler; Offline repo; et al.

Hi Jarek, Julio, and all devs, thanks for Kopia, super nice product, many features and coherent design. I did an extensive review of backup solutions, and selected and use Kopia for user data, and Borg for system. I especially like the builtin scheduling and web UI, plus robust deduplicated data model, and many backends.

A couple bugs and feature reqs:
(all these notes refer to a local storage repository)

BUG: MIGRATE dropped GLOBAL policy
Regarding the v1 to v2 format change, I was recently migrating a repo. I made a new repo, then migrated all my snapshots from the old. I did this via
$ kopia snapshot migrate --source-config …PathOld.config --all --policies
Everything seems to have migrated correctly except the GLOBAL policy did not get migrated. Afterwards I tried again with --policies --overwrite-policies --latest-only --sources=
but it still did not bring GLOBAL.
This is a serious bug, because if I had started the server on my new repo, it would have been deleting old snapshots based on the default retention.

BUG: Then I tried to fixup the policy via
kopia policy edit global
This doesn’t work, even though the help says it should. All the fields are empty and it shows "Editing policy for ‘%v’ "
It works to give
kopia policy edit --global
so I was able to edit the retention policy back.

FEATURE: I couldn’t find a way to SET a full dump of the policy settings via cmdline. You can GET via policy show --json, but how to set them in bulk from that json?

FEATURE: The gap between Month to Annual is long. What about a Quarterly period, possibly instead of Annual (see next)

BUG/FEATURE: Retention periods are counter-intuitive, lossy.
Retention periods seem to be based on absolute calendar dates, i.e. the week cycles on Sunday? The annual at the end of the year? And it preserves the latest snapshot in each time period, instead of the earliest. This results in being too cavalier about throwing away old backups.
Maybe it would be better to keep the earliest, or to use relative logic, such that the earliest/first snapshot defines the start of the epoch?

There is general counter-intuitiveness in the retention policy timeframe as currently designed, which is most noticeable on the longer intervals. Most egregiously with Annual, which currently refers to the latest backup of a calendar year.
This seems completely wrong. If I specify Annual = 1 year, I want that to be protecting up to 1 year backwards. The latest ones are already protected by the smaller intervals daily/weekly/monthly.
To my mind, the goal is recovery and rollback coverage, not some “year-end accounting” notion. So I want the longest spans possible.
The periods should mean that I can rollback that far.

Example: Setting Annual = 2, I would expect to cover a 2-year span, say if today is 2022jan1 then back to 2020jan1, or the oldest one within this time span.
NOT, as with the current design, an Annual=2 setting that gives you only 2 days of coverage! (2022jan1 and 2021dec31)
In general, to protect the oldest backups with the current design, the Annual is useless, and I have to dial the weekly/monthly way up.
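
To make the example above concrete, here is a minimal Python sketch of the behavior as I understand it. It is a simplified model for illustration only, not Kopia's actual retention code: the function name `keep_annual`, the `pick` parameter, and the assumption of one snapshot per day are all mine.

```python
from datetime import date, timedelta


def keep_annual(snapshots, n, pick="latest"):
    """Return the snapshots kept by an 'Annual = n' rule (simplified model).

    pick="latest"   keeps the last snapshot of each calendar year
                    (the behavior described in this post).
    pick="earliest" keeps the first snapshot of each calendar year
                    (the alternative suggested in this post).
    """
    by_year = {}
    for d in sorted(snapshots):
        # "latest": later dates overwrite earlier ones.
        # "earliest": only the first date seen for a year is stored.
        if pick == "latest" or d.year not in by_year:
            by_year[d.year] = d
    recent_years = sorted(by_year, reverse=True)[:n]
    return sorted(by_year[y] for y in recent_years)


# Assumed history: one snapshot per day from 2020jan1 through 2022jan1.
start, today = date(2020, 1, 1), date(2022, 1, 1)
snaps = [start + timedelta(days=i) for i in range((today - start).days + 1)]

print([d.isoformat() for d in keep_annual(snaps, 2)])
# ['2021-12-31', '2022-01-01']
print([d.isoformat() for d in keep_annual(snaps, 2, pick="earliest")])
# ['2021-01-01', '2022-01-01']
```

In this model, keeping the latest snapshot per year leaves two adjacent days, while keeping the earliest reaches back a full year with the same Annual = 2 setting.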

Found some other discussion on this:
A little baffled by retention policy
Trying to understand retention policies - #4 by jkowalski

FEATURE: Empty password.
I would like to have no password on some repo, but it currently requires a non-empty string.
It’s ok that the files/info would still actually be stored encrypted, since the whole data content is sliced and diced into blobs anyway, not actual files.
For local backups with physical control of the media, the password/encryption is less necessary, and there is risk of losing it, and it puts another step/requirement in the recovery process.

FEATURE: How about support for usually OFFLINE media? I would like to have some backups offline, then attach the media and a Kopia backup runs automatically (or on demand), then detach when it’s done. And have this automated to a high degree, so Kopia can sense when the repo is attached, without failing or too much system polling during the 99% of the time it’s offline. Also, a distinction between how often it checks for the media, say every 5 minutes, versus the actual backup period, say once per day, or once per week.
The idea being, I attach the media, Kopia notices it within some settable short period, but only does a new backup if it’s time for a new one, within a longer period. (i.e. not backing up every 5 minutes, just because the polling time was 5 min)
What would be the best way to accomplish this currently?
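
For reference, the two-interval logic I have in mind could be sketched as an external wrapper script like the one below. This is only a rough sketch under several assumptions, not an existing Kopia feature: the names `backup_due`, `poll_forever`, `repo_dir` and `source_path` are hypothetical, it assumes `repo_dir` is a directory that only exists while the media is attached, and it assumes kopia is already connected to that repository so that `kopia snapshot create` can be run as-is.

```python
import os
import subprocess
import time
from datetime import datetime, timedelta

POLL_INTERVAL = timedelta(minutes=5)   # how often to check for the media
BACKUP_INTERVAL = timedelta(days=1)    # how often a backup should actually run


def backup_due(repo_attached, last_backup, now, backup_interval=BACKUP_INTERVAL):
    """Decide whether to back up now: media present AND enough time elapsed."""
    if not repo_attached:
        return False              # media offline: do nothing, do not fail
    if last_backup is None:
        return True               # never backed up in this run
    return now - last_backup >= backup_interval


def poll_forever(repo_dir, source_path):
    """Cheap periodic check; snapshots only when backup_due() says so.

    Assumption: repo_dir exists only while the media is attached.
    The last-backup time is kept in memory, so it resets on restart.
    """
    last_backup = None
    while True:
        now = datetime.now()
        if backup_due(os.path.isdir(repo_dir), last_backup, now):
            result = subprocess.run(["kopia", "snapshot", "create", source_path])
            if result.returncode == 0:
                last_backup = now
        time.sleep(POLL_INTERVAL.total_seconds())
```

The point of splitting out `backup_due` is that the polling period and the backup period stay independent: the loop wakes every 5 minutes, but a snapshot only happens once per day.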

FEATURE: Scheduler to be more powerful, set actual times to run, not just elapsed seconds from prior/first. (either explicit times, or windows/intervals; also times to not run)

DOC: The way CONFIG files work is not adequately documented. It took some research/experiments to figure them out. Buried under Reference.CommandLine, it’s listed as just another option, which I initially thought was just a naming thing, to put the config file somewhere else. But it’s really an overall operational attribute, the key to having multiple simultaneous connections and repos running.
It’s a good design, that you can have specific .configs, and/or the single system default config – just needs more intro/overview paragraphs, and documented at a higher level of the overall system operation.

FEATURE: Help should list single line commands first, otherwise they are buried in the commands which take many sub-commands. Or, you might consider grouping them by the most important commands first
Also what about -h or -? for help.

FEATURE: It would be nice if cmds could accept a unique prefix or shorter form, for example “repo” for repository, “snap” for snapshot, etc.
It’s laborious to type the long cmds and options at cmdline.
This would be a more generally useful efficiency, and could serve in lieu of those default subcommands like server [start], and snapshot [create] which are somewhat desultory thus error prone.

OLD VERSION POSSIBLE BUGS
Here are a few more bug notes from my earlier tests around July 2021.
These are from old version, and/or I did not well understand the application yet, so these might not be actual bugs. I have not reexamined them lately.

BUG?: Restore via web UI is making everything SHALLOW, all .kopia-entry restores, and doesn’t actually restore the whole tree. (v0.8.4, 2021jul16)

BUG?: Restore via cmdline doesn’t preserve dirtimes. It seems to be setting dirtime to the time of the newest contained file or dir, and this ripples upwards.

BUG?: Scheduling per Path: when a Repo has multiple Paths in it, I want to schedule backups of them at different intervals. It’s backing up both Paths, even though the other Path has no Schedule set.

Thanks again for all your great work on Kopia.

– JBThiel


Thanks for the feedback, would you mind filing those as individual issues on GitHub so we can track them?

To quickly comment:

  1. Migrate bug - agreed, looks like a serious bug.
  2. kopia policy edit global changes the policy on the <current-dir>/global which is not intuitive. We should perhaps do something about it if global does not exist as a subdirectory.
  3. Full dump (export) of policies would be a good feature indeed. Possibly paired with import from JSON.
  4. Quarterly retention seems like a great feature to add.
  5. I’ve heard this request for “latest vs earliest” retention before. Let’s discuss concrete cases on GitHub, perhaps there’s an option to add there. BTW. Setting annual to N+1 should fix the issue (you’ll get 2020dec31,2021dec31,2022jan1) which is very close to 2021jan1,2022jan1.
  6. Detecting when repo is attached is an interesting case and is (possibly) similar to requirements for: Implement snapshot conditions · Issue #1519 · kopia/kopia · GitHub
  7. Scheduling improvements are definitely needed. Please file a FR.
  8. A bunch of the config pages are auto-generated and thus quite crappy. Any help improving those is very appreciated.
  9. Improving help is also a great idea. Please file a FR or even better PR with a fix :)
  10. Commands do accept aliases - you can do “kopia repo create”, “kopia snap create”, or even (even less obvious) "kopia snap "
  11. Restore via UI issue is indeed fixed.
  12. Directory times are not generally preserved with Kopia, which has certain benefits for deduplication. Are there scenarios where this is important? Please file a FR if so.
  13. I don’t understand scheduling per path issue. Please file a GH issue.

I would appreciate it if you could link to individual issues on GH and let’s continue discussion over there.
