The Daily Mail - What are your ideas on backup reporting?

Hi all.

I’m wondering what your ideas are on backup reporting.

E.g. a daily status email like so would be nice (example on android notification screen):

MoSCoW-method loosely applied
M: daily mail stating whether all is OK or whether there are warnings or failures (also for rclone remote storage)
M: part of the kopia repository server
M: show a visualisation of all snapshots (kopia snapshot list --all) the user has been granted access to (using ACLs)
M: set an allowed drift and only report outside the drift time. E.g. daily snapshots should raise a warning after 1 day and a critical after 5 days.
M: show an overview of storage space in use (bonus: show the amount of storage before deduplication and compression, just to make you feel good)
S: browse and mount snapshots directly (so this could be integrated into KopiaUI) under ‘reporting’ or ‘status overview’ in the context menu or something.
C: (jkowalski) notification policy at all levels (global, per user, per host, per path) that specifies

  • what to send,
  • when to send it (periodically, on success, on failure) and
  • how (email, slack, pushover, local notifications, mqtt etc.)

W: centrally managed service by a company (could be useful, but should not be forced)
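
To make the drift item in the list above concrete, here is a minimal sketch of how a report could classify a snapshot source by the age of its newest snapshot. The function name `drift_status` and the default thresholds are only illustrative (they mirror the 1 day / 5 days example) and are not an existing Kopia feature.

```python
from datetime import datetime, timedelta

# Illustrative thresholds from the example: warn after 1 day, critical after 5 days.
WARN_AFTER = timedelta(days=1)
CRIT_AFTER = timedelta(days=5)


def drift_status(last_snapshot: datetime, now: datetime,
                 warn_after: timedelta = WARN_AFTER,
                 crit_after: timedelta = CRIT_AFTER) -> str:
    """Return "ok", "warning" or "critical" based on the age of the newest snapshot."""
    age = now - last_snapshot
    if age > crit_after:
        return "critical"
    if age > warn_after:
        return "warning"
    return "ok"
```

A daily mail would then only need to list the sources whose status is not "ok", which keeps the report short when everything runs on schedule.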

I have CrashPlan Personal (the good ol' one), Duplicati Monitoring and dupReport as references (as part of the kopia repo server).

Before you know it, one has a plenitude of hosts and policies all snapshotting away at the speed of light, locally and remotely.

What sort of things Must, Should, Could and Won't be in there, according to your preferences?

PS I’ll harvest and summarize input here into a longlist of preferences to inspire future work.

Since this is completely uncharted territory, thanks for starting this thread. You have a bunch of great ideas here.

I was thinking we can configure all this with policies. There could be a notification policy at all levels (global, per user, per host, per path) that specifies what to send, when to send it (periodically, on success, on failure) and how (email, slack, pushover, local notifications, etc.) and the server would be responsible for scheduling and sending these notifications.
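
One way to picture such layered notification policies is a field-by-field merge where the more specific level overrides the more general one. This is only a sketch under that assumption; the `NotificationPolicy` class, its fields and the `resolve` helper are hypothetical and not part of Kopia.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class NotificationPolicy:
    # None means "not set at this level, inherit from the level above".
    what: Optional[str] = None   # e.g. "summary" or "failed files"
    when: Optional[str] = None   # e.g. "periodically", "on success", "on failure"
    how: Optional[str] = None    # e.g. "email", "slack", "pushover"


def resolve(*levels: Optional[NotificationPolicy]) -> NotificationPolicy:
    """Merge policies given from least specific (global) to most specific (path)."""
    merged = NotificationPolicy()
    for level in levels:
        if level is None:
            continue  # no policy defined at this level
        for f in fields(NotificationPolicy):
            value = getattr(level, f.name)
            if value is not None:
                setattr(merged, f.name, value)  # more specific level wins
    return merged


global_policy = NotificationPolicy(what="summary", when="periodically", how="email")
host_policy = NotificationPolicy(when="on failure")
path_policy = NotificationPolicy(how="slack")
effective = resolve(global_policy, None, host_policy, path_policy)
```

Here `effective` keeps the global "summary", takes "on failure" from the host level and "slack" from the path level, so each level only has to state what differs from its parent.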

There should probably also be a kopia report <path> that produces / sends a backup report for a given path if somebody does not want to use the server.

It is kind of hard to find one template for all use cases.
If you have a busy server with a million files on it where some of them appear/change frequently, one would probably accept that out of 1M possible backed up files, 999999 were done today, and the one that had an error (being written to while snap was done for instance) will be taken on next snapshot.
On the other hand, if you run a snapshot on a directory with 5 super secret certificates that change yearly at most, you would VERY much want a report if not all 5 were backed up, and possibly which ones failed. But getting the list of failed files is completely useless on a box with 1M files in case some error prevents those 1M files from being read.

For log files that grow, it might be “good enough” if you get first 99% of the log correct and the last 1% that keeps changing fails after 1-3-5 retries because it is constantly moving, but once you accept this, you would not need a report that says “had to retry 5 times, still grew while copying, did my best”.
So while we can tell .kopiaignore that we don’t care about certain objects, we would also need a way to state “if this works, fine, if not, let it pass”, or the reports will contain lots of superfluous info. In my experience, if you get too much crap in nightly reports, they will not be read after 2 weeks.
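
A rough sketch of what such a “let it pass” rule could look like on the reporting side: failures matching best-effort patterns are dropped, and the rest is only reported when it exceeds a per-source tolerance. The function name, the pattern list and the `max_failed` parameter are made up for illustration; nothing like this exists in .kopiaignore today.

```python
from fnmatch import fnmatchcase


def needs_attention(failed_paths, best_effort_patterns, max_failed=0):
    """Return the failures worth reporting, or an empty list if within tolerance."""
    # Drop failures on paths that were declared best-effort (e.g. growing logs).
    significant = [
        path for path in failed_paths
        if not any(fnmatchcase(path, pattern) for pattern in best_effort_patterns)
    ]
    # Report only when more files failed than this source tolerates.
    return significant if len(significant) > max_failed else []
```

The busy 1M-file server could run with something like `max_failed=1` and `["*.log"]`, while the certificate directory would keep the defaults so that any single failure shows up in the report.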

There is a large span of expectations from people with different use cases. Is 99 hosts doing 99.99% ok backups “all green” or is it a case of “TOTAL FAILURE” because none managed to pass without some odd socket, link, portalfile or whatever caused a log line?

Everyone loves an “all hosts green”, but for machines seeing serious use, this will seldom occur, so the hard question is how to manage the information in the report to convey when you need to give some machines attention (tune kopiaignore for instance) and when not. And it is seldom as easy as one hopes to get this right (just like tuning spam filters to drop all spam and no important emails ever).

(sorry for long rant, been doing restores for a long time)

I’d prefer to have metrics for everything available rather than a mail (as mentioned here: [Feature Request] Prometheus metrics from kopia server · Issue #609 · kopia/kopia · GitHub). You can then scrape the metrics whenever you want to and trigger alerts from there. So priority wise I’d prefer to have some improvements there (I haven’t checked the available metrics yet, I have to admit).
As a feature, a possibility to send a notification via mail / slack / whatever is nice. However, the triggers for that should be implemented; the what and when can be scripted and do not have to be part of the server IMHO.
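
For illustration, this is roughly what a scrapeable backup metric could look like in the Prometheus text exposition format. The metric name and labels below are invented for the example and are not the metrics Kopia actually exposes; the sketch only shows the shape of the output an alerting rule could work from.

```python
def escape_label(value: str) -> str:
    """Escape backslash, double quote and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples) -> str:
    """Render one gauge; samples is an iterable of (labels_dict, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label(val)}"' for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: age in seconds of the newest snapshot per host and path.
output = render_gauge(
    "kopia_example_last_snapshot_age_seconds",
    "Age of newest snapshot.",
    [({"host": "nas", "path": "/data"}, 3600)],
)
```

With a metric like this, the “what and when” lives in the alerting rules (for example, alert when the age exceeds a day), and the server only has to expose the numbers.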