I’ve been reading about and lightly testing Kopia quite a bit recently, thinking of dropping Cloudberry and using this for both local NAS backups and B2 backups.
I think I’m understanding many of the concepts, but there have been a lot of postings (here and on GitHub) about verification, error notifications, do this, don’t do this… it seems that things have changed a lot over the last year or so in this regard.
Is there a current “best practices in being paranoid”? I see some notification tools, and it sounds like some are built in (to the CLI at least). It also sounds like you should run verification but not the fix commands (though that whole discussion baffles me: how can a fix command not be broken if it is fixing things that don’t need to be fixed?).
I’ve done some straightforward tests, backing up a few hundred gigs and restoring some files to confirm, but my concern is what happens over months of incrementals, and how to make sure it’s all going to work if I really need it.
Is there a “latest paranoid practices” guide or similar, so I don’t pick up old advice and assume it still applies?
Also, are there advantages (in terms of verification thoroughness or speed) to running the repository server (e.g. on a local NAS)?
The GUI is not at feature parity compared to the CLI (eg: actions).
There’s absolutely no way I’d let this software touch a remote endpoint like a NAS without having disk/directory quota(s) already in place on it. You’re just setting yourself up for a potential failure cascade, regardless of how ‘good’ the previous verification runs reported things to be.
I’m doubling up on ensuring the latter doesn’t happen. Good luck & fair warning.
I’ve already pretty much shifted to the CLI; for one thing, it was the only way to configure my (internal) email relay without passwords. And the little cheat button that shows you the equivalent command doesn’t always work (e.g. for policies and ignored files), so I’m having to actually look up the commands.
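For anyone hitting the same wall, the commands I ended up with look roughly like this (paths are placeholders and flag names are from memory, so sanity-check against kopia policy set --help on your version):

```
# show the effective policy for a backup target
kopia policy show /tank/data

# add ignore rules the GUI wouldn't show me the equivalent for
kopia policy set /tank/data --add-ignore ".cache/" --add-ignore "*.tmp"

# global defaults apply wherever no per-path policy overrides them
kopia policy set --global --compression=zstd
```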
But I monitor my systems with Zabbix (sort of an agent-based Monit, if you haven’t heard of it), so I’m not worried about running out of space or the like. I’m worried about running routine operations for a few months only to find out some underlying corruption occurred silently and I can’t restore anything if I need it.
I’m also worried a bit about maturity as I use this more. I went through a bunch of hoops to get VSS-based backups going with admin privilege (it would be better to only need the backup privilege). It’s kind of a mess that this isn’t just part of the installation setup (whether CLI or UI). Maybe it’s the Unix roots, where app-specific sudo is easier, but the resulting Windows Task Scheduler + PowerShell scripts just seem very fragile. And actions that are disabled by default? Why? Though to be fair, when they broke I got clear indications via email.
This thing is super fast and feature rich. But in terms of trusting it… well, not there yet. Still experimenting. I should note I’ve been in IT pretty much since it started, and still do consulting for larger organizations on DR planning and networks. So my expectations for “trust” are perhaps unreasonably high for a home-use free product, and I mean no offense.
I like Kopia and do use it a lot. But after seeing a number of random errors which make no sense, I only use Kopia as a failsafe for a backup I do trust - simple mirrors.
There are many moving parts here. It might be unrealistic to ever completely rely on deduplicating backups.
Yeah, I’m aware of Zabbix. I don’t bother with it any more. I just write policies for Monit re: CPU, RAM, I/O, hosts, etc. checks & fail/restart states for services/daemon groups. Like Kopia’s actions are supposed to provide, Monit can also pass the results of conditions to/from external programs (eg: a shell script) if I really need to. I’ll check the logs while waiting for the coffee.
In the case of Kopia, I find I really do need to automate intervention… just in case.
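To make that concrete, a Monit stanza along these lines is what I have in mind; the wrapper script name/path is made up and the schedule is arbitrary:

```
# /etc/monit/conf.d/kopia-verify -- run a (hypothetical) verification wrapper weekly
check program kopia_verify with path "/usr/local/sbin/kopia-verify.sh"
  every "0 3 * * 0"                 # 03:00 each Sunday, cron-style schedule
  if status != 0 then alert         # Monit mails out on a non-zero exit code
```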
Well, I’m a full Linux env so I can’t give you any insight re: Windows, beyond that I loathe PowerShell (never again!). But more to your concern, Kopia does seem to rely heavily on POSIX-based file systems. IDK how compliant NTFS is there.
Reading over the docs & scattered posts on this board, you can manually test consistency on demand. I’m also going to use its ECC before automating weekly tests.
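For the on-demand part, what I’ve been testing looks roughly like this; the percentages are arbitrary and the ECC flags are from memory (and only take effect at repository creation), so treat it as a sketch:

```
# spot-check snapshots, downloading a random 2% of file contents for comparison
kopia snapshot verify --verify-files-percent=2

# full maintenance pass (compaction, dropping unreferenced blobs, etc.)
kopia maintenance run --full

# error correction has to be chosen when the repo is first created, e.g.:
kopia repository create filesystem --path=/mnt/repo \
  --ecc=reed-solomon-crc32 --ecc-overhead-percent=2
```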
Agreed. The cryptography & modern checksumming algos attracted me, but Kopia’s actions are really the only thing keeping me from walking away. Being able to pass pre and post conditions to jobs very much fits in with my ideal requirements.
To the former there’s also this:
That’s about 744 km (462 mi) ‘as the crow flies.’
Nice. I’ve been doing something similar since it was ATDT & not HTTP. I’d be incredibly hesitant to put this into production without taking additional steps. As related ITT, I’m also speaking of the fact that endpoint credentials are stored in cleartext regardless of the OS. That may not be as much of a concern if FDE is in use, but FDE doesn’t protect against malware siphoning/RATs.
I’m accounting for this ATM by ensuring the remote endpoint is isolated to its own container (chroot, technically), with a storage pool strictly for repo storage. Clients will have a private certificate authority installed to help control TLS access. Said endpoint is to be mirrored offsite.
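FWIW the TLS side of that is straightforward from the CLI; cert paths and hostname below are placeholders, and I’ve left the per-user auth setup out:

```
# on the repository endpoint (inside the chroot)
kopia server start --address=0.0.0.0:51515 \
  --tls-cert-file=/etc/kopia/server.crt \
  --tls-key-file=/etc/kopia/server.key

# on each client, pin the server certificate when connecting
kopia repository connect server --url=https://nas.internal:51515 \
  --server-cert-fingerprint=<sha256 fingerprint of server.crt>
```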
Then there’s the default logging to debug annoyance to address:
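If I’m reading the global flags right, the verbosity can at least be reined in per invocation (worth double-checking against kopia --help on your build):

```
# keep the console quiet and drop the file log below its debug default
kopia snapshot create /tank/data --log-level=warning --file-log-level=info
```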
I’d speculate the average SOHO user could get themselves into trouble quicker if it were.
But is it? Backing up isn’t exactly a frontier in IT, even SOHO, regardless of the monetary aspect. What’s being asked to be put up/at stake can easily be considered priceless, situation dependent of course. Otherwise why bother backing up at all?
In paranoia mode, for some years I’ve been using two separate tools, so that if one had silently gone insane the other might not. One is offsite, one local NAS (and a few external drives I am sadly not very reliable in remembering to update).
Both tools are commercial and both annoy me no end (one the UI, one the licensing and support thereof). But both have proven solid and never broken. But this way I have a lot less worry that some update will silently begin corrupting my backup (at least on both).
Every year or three I decide it’s time to punt on the one that annoyed me the latest, but so far (including now) I decide the grass was not greener and stay where I am.
The same thing happened here. I just could not get comfortable with a tool that requires so many manual workarounds to work properly on Windows. Kudos to the volunteers who build this, it’s an interesting tool – but it doesn’t give me the warm fuzzy feeling I need.
Incidentally, yes, I use Windows for my NAS as it’s also my security camera server and that requires Windows (Blue Iris). I hate the idea of a Windows NAS, and even have a beefy system sitting in a closet on which I was going to run ZFS or similar, but it just doesn’t make financial sense to keep two power-hungry servers running all the time, so I do lots of checksum consistency checks.
Re zabbix: I’ve currently got it monitoring 125 IP devices, and 720 different Home Assistant entities. Beyond a certain point it’s too easy for something (like a leak detector) to just disappear and not be noticed (but not alarm because it’s gone). Zabbix is very handy for watching the things that watch my house.
Thanks again for the thoughts. I’m back to where I was. (Note: I’m not mentioning the products I use, as I don’t want to turn this into a comparison discussion on this forum.)
Well, to come back to the central concern ITT: if you’re worried about corruption, you’re implicitly asking what checksumming algos are in use, if any. You’re going to be hard pressed to find anything available, commercial or otherwise, that uses blake3 (b3sum). Released in 2020, it even outperforms sha512sum on CPUs with hardware acceleration, as found in most Intel offerings.
That would help explain:
90TB is a lot to lose.
I’m sure you know ZFS has a stellar reputation including its own checksumming methods. I’d rather use that but BTRFS has lighter RAM requirements & supports blake2b (b2sum).
So, in effect, I’ll be checksumming my checksumming.
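The kitchen-table benchmark is easy enough to run yourself; the file name is just a stand-in (b2sum ships with coreutils, b3sum is its own package on most distros):

```
# compare hashing throughput on the same large file
time sha512sum big-archive.tar
time b2sum big-archive.tar
time b3sum big-archive.tar
```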
I’d let the file system handle that: ZFS or BTRFS (as mentioned). Monit can automatically run anything you want for on-demand verification & email you the results.
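Concretely, ‘letting the file system handle it’ just means pointing Monit (or cron) at the scrub machinery; pool/mount names are placeholders:

```
# ZFS: re-reads every block against its checksum, repairing from redundancy where it can
zpool scrub tank
zpool status -v tank          # progress plus any files it couldn't repair

# BTRFS equivalent
btrfs scrub start /mnt/repo
btrfs scrub status /mnt/repo
```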
It sounds like you’ve got the bare metal needed for a beefy hypervisor machine. Why not stick Windows in a VM & use PCIe pass-through for the NIC driving the IP cams? If even VirtualBox supports it, I’m sure you’ll have no trouble getting Proxmox to do the same.
Hell, you could even clone your existing Win setup to a VHDx, then convert it to VDI to import into VirtualBox before fully committing to Proxmox. See below.
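The conversion step is a one-liner either way; the disk image names are placeholders:

```
# qemu-img reads VHDX directly; VDI for VirtualBox, qcow2/raw for Proxmox
qemu-img convert -p -f vhdx -O vdi win-clone.vhdx win-clone.vdi
qemu-img convert -p -f vhdx -O qcow2 win-clone.vhdx win-clone.qcow2
```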
As critical as I am about Kopia I’d rather deal with its warts than be strangled by closed source. I just pity anyone who thinks this can all be reliably set up with a few mouse clicks.
Let me pick out the middle – the security camera stuff is quite dependent on the GPU. The NIC aspect is pretty moot; I’ve only got 8 cameras and they’re on their own VLAN, and actually on their own NIC into the server. I’m worried about getting adequate performance inside a VM, both CPU (easy) and GPU (and frankly I’m not knowledgeable about running GPUs inside Linux flavors of VMs; I mostly do Hyper-V). There’s also CodeProject AI for image recognition in the mix.
Short version is it works as-is. It was on a beefy but old i7-6700K system that I couldn’t upgrade to W11. I debated switching it to Linux, but that didn’t solve the W11 issue, as even inside a VM the processor is still an upgrade issue. So I just bought a new system entirely, and kept Windows native with two Linux VMs (HA and Zabbix) in Hyper-V. And I’ve tried a bunch of alternatives to Blue Iris but never really liked any of the Linux options.
It’s also not clear that the BlueIris author would support his product in a VM on Linux (asking got no answer).
Neat; CodeProject.AI runs on Debian. Proxmox VE is built on Debian.
You’ve got quite the stack running there. I defaulted to NIC instead of GPU purely out of habit: Proxmox is commonly used to virtualize OPNsense.
Well, to put things in perspective: the International Court of Justice (The Hague, NL) formally recognizes SHA512 for validating digital authenticity in trials, including genocide cases. blake2b outperforms sha512sum. ZFS uses sha256sum. SHA256 & SHA512 are both of the SHA-2 ‘family.’ They are also FIPS 180-4 validated.
BTRFS can, and ZFS does, automatically ‘heal’ corrupted blocks when encountered, given redundancy to repair from.
So all things considered, the only real concern would seem to be getting Win 11 up. I know it needn’t be said, but the camera server setup isn’t exactly forward facing, so it’s not like it really needs the updates MSFT pumps out once it is as it should be. How fortunate this is barely a week old:
(There’s lots of vids on YT for HOWTO run Win 11 on Proxmox.)
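For the record, the Proxmox side boils down to something like this; the VM ID, storage, ISO name and PCI address are all placeholders, and the exact option names are worth checking against the current qm docs:

```
# Win 11 wants UEFI + TPM 2.0; q35 + VirtIO for sane performance
qm create 110 --name win11-cams --memory 16384 --cores 6 --cpu host \
  --machine q35 --bios ovmf --ostype win11 \
  --efidisk0 local-lvm:1,efitype=4m,pre-enrolled-keys=1 \
  --tpmstate0 local-lvm:1,version=v2.0 \
  --scsihw virtio-scsi-single --scsi0 local-lvm:128 \
  --net0 virtio,bridge=vmbr0 \
  --cdrom local:iso/Win11.iso

# GPU for Blue Iris / CodeProject.AI via PCIe passthrough (IOMMU enabled in BIOS + kernel)
qm set 110 --hostpci0 0000:01:00,pcie=1
```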
Yeah, I thought of that, and there’s a lot of truth in it, though that system also runs my Plex server (which, while Linux-compatible, seemed safer keeping the GPU shared on the same OS and same instance). But more to the point, my telescope computer and home desktop were both going to W11, and I was a bit worried I’d break something because of the difference (e.g. there have been times I’ve moved the VMs to my desktop while working on the back-end server). The laptop I think I’m shifting to Linux, and I’ll RDP to my desktop if I need something Windows-specific like Visio.
The path of least resistance with windows is just to keep it upgraded. Cursing, making obscene gestures toward Redmond, but … upgrade.
I have to say I’m a bit envious of your position: it very much sounds like you have enough spare hardware that’s just begging to be set up as a full-on Linux ‘homelab’ before fully transitioning over… all without having to risk tearing down your existing stack!
I would be keeping good notes at least to the point you have ZFS up, so you can checkpoint & roll back if something goes sideways. BTRFS might be better for the boot device/partition so as not to tie up extra RAM for such a small ZFS pool. SUSE’s Snapper (with Btrfs Assistant for desktop envs) makes rollbacks stupidly easy.
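That’s roughly two commands per ‘checkpoint’ once it’s up; dataset and Snapper config names are placeholders:

```
# ZFS: checkpoint before a risky change, roll back if it goes sideways
zfs snapshot rpool/data@pre-upgrade
zfs rollback rpool/data@pre-upgrade

# BTRFS boot volume via Snapper
snapper -c root create --description "pre-upgrade"
snapper -c root list
```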
Here’s more unsolicited advice: for desktop Linux, any distro, choose KDE v6+ for the environment while ‘cutting your teeth.’ It’s OOTB the closest to Windows for a GUI. I have mine tricked out to look damn close to Win 10. Call it old habits dying hard, but I really do like the ‘flat’ look. A KDE desktop env should be expected to use about 1.1 GB of RAM at cold boot.
KRDC is worth checking out as an RDP/VNC client.
Granted it’s been, what, a decade plus since I touched Visio, but I’ve become quite partial to draw.io:
Note that although this particular ‘sandboxed’ app, a Flatpak, appears to be ‘unverified,’ it actually is verified. They just didn’t bother properly listing that tag on Flathub:
KDE has a GUI to manage flatpaks if you don’t prefer the CLI (eg: flatpak install com.jgraph.drawio.desktop).
If you know the term ‘DLL hell’ I do believe you’ll like flatpaks. Flathub is the first place I look for desktop/GUI apps, installing bare metal only if they’re out of date or need very complete access to the host file system (eg: VSCodium, the F/OSS, telemetry-less version of VS Code).
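Day to day it’s only a handful of commands anyway:

```
flatpak search drawio
flatpak install flathub com.jgraph.drawio.desktop
flatpak run com.jgraph.drawio.desktop
flatpak update            # updates every installed flatpak
```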
Regardless of the Linux distro:
Debian is known for stability, especially on servers, but bare-metal packages from its software feeds (‘repos’) are known to lag behind the latest versions. It is also a ‘point release’ distro (eg: Win 10 v Win 11).
Arch Linux is a ‘rolling release’ & famous for keeping its repos far more up to date than most. You’ll want to look up archinstall on their Wiki for quick setup if you go that route. BTRFS/ZFS for rolling back makes it far less ‘dangerous’ to run, counter to the reputation it gets from being bleeding edge. Note the archinstall script defaults to CRC32 checksumming; something like blake2b must be done manually using btrfs-progs (see the sketch after this list).
openSUSE’s Tumbleweed is enterprise class (source from SUSE Linux Enterprise) & a rolling release. I haven’t used that one but it might be a good middle ground.
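Re the checksum note above: it’s picked at mkfs time, so it’s a one-liner when creating the file system (device path and label are placeholders):

```
# blake2 (or xxhash/sha256) instead of the crc32c default; set at creation, not changeable later
mkfs.btrfs --csum blake2 --label repo /dev/sdX1
```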
I think your cup runneth over with options. It would be a tidy bit of work in the end, but I think you may be in the rare position of hitting every objective & eating your cake, too.
I set up a lot of Linux systems for clients, almost always Ubuntu Server. I’ve run ZFS as a NAS before at home; frankly I don’t remember why I switched, it was a long time ago. I’ve also run ReFS from Microsoft, but it seemed immature and ill-managed at the time (several years ago). And now it’s not available in the Home edition of Windows.
So yes, I have the hardware (though I’d probably swap out the bunch of 4TB Purples for a few 18TB drives), but the interest and free time aren’t really there unless it’s useful (or usefully an improvement) for me.
You’ll have no trouble adapting to Debian then. That’s what Canonical forks, then adulterates, to end up calling Ubuntu.
Well, that’d be ZFS then. It’s as battle tested as it gets. I wasn’t kidding when I said it automatically recovers when it encounters corrupted blocks… & I was even unaware that it too supports blake3:
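e.g., on OpenZFS 2.2+ it’s a per-dataset property (pool/dataset names are placeholders); it only applies to newly written blocks:

```
# switch the checksum for new writes on a dataset
zfs set checksum=blake3 tank/backups
zfs get checksum tank/backups
```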
We’re beating a dead horse. I understand the principle. On the other hand I’ve got mirrored storage as well as checksums against the original source, and we’re only looking at 3TB or so of backups, so bit rot is not high on my worry list.