PLEASE READ: Don't use --safety=none for routine maintenance

larryc · January 24, 2025, 2:26pm

Hi @kapitainsky,
I think you’re missing my point. Please re-read my posts above. I said nothing about providing functionality that manages free space dynamically. What I did say, and I’ll die on this hill, is that it is a major flaw that Kopia simply falls over if storage fills up. And when this happens, there is no emergency button to push to clean things up (e.g., delete the last N snapshots), and in fact it’s been noted that such a scenario has the potential to corrupt the entire backup repository. That is the part I am complaining about.

To use your UNIX analogy: I’ve been using Unix and Linux long enough that I remember when the filesystem would get corrupted if the host computer crashed or had a power failure. “That’s the user’s fault,” the old-timers said. “They should use a UPS.” But over time it was acknowledged that this was an unacceptable failure scenario, and journaling filesystems such as ext3/4 and ZFS were developed. Similarly in the Windows world with the evolution of FAT to FAT32 to NTFS. And similarly in the database world with the development of transaction logs and redo logs.

And to your point about restic et al., I see that restic has indeed solved my complaint:
Recover from no free space errors by MichaelEischer · Pull Request #3481 · restic/restic

And in the borg docs they call out that borg handles an out of diskspace condition gracefully:
If you run a backup that stops due to a disk running full, Borg will roll back, delete the new segment file and thus freeing disk space automatically. There may be a checkpoint archive left that has been saved before the disk got full. You can keep it to speed up the next backup or delete it to get back more disk space.

97b4958b055b · March 20, 2025, 7:38am

Is this still an issue in v0.19.0 (Sat Dec 28 09:10:28 UTC 2024)?

97b4958b055b · March 22, 2025, 4:52am

Is this a confirmed issue or conjecture?

97b4958b055b · March 22, 2025, 4:56am

That’s not a flaw that’s an outright deal breaker. @jkowalski , did you want to confirm that is or is not the case?

If so this software is unfit for production in any form, full stop.

kapitainsky · March 22, 2025, 4:56am

reasonable guess as there is no mechanism in place to soft fail in such cases

97b4958b055b · March 22, 2025, 5:05am

# Column 4 of `df -Pk` is available space in KiB (column 3 is used space).
avail_kb=$(df -Pk /dev/sda1 | awk 'NR==2 {print $4}')
if [ "$avail_kb" -lt 1048576 ]; then
    echo "less than 1GB remains; aborting"
    exit 1
fi

Why can I do this in bash but they can’t do it in Go?!

kapitainsky · March 22, 2025, 5:07am

That is rather an exaggeration. Many IT systems do not handle out-of-space events gracefully.

In addition, in Kopia’s case the destination storage is often “unlimited”. I do watch my S3 repo usage, but only to make sure that I won’t pay more for it than intended.

97b4958b055b · March 22, 2025, 5:09am

Obvious vectors of corruption are never an exaggeration.

Not everyone uses other people’s computers (aka cloud computing).

97b4958b055b · March 22, 2025, 5:44am

These two statements are in direct contradiction. Even if some sort of a --before-folder-action sh is checking for free space (eg: 1GB) but the new, pushed snapshot exceeds it (eg: 2GB) the risk for a repo-wide failure cascade stands.
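The gap described here can be made concrete with a small sketch. Assuming a local, filesystem-backed repository, a pre-snapshot check could compare free space against the raw size of the source rather than against a fixed threshold. The function names and the reserve value below are hypothetical, and `du` only gives an upper bound, since it ignores Kopia's deduplication and compression and says nothing about a maintenance run happening at the same time.

```shell
#!/bin/sh
# Hypothetical helper names; a sketch, not part of Kopia.

# fits NEEDED_KB AVAIL_KB RESERVE_KB
# Succeeds only if the data fits while leaving RESERVE_KB untouched.
fits() {
    [ $(( $1 + $3 )) -le "$2" ]
}

# Raw size of a directory tree in KiB (upper bound: ignores dedup/compression).
source_kb() {
    du -sk "$1" | awk '{print $1}'
}

# Free space in KiB on the filesystem holding a path (column 4 of df -Pk).
avail_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# precheck SOURCE_DIR REPO_DIR RESERVE_KB
precheck() {
    fits "$(source_kb "$1")" "$(avail_kb "$2")" "$3"
}
```

A script built around something like `precheck /home/me /mnt/backup/repo 1048576 || exit 1` (paths are placeholders) could be wired in as the before-folder action mentioned above; whether a non-zero exit actually aborts the snapshot depends on how the action is configured.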

@jkowalski I really hope I’m misunderstanding the situation.

budy · March 22, 2025, 10:05am

My take on this is… there is a whole chain of things coming into play when it comes to this specific problem. Even if Kopia incorporated some kind of protection against filling up the target storage… how would Kopia actually know when to stop?

Also, keep in mind that there could be a maintenance run going on at the same time you’re delivering a snapshot to the repo, which makes this issue even harder to handle. I think the best precaution is to actually keep some space away from the repo if it’s stored on a regular file system, and if you happen to run it on a ZFS volume, make sure to have some reservation set apart, since once a ZFS pool becomes full, you won’t even be able to remove a placeholder file, due to its COW nature…
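For the ZFS case, one common way to set a reservation apart is an empty dataset whose `refreservation` holds space back from the rest of the pool and can be released in an emergency. A minimal sketch, assuming a pool named `tank` and a dataset named `tank/reserved` (both hypothetical); the `ZFS` variable is added here only so the commands can be dry-run with `ZFS=echo`:

```shell
#!/bin/sh
# Hypothetical pool/dataset names; set ZFS=echo to print instead of execute.

# create_reserve DATASET SIZE   e.g. create_reserve tank/reserved 5G
# Creates an empty, unmounted dataset whose refreservation holds SIZE
# back from every other dataset in the pool.
create_reserve() {
    "${ZFS:-zfs}" create -o refreservation="$2" -o mountpoint=none "$1"
}

# release_reserve DATASET
# Drops the reservation so the held-back space becomes usable again.
release_reserve() {
    "${ZFS:-zfs}" set refreservation=none "$1"
}
```

Releasing a reservation is a property change rather than a file deletion, which is why it tends to be preferred over a placeholder file on a copy-on-write filesystem.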

larryc · March 22, 2025, 1:43pm

Respectfully, you continue to misstate the problem/issue.

There are two things being discussed in this thread:

1. Can (or should) Kopia preventively detect that a snapshot run would fill up disk space or quota on a filesystem and respond accordingly (e.g., soft fail)?
IMHO, this is useful functionality but not mission critical. While I’d like to see this incorporated into Kopia’s functionality, I have no problem if it is prioritized lower than other to-dos. I do agree with the assertion that it’s your job as backup administrator to be on top of storage usage. Also note that, as I quoted upthread, Borg handles this condition perfectly, so it’s not an unheard-of feature in backup software.
Nevertheless, sh*t happens, and it is an expected scenario that your backup repo might run out of space. Which brings us to:
2. How can Kopia recover gracefully after a snapshot that failed due to an out-of-storage condition? This is the impetus for this whole thread: there’s currently no graceful way to recover when this happens. As someone put it succinctly in the GitHub discussion: I would expect to be able to delete a snapshot then run maintenance to “clean things up”. However, when this happens, Kopia simply locks up and won’t even run maintenance. Borg & Restic (as I quoted upthread) both have means to roll back and recover. With Kopia, you are stuck deleting random files, running maintenance (and repeating), and then crossing your fingers that it will be able to recover. And until maintenance completes successfully, your entire backup set is unreliable. As someone said upthread, that’s a dealbreaker. This is the part of the problem that I’ve been arguing about in this thread.
Yes, there is a workaround where one manually creates a “large” sacrificial spacer file within the destination filesystem which can be deleted in an emergency (and this is what I’ve done!), but this is an ugly workaround for something that should be a core part of the functionality.
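The spacer-file workaround can be sketched roughly as follows. The helper names, path and size are hypothetical, and this assumes a regular (non-compressing) filesystem where writing zeros really allocates blocks.

```shell
#!/bin/sh
# Hypothetical helper names and paths; a sketch of the spacer-file workaround.

# make_spacer PATH SIZE_MB
# Writes SIZE_MB MiB of zeros so the blocks are really allocated
# (a sparse file made with truncate would reserve nothing).
# On a compressing filesystem, zeros may shrink to almost nothing;
# /dev/urandom would be the safer input there.
make_spacer() {
    dd if=/dev/zero of="$1" bs=1048576 count="$2" 2>/dev/null
}

# release_spacer PATH
# Deletes the spacer to free room for snapshot deletion and maintenance.
release_spacer() {
    rm -f -- "$1"
}
```

For example, `make_spacer /mnt/backup/SPACER.bin 2048` would set aside about 2 GiB next to the repository, and `release_spacer /mnt/backup/SPACER.bin` gives it back when the disk fills. As noted earlier in the thread, deleting a file may itself fail on a completely full ZFS pool.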

I am very frustrated by the lack of attention to this core issue that was first acknowledged three years ago. It’s starting to make me question kopia’s fit-for-purpose.

budy · March 22, 2025, 2:13pm

Again, that’s way more complicated than it looks at first sight. Even if Kopia somehow stopped writing to the storage once it detected that the next write would cause the storage to become full, you wouldn’t have enough room to perform a maintenance run, since that also includes re-writing blobs and all that, which will need additional space of unknown size.

And let’s not forget… the title states “for routine maintenance”. You may find yourself in a situation where you’ll need to do this, but it’s not recommended for routine maintenance due to the probable implications of running a snapshot and performing maintenance at the same time, plus some other safeguards which are enabled when running a standard maintenance.

And frankly, that’s what this thread is actually about. The issue with Kopia repos and exhausted storages is actually an issue on its own and there’s another thread already.

Yes, there is an issue wit

97b4958b055b · March 22, 2025, 9:55pm

To wit:

Just as Kopia’s actions are extensible via scripts, Monit can be augmented by sh.

97b4958b055b · March 22, 2025, 10:02pm

I’m that someone. I wholly stand by it, and to metaphorically ‘cover your back.’ I also put it far more politely than is due for a software product that began, according to the timestamps for commit e0157cd for the LICENSE, some nine (9) years ago.

97b4958b055b · March 25, 2025, 10:08pm

So this is what your kind does when they can’t refute the points, huh? I make no apologies for calling out those who coddle advertent negligence.

Here’s something far more on topic than obvious, sophomoric attempts at muddying the waters & obfuscation: did it not occur to anyone ITT that the --safety=none switch should be required to be followed by --YESIKNOWWHATIAMDOING before accepting & executing?
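Until something like that exists, the same idea can be approximated outside Kopia with a wrapper. This is a hypothetical sketch, not a Kopia feature: the function name, the `I_KNOW_WHAT_I_AM_DOING` variable and the `KOPIA` override (handy for dry runs with `KOPIA=echo`) are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper, not a Kopia feature: refuse --safety=none unless
# the caller also sets I_KNOW_WHAT_I_AM_DOING=yes in the environment.
guarded_maintenance() {
    for arg in "$@"; do
        if [ "$arg" = "--safety=none" ] && [ "${I_KNOW_WHAT_I_AM_DOING:-}" != "yes" ]; then
            echo "refusing --safety=none without I_KNOW_WHAT_I_AM_DOING=yes" >&2
            return 1
        fi
    done
    # All arguments are passed through unchanged to `kopia maintenance run`.
    "${KOPIA:-kopia}" maintenance run "$@"
}
```

With this in place, `guarded_maintenance --full` behaves like a normal maintenance run, while `guarded_maintenance --full --safety=none` fails unless the confirmation variable is set.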

It’s gobsmacking I have to point any of this out.

Now go ahead; false flag this post. Prove me right, again.
