Daily snapshot finishes, shows 0B, retries

TheRotag · August 4, 2023, 9:01pm

I have a policy that snapshots a directory daily at 4am. Every day, it runs on schedule, takes around 2 hours, then reports that it finished. However, at that point, the snapshot size shows “0 B”, where in reality it’s around 2.2 TB.

Looking at the list of snapshots for that policy, I see that it then retries every 45 mins for an additional 2-3 attempts, then stops retrying.

If I click on “Snapshot Now” in the UI, it runs for 2-4 seconds and completes successfully. Then the snapshot size shows 2.2 TB as I would expect, and the list of snapshots then shows just my two daily snapshots. The “incomplete” entries disappear.

Looking in the logs for the snapshot tasks (“Tasks” tab in UI, drill into relevant row), I don’t see any errors or indications of a problem. I also, curiously, don’t see any tasks for the “retries” (the ones that happen every 45 mins after the initial snapshot attempt, which show up as incomplete).

I have no idea what support information may be helpful, so below are various configs and images I think may be useful. Please let me know what else to share!

List of snapshots showing incomplete retries

Incomplete snapshot showing 0 B size

Policy definition

{
    "retention": {
        "keepHourly": 0,
        "keepDaily": 2,
        "keepWeekly": 0,
        "keepMonthly": 0,
        "keepAnnual": 0
    },
    "files": {
        "ignore": [
            ".DS_Store"
        ]
    },
    "errorHandling": {
        "ignoreFileErrors": true,
        "ignoreDirectoryErrors": false,
        "ignoreUnknownTypes": true
    },
    "scheduling": {
        "timeOfDay": [
            {
                "hour": 8,
                "min": 30
            }
        ]
    },
    "compression": {
        "compressorName": "zstd",
        "neverCompress": [
            ".zst"
        ],
        "minSize": 10240
    },
    "actions": {},
    "logging": {
        "directories": {},
        "entries": {
            "snapshotted": 5,
            "ignored": 5
        }
    },
    "upload": {},
    "noParent": true
}
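For readers unfamiliar with the retention settings: with `"keepDaily": 2` and the other `keep*` counts at 0, the policy is meant to keep about two days of snapshots. The Python sketch below is a simplified, hypothetical model of a keep-daily rule: keep the newest snapshot from each of the N most recent days that have one. It is not Kopia's retention code. It ignores `keepLatest`, which this policy does not set and may inherit from a parent or global policy, and it ignores incomplete checkpoints.

```python
from datetime import datetime


def keep_daily(snapshot_times: list[datetime], days: int) -> list[datetime]:
    """Simplified keep-daily rule (hypothetical model, not Kopia's code).

    Keeps the newest snapshot from each of the `days` most recent calendar
    days that have at least one snapshot. Ignores keepLatest, the other
    keep* rules, and incomplete checkpoints.
    """
    newest_per_day: dict = {}
    # Walk newest-first so the first snapshot seen for a day is its newest.
    for t in sorted(snapshot_times, reverse=True):
        newest_per_day.setdefault(t.date(), t)
    recent_days = sorted(newest_per_day, reverse=True)[:days]
    return sorted(newest_per_day[d] for d in recent_days)


snapshots = [
    datetime(2023, 9, 14, 8, 30),
    datetime(2023, 9, 15, 8, 30),
    datetime(2023, 9, 16, 8, 30),
    datetime(2023, 9, 16, 14, 30),  # hypothetical second run on the same day
]
for t in keep_daily(snapshots, 2):
    print(t.isoformat())
```

Under this model, a second snapshot taken on the same day does not add to the number of days retained; only that day's newest snapshot counts toward `keepDaily`.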

Repository config

repository.config

{
  "storage": {
    "type": "b2",
    "config": {
      "bucket": "xxx",
      "keyID": "xxx",
      "key": "xxx"
    }
  },
  "caching": {
    "cacheDirectory": "../../.cache/kopia/d129c661fba5f559",
    "maxCacheSize": 5242880000,
    "maxMetadataCacheSize": 5242880000,
    "maxListCacheDuration": 30
  },
  "hostname": "kopia",
  "username": "xxx",
  "description": "Repository in B2: xxx",
  "enableActions": false,
  "formatBlobCacheDuration": 900000000000
}

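The large integers in `repository.config` are easier to check once converted. Here is a small Python sketch. It assumes the cache sizes are in bytes and that `formatBlobCacheDuration` is a duration in nanoseconds, which agrees with the `Format blob cache: 15m0s` line in the status output.

```python
# Raw values copied from repository.config.
# Assumptions: sizes are bytes; formatBlobCacheDuration is nanoseconds.
MAX_CACHE_SIZE = 5242880000
FORMAT_BLOB_CACHE_DURATION = 900000000000

MIB = 1024 ** 2


def nanos_to_minutes(ns: int) -> float:
    """Convert a nanosecond count to minutes."""
    return ns / 1e9 / 60


def bytes_to_mib(n: int) -> float:
    """Convert a byte count to mebibytes (MiB)."""
    return n / MIB


print(f"formatBlobCacheDuration = {nanos_to_minutes(FORMAT_BLOB_CACHE_DURATION):g} min")
print(f"maxCacheSize            = {bytes_to_mib(MAX_CACHE_SIZE):g} MiB")
```

Both values come out round: 15 minutes and 5000 MiB (about 4.9 GiB).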
Repository status

kopia repository status

Config file:         /xxx/.config/kopia/repository.config

Description:         Repository in B2: xxx
Hostname:            kopia
Username:            xxx
Read-only:           false
Format blob cache:   15m0s

Storage type:        b2
Storage capacity:    unbounded
Storage config:      {
                       "bucket": "xxx",
                       "keyID": "xxx",
                       "key": "*******************************"
                     }

Unique ID:           xxx
Hash:                BLAKE3-256
Encryption:          AES256-GCM-HMAC-SHA256
Splitter:            DYNAMIC-1M-BUZHASH
Format version:      3
Content compression: true
Password changes:    true
Max pack length:     21 MB
Index Format:        v2

Epoch Manager:       enabled
Current Epoch: 30

Epoch refresh frequency: 20m0s
Epoch advance on:        20 blobs or 10.5 MB, minimum 24h0m0s
Epoch cleanup margin:    4h0m0s
Epoch checkpoint every:  7 epochs

TheRotag · September 13, 2023, 5:19pm

Giving this post a quick bump, since it’s been hidden for 6 weeks so no one’s had a chance to see it.

This issue is still ongoing; I connect to my server once a day and click “Snapshot Now” on that job to get it to finish. No change in behavior from what I originally described.

elel · September 14, 2023, 2:58am

I think this is normal behavior. Take a look and see if this FAQ applies to your situation.

ChrisA · September 14, 2023, 5:27pm

Hi, this is normal behaviour. The following snapshots will run fine, and the incomplete ones will be deleted.

Cheers

TheRotag · September 14, 2023, 6:09pm

I think I’m missing something. Why does the backup keep retrying every 45 mins and still not marking it complete? And why is it that if I manually click “Snapshot Now”, it only takes a few seconds and then does mark it as complete?

ChrisA · September 14, 2023, 6:31pm

Have you taken a look at the faq?

Feel free to ask, if you got more questions.

Cheers

TheRotag · September 14, 2023, 9:49pm

Yes, I’ve read that FAQ, but something still seems off. It states that:

If a snapshot takes longer than the predefined checkpoint interval, Kopia creates a temporary incomplete snapshot, preventing the snapshot from being garbage-collected by the maintenance tasks. Kopia will remove incomplete snapshots once a complete snapshot of the files and directories has been created.

The problem is with that second part (“Kopia will remove incomplete snapshots once a complete snapshot […] has been created.”). What I think that means is, if you have a snapshot that runs for more than some amount of time (45 minutes I would guess), then it’ll create “incomplete” snapshots along the way, but once it finishes (assuming it finishes successfully), it will clean those up and you’ll just see the one completed snapshot.
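The lifecycle the FAQ describes, as read here, can be modelled in a few lines. This is a toy Python model written for illustration (the class and method names are hypothetical), not Kopia's implementation. Checkpoints add incomplete entries, and a successful finish removes them and records one complete snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class SourceHistory:
    """Toy model of the FAQ's description (not Kopia code).

    Each entry is True for a complete snapshot and False for an
    incomplete checkpoint.
    """
    entries: list[bool] = field(default_factory=list)

    def checkpoint(self) -> None:
        # Periodic checkpoint while a long upload is still running.
        self.entries.append(False)

    def finish(self) -> None:
        # Expected behavior per the FAQ: once a complete snapshot exists,
        # the incomplete checkpoints for this source are removed.
        self.entries = [e for e in self.entries if e]
        self.entries.append(True)

    def incomplete_count(self) -> int:
        return self.entries.count(False)


history = SourceHistory([True])  # yesterday's complete snapshot
history.checkpoint()             # roughly 45 min into today's run
history.checkpoint()             # roughly 90 min
history.finish()
print(history.entries)
```

The behavior reported in this thread is that, after the scheduled run, the listing still looks like the state before `finish()` (yesterday's snapshot plus incomplete entries) until “Snapshot Now” is clicked.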

What I’m actually seeing, though, (described above) is that every morning (it’s 100% repeatable), that policy’s snapshot doesn’t appear to have completed. It still has incomplete snapshots listed, and shows a size of 0 bytes. Then if I manually snapshot it, it takes just a few seconds to create a new snapshot and remove the incomplete ones. At that point, looking at that policy’s snapshots, I see two: one from two days ago and one from a few seconds ago (when I manually created a snapshot).

If the automatic (overnight) snapshot really had completed and just didn’t clean up the incomplete snapshots by design, then what I would expect is when I click “Snapshot Now”, it would take a whole new snapshot (which would take a couple of hours), and also presumably clean up the old incomplete snapshots from overnight. I wouldn’t expect it to just clean up that last snapshot and mark it as complete.

So the questions are:

Why is it consistently leaving behind those incomplete snapshots even after it completes?
Could it be that it isn’t actually completing? I don’t see any errors in the logs that I know how to check, but are there logs somewhere that would show errors if that’s what was happening?
Why does manually clicking “Snapshot Now” several hours later just finish that previous snapshot, but not take a new one?
If it is actually completing and just leaving behind those incomplete snapshots, why is its size listed as 0B?

ChrisA · September 15, 2023, 4:19am

Okay, can you post more details about your repository?

I assume the files change frequently and that the snapshot is uploaded to a remote storage provider. Is that correct? If so, the daily snapshot may not complete in under 45 minutes.

That might be why you are seeing incomplete snapshots every day.

For example, I had a “big” snapshot of over 250 GB. The initial snapshot took longer than 45 minutes, and I was seeing incomplete snapshots back then. After that initial snapshot was created, newer ones take around 2 minutes.
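ChrisA's experience (a slow initial snapshot, then runs of about 2 minutes) fits how content-addressed deduplication works in general: data whose content is already in the repository is not uploaded again. The Python sketch below illustrates the idea only. It uses fixed 4-byte chunks and SHA-256 from the standard library as stand-ins for the content-defined splitter (`DYNAMIC-1M-BUZHASH`) and `BLAKE3-256` hash listed in the repository status, and `upload` is a hypothetical helper.

```python
import hashlib

CHUNK = 4  # tiny fixed-size chunks so the example stays readable


def upload(data: bytes, stored: set[str]) -> int:
    """Store only chunks whose hash is not already present.

    Returns the number of newly uploaded bytes. Fixed-size chunks and
    SHA-256 are stand-ins for Kopia's splitter and hash.
    """
    new_bytes = 0
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in stored:
            stored.add(digest)
            new_bytes += len(chunk)
    return new_bytes


repo: set[str] = set()
first = upload(b"AAAABBBBCCCCDDDD", repo)   # every chunk is new
second = upload(b"AAAABBBBCCCCDDDD", repo)  # nothing changed
third = upload(b"AAAABBBBCCCCEEEE", repo)   # only the last chunk changed
print(first, second, third)
```

The same effect may also be why a manual “Snapshot Now” can finish in seconds when almost all of the data is already stored, though the thread does not confirm the cause.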

Cheers,

TheRotag · September 15, 2023, 5:21pm

The repository is backing up to Backblaze B2. I don’t know exactly what details about it would be useful, but the config files are in the original post.

I have 8 policies in that repository; 7 of them complete every time without any issue. They range in size from 8 GB to 3.2 TB. Some have data that hardly ever changes, some are for frequently changing data like daily backups.

The policy that’s having this issue is for a directory where my home server saves its nightly backups of each container and VM. The server’s backup and retention logic deals with keeping the appropriate number of backups in that directory for each container/VM (e.g. 3 dailies, 4 weeklies, etc.), so I just have Kopia keeping 2 daily snapshots of that directory. Each one then contains whatever backups the server was retaining on that day.

The server’s backup logic runs daily at 4 AM and runs for about 20 mins, so I have the Kopia snapshot scheduled at 4:30 AM. The daily amount of new data is about 130 GB, and the total size of that directory is about 2.3 TB. The duration of the Kopia snapshot task varies quite a bit (not sure if that’s due to upload speed, differences in deduplication hits, or what), but seems to range from about 2 hours to 3 1/2 hours.
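As a rough sanity check on those numbers: if all ~130 GB of new data had to be uploaded, the run times imply the average rates below. This is plain arithmetic in Python with decimal units. The result is roughly an upper bound on the average upload rate, since deduplication and compression may shrink what is actually sent, and part of each run may go to scanning and hashing.

```python
def implied_mb_per_s(new_gb: float, hours: float) -> float:
    """Average MB/s needed to send `new_gb` GB in `hours` hours.

    Decimal units (1 GB = 1000 MB). Roughly an upper bound on the real
    average upload rate: dedup/compression may shrink the bytes sent,
    and part of each run may be spent scanning and hashing files.
    """
    return new_gb * 1000 / (hours * 3600)


for hours in (2.0, 3.5):  # run-time range reported in the post
    rate = implied_mb_per_s(130, hours)
    print(f"{hours} h: {rate:.1f} MB/s (~{rate * 8:.0f} Mbit/s)")
```

That works out to roughly 10.3 to 18.1 MB/s (about 83 to 144 Mbit/s).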

Given what I read in the FAQ that was shared above, I understand why incomplete snapshots are recorded every 45 mins during that job (if it took 2 hours, I’d expect one at the 45 min mark and another at 1 1/2 hours). I’m still concerned, though, that when it finishes (at the 2 hour mark), it leaves those incomplete snapshots behind.
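The checkpoint arithmetic in that paragraph can be written out. The sketch assumes a fixed 45-minute interval measured from the start of the run (the interval observed in this thread) and no checkpoint once the upload has finished. `checkpoint_minutes` and `hhmm` are hypothetical helpers.

```python
def checkpoint_minutes(duration_min: int, interval_min: int = 45) -> list[int]:
    """Minutes after start at which a periodic checkpoint would fire.

    Assumes a fixed interval measured from the start and no checkpoint
    at or after the moment the snapshot finishes.
    """
    return list(range(interval_min, duration_min, interval_min))


def hhmm(start_h: int, start_m: int, offset_min: int) -> str:
    """Clock time (24 h, wrapping past midnight) `offset_min` after start."""
    total = (start_h * 60 + start_m + offset_min) % (24 * 60)
    return f"{total // 60:02d}:{total % 60:02d}"


for duration in (120, 210):  # roughly 2 h and 3.5 h, per the post
    times = [hhmm(4, 30, m) for m in checkpoint_minutes(duration)]
    print(duration, times)
```

For a 4:30 AM start this gives checkpoints at 05:15 and 06:00 for a 2-hour run, and at 05:15, 06:00, 06:45 and 07:30 for a 3.5-hour run.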

ChrisA · September 16, 2023, 12:03am

Hi,

For me, this seems like normal behaviour under these conditions. If the daily snapshot takes 2 hours or more due to the amount of changed data combined with the upload speed, etc., then it is not unusual to see incomplete snapshots.

What about taking more than one snapshot per day?

Cheers,

TheRotag · September 16, 2023, 6:04pm

That’s an interesting idea. One snapshot to actually back up the data, and a second as a workaround to fix the bug preventing the first snapshot from cleaning up when it finishes. That would probably work. I can give that a try.
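The workaround amounts to a second entry in the policy's `scheduling.timeOfDay` list. The Python sketch shows how a list of `{"hour": h, "min": m}` entries (the shape used in the policy definition) maps to the next run time. The 14:30 entry is a hypothetical second time chosen for illustration, and `next_run` is not Kopia's scheduler: it treats times as naive local times and ignores daylight-saving changes.

```python
from datetime import datetime, timedelta


def next_run(now: datetime, time_of_day: list[dict]) -> datetime:
    """Earliest scheduled time strictly after `now`.

    `time_of_day` uses the same shape as the policy's scheduling.timeOfDay
    list. Hypothetical helper, not Kopia's scheduler: times are treated as
    naive local times and daylight-saving transitions are ignored.
    """
    if not time_of_day:
        raise ValueError("time_of_day must contain at least one entry")
    candidates = []
    # Tomorrow's occurrence of any entry is always later than `now`,
    # so checking today and tomorrow is enough to find the next run.
    for day_offset in (0, 1):
        day = now.date() + timedelta(days=day_offset)
        for entry in time_of_day:
            t = datetime(day.year, day.month, day.day, entry["hour"], entry["min"])
            if t > now:
                candidates.append(t)
    return min(candidates)


schedule = [
    {"hour": 8, "min": 30},   # existing entry from the policy
    {"hour": 14, "min": 30},  # hypothetical second daily run
]
print(next_run(datetime(2023, 9, 16, 9, 0), schedule))
print(next_run(datetime(2023, 9, 16, 15, 0), schedule))
```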

I still don’t at all agree that this is “normal behavior”. Those “incomplete” snapshots should just be temporary checkpoints along the way during the process, and should be removed upon successful completion of the snapshot. To finish successfully but then “by design” leave behind incomplete snapshots and leave the size showing 0B sounds like an indisputable bug.

elel · September 18, 2023, 6:06am

There happened to be a code merge within the past day that relates to this issue. It is very similar to your description of using a fixed time for backups. In addition, your situation has a further corner case: backups that take a very long time. You may want to chime in and say you have this corner case as well, so the developers are aware. I’m sure they would be more than happy to have an additional tester.

GitHub issue (github.com/kopia/kopia): [Feature Request]: Automatically run overdue snapshots
Opened by DarkArc, 02:56 AM UTC, 07 Aug 2023 · Labels: enhancement, help wanted, keep-open

Currently, Kopia doesn't seem to do any "catch-up" snapshots if they're overdue. …

Observed Behavior: In other words, if your backups run at night at 1am, and your computer is off when the snapshot should've kicked off, Kopia will not run until 1am the next day.

Expected Behavior: Kopia should (at least optionally) immediately start any overdue snapshots that missed their run window when the computer is booted.

Rationale: Nightly backups are often optimal in terms of avoiding impacting users. However, nightly backups in desktop and laptop form factors can easily result in a user missing a backup if their computer happens to not be on at the "right" time for an unexpectedly long period. This can in theory even affect more frequent schedules if you get "unlucky" (as your device is regularly being put to sleep or shut down) and thus isn't online "on the hour" when the various backups are scheduled to run. Kopia, having a nice GUI client, positions itself as a great, easy-to-use solution for these sorts of "desktop"/"laptop" form factors. However, the lack of any kind of "catch-up" makes this quite a dangerous option, particularly when deploying to users who aren't as tech-savvy and might not think to check in on the status of their backups.

TheRotag · September 23, 2023, 4:57pm

Thank you, @elel. I added a comment about this over on GitHub per your suggestion.

Following up quickly: the second daily snapshot is working for me. So thank you for that suggestion, @ChrisA! Greatly appreciated. It takes away the manual step every day.

TheRotag · September 23, 2023, 6:25pm

Created issue #3347 - Long-running snapshot leave behind incomplete snapshots and shows 0 byte size on GitHub to track this issue.
