BLOB suddenly not found

Hi,

I ran into quite a problem while trying to set up a root action.

While in testing, my sftp backup destination lives on an external drive connected to my server via USB. The connection has not proven to be 100% reliable, so I wrote a script to run before the backup commences. Once the script was working perfectly, and my own Pushover integration was doing what it is supposed to, I ran

kopia --config-file repository-home.config policy set /home/b3ta --before-snapshot-root-action /home/b3ta/.config/kopia/repository-home.config.before-snapshot-root-action

and once more tested failure modes, with the script exiting with 0 on success and either 1 or 2 on failure. Suddenly the UI would no longer respond to clicking on the snapshot button, and the UI no longer connects to the repository, simply showing <initializing>.

Running this from the cli

kopia --config-file repository-home.config repository status

returns

ERROR failed to open repository: unable to create format manager: unable to read format blob: error getting kopia.repository blob: BLOB not found
ERROR open repository: unable to open repository: unable to create format manager: unable to read format blob: error getting kopia.repository blob: BLOB not found

So at the moment nothing seems to be working.

An error I have run into sometimes is that things go wrong when file names are longer than the code assumes. Might that be the case here? My script’s full path is 75 characters long.

If that is the case, how do I go about fixing it, given that I can’t use the policy command from the cli and the UI won’t load the repository?

How can a script even help in providing a stable USB connection? If any part of the underlying hardware is questionable, I’d never use Kopia, or any other backup tool for that matter, with it. Every backup/archiving solution starts with a flawless foundation and that includes the connections to the storage.

I hope this post will be taken for what it is: some background so that I can hopefully get help to solve the actual problem. I am about to roll Kopia out at work, but there is zero chance I can do that unless I understand why what happened did, as well as how to mitigate it, as none of the other posts on BLOB issues seem to be related to mine, and my test installation is currently unresponsive.

TL;DR: I answer budy’s question and statement, but this is my actual question: Can someone give some guidance on what happened? At the moment I cannot use my test backup at all. I had a tentative look at the code, but I haven’t written code in anything other than bash since the 1990s, so it was a frustration more than anything else.

===

Firstly, the technology in use is neither here nor there. Consider Wikipedia’s take on USB: “Universal Serial Bus … allows data exchange … between many types of electronics.” Now compare that to IP: “The Internet Protocol (IP) is … for relaying datagrams across network boundaries.”

Abstracting away how they accomplish it, each of these protocols is concerned with the transmission of data between end-points. A massive conceptual difference between the two is that IP includes routing, whereas USB does not.

So, to your question:

Conceptually it’s easy, though it’s a pain to code well.

The reality is that many people will use external USB drive enclosures to ease rotating drives to take backups off-site. Since Kopia encrypts the data at rest and is as fast as it is, etc., etc., this is a very, very attractive thing to do.

One price to pay is how the OS handles a drive being removed instead of unmounted. Most people don’t use these kinds of USB connectors, so it is possible (no matter how careful you are) to disconnect it by mistake.

If it is reconnected before the OS has figured out it can reuse the mount point (if that is even possible without manual or cron-scripted intervention), auto-mount will put it somewhere else. So now instead of being at, say, /mnt/USB-ID-1, it ends up at /mnt/USB-ID-2.

Once that happens, a Kopia backup without checks and balances will probably cause a disaster. My script provides some of these checks and balances.

To see how such a disaster can happen, remember that a mount point needs to exist in order to mount a drive there. So, if the sftp mount point is user@192.168.1.2:/mnt/USB-ID-1/ but the above scenario happened, then Kopia will happily send data to that directory, which is probably on a way-too-small drive instead of the actual backup drive, which is now mounted elsewhere.

Here is how the script helps.

Before the backup starts, make sure you run a script (using --before-snapshot-root-action) to check that the drive is mounted where it is expected to be, which is really simple to do. If not, deal with it, which is not at all simple, given all the possibilities.
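As a rough illustration of the "is it mounted where expected" check (this is not the actual script discussed in this thread), one common approach is to keep a sentinel file on each backup drive and refuse to proceed when it is missing. The sentinel name and the function name below are assumptions made up for the example; the exit codes simply mirror the 0 / 1 / 2 convention mentioned earlier.

```shell
# Hypothetical sentinel name; the file would be created once on each backup drive.
SENTINEL=".backup-drive-sentinel"

# check_backup_target DIR
# Returns 0 if DIR exists and contains the sentinel file,
#         1 if DIR does not exist at all,
#         2 if DIR exists but the sentinel is missing, i.e. only the bare
#           mount point directory is visible and the drive is not mounted there.
check_backup_target() {
    if [ ! -d "$1" ]; then
        echo "backup target $1 does not exist" >&2
        return 1
    fi
    if [ ! -f "$1/$SENTINEL" ]; then
        echo "backup drive is not mounted at $1" >&2
        return 2
    fi
    return 0
}
```

An action script could call `check_backup_target /mnt/USB-ID-1 || exit $?` as its first step. Since the drive in this scenario sits behind sftp, the check would have to run on the server side, for example over ssh. Adding `mountpoint -q` from util-linux is a possible extra guard.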

For example, if you have multiple rotating drives as per the above scenario, then how are you going to ensure that auto-mount knows that this otherwise random drive is part of that backup group and should be mounted there? I solved it by inter alia defining different volume groups and then giving each drive in a group the same VG name. There are other things to consider as well, such as automatically reporting on pending drive failure as reported by SMART, but I hope the concept is clearer now.

Reality has taught me otherwise. Two examples among very many:

  1. Is your connection to your cloud storage flawless? Then why use error detection in the network layer?
  2. Is the actual storage flawless? Then why worry about bit rot? The implementation of Reed-Solomon error correction in Kopia will put it in a class of its own, and is something to which I am really looking forward.

Perusing queries here and in other places such as r/Backup, r/selfhosted, and r/homelab shows how many issues people experience when they assume everything always works in the world of backups, whether OSS or not. Thus my own experience and that of others force my hand to check for error conditions, especially show-stoppers.

While I don’t want to enter a general discussion on different storage types posing different challenges, USB connections in particular are the least reliable ones. Sure, the network connection to your cloud service can, and almost certainly will, experience issues, but that’s on the transport channel and Kopia will of course notice those and perform the appropriate actions. I’d rather expect the cloud providers to have their hardware in good working order. I am running a USB-3 based repo on my server at home as well and never had any issue with it, but I am only using it locally and “exporting” via a Kopia server and have my other Kopia clients connect to it.

In your case, however, Kopia complains about a missing BLOB, and that is what you should look at. Have you actually checked whether the repo shows up on that mount point? Also, have you tried to mount the repo locally on the server? That would tell you whether it’s the repo itself that is corrupted or something else. Try running Kopia in debug mode and see if you can get more information out of it.

Thanks for the reply, budy.

I spent most of the day on this (it’s my day off), and have a lot of detail below. The TL;DR is:

  • Initially, nothing I did with the original (home) repo definition worked.
  • Running in debug log mode showed that it said it could not connect to the repo, but connecting by hand via ssh using the data in the repository-home.config worked fine to the same sftp server.
  • After rebooting the sftp server, Kopia still would not connect.
  • I’ve been using Kopia for my wife’s laptop, and that’s working fine.
  • I created a new repo definition assuming a local mount, and that worked when I ran blob ls.
  • Logging in on the sftp server showed new files in the relative root of the backup.
  • After this, running blob ls using the previous repo file (now renamed) also worked, which seems very strange indeed.
  • Renaming the repo file back to what it was and then trying the blob ls once more continued to work.
  • BUT: no snapshots and no policies seem to be anywhere, so I have no idea how to get to any of the files in the backups.
  • Running repository status using the two repo files returns interesting results.

Details follow, with a question at the end.

===

My local cache has the kind of size I’ve been seeing:

b3ta@beethoven:~/.cache/kopia$ du -shc *
9.7M        095d9c592927c82e
364K        78d2aee49bb87ab5
5.7G        c719e62dd2b2891f
63M        cli-logs
68M        content-logs
5.8G        total

[In what follows, kc is a simple bash script which saves me typing kopia --config-file repository-home.config each time.]
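The kc script itself is not shown here. A minimal sketch of what such a wrapper could look like, written as a shell function, follows. The config path is taken from the repository status output further down, and `KC_CONFIG` is a made-up override variable for this example, not a Kopia setting.

```shell
# kc: run kopia against one fixed repository config file.
# KC_CONFIG is a hypothetical override used only in this sketch.
kc() {
    kopia --config-file "${KC_CONFIG:-$HOME/.config/kopia/repository-home.config}" "$@"
}
```

With this in place, `kc blob ls` expands to `kopia --config-file … blob ls` regardless of the current directory.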

Then I tried once more to connect, this time with debug level logging:

b3ta@beethoven:~/.cache/kopia$ kc --file-log-level=debug blob ls
ERROR failed to open repository: unable to create format manager: unable to read format blob: error getting kopia.repository blob: BLOB not found
ERROR open repository: unable to open repository: unable to create format manager: unable to read format blob: error getting kopia.repository blob: BLOB not found
b3ta@beethoven:~/.cache/kopia$ cat cli-logs/latest.log 
2024-11-11T11:03:08.027070Z ERROR kopia/cli open repository: unable to open repository: unable to create format manager: unable to read format blob: error getting kopia.repository blob: BLOB not found
b3ta@beethoven:~/.cache/kopia$ 

The only extra information is this: open repository: unable to open repository. That is very strange, as I did not change anything. To double-check, I ssh’d in by hand, copying and pasting the relevant details from the .config file. Logging in was successful, and this is what I saw:

b3ta@backup-home:~/bk/beethoven$ du -shc .[A-z]* *
4.0K	.shards
282G	B
282G	total

I don’t know if there’s supposed to be something else in there, but the size is about right.

Back on my laptop, I got a list of all the files in the tree under ~/.cache/kopia, sorted by time of last modification, and the following files were all modified at the time the problem occurred:

2024-11-09 07:44:06.711562955 +0200 ./c719e62dd2b2891f/metadata/05/4a59b83718175bb7cdf5d380cdddf0-s608b155b67bfd03f12eq.f
2024-11-09 07:44:06.694562668 +0200 ./c719e62dd2b2891f/indexes/xn6_01cfae1527108c37855f93f3dd3dea33-s608b155b67bfd03f12e-c1.sndx
2024-11-09 07:44:06.647561875 +0200 ./c719e62dd2b2891f/blob-list/xs.f
2024-11-09 07:44:06.635561672 +0200 ./c719e62dd2b2891f/blob-list/xr.f
2024-11-09 07:44:06.635561672 +0200 ./c719e62dd2b2891f/blob-list/xe.f
2024-11-09 07:44:06.619561402 +0200 ./c719e62dd2b2891f/own-writes/addxn6_01cfae1527108c37855f93f3dd3dea33-s608b155b67bfd03f12e-c1.f
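A listing in this format can be produced with GNU find and stat. The function below is a sketch rather than the exact command used for the output above, and its name is illustrative only.

```shell
# list_by_mtime [DIR]
# Prints every regular file under DIR (default: current directory) as
# "<modification time> <path>", newest first. Relies on GNU stat's %y format.
list_by_mtime() {
    find "${1:-.}" -type f -exec stat -c '%y %n' {} + | sort -r
}
```

Running `list_by_mtime ~/.cache/kopia | head` shows the most recently modified cache files first.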

The file command identifies the *.f files under ./c719e62dd2b2891f/blob-list as being JSON data, but after the final closing brace in each file there are a number of what appear to be nonsensical characters. I can provide hex dumps, if that will help.
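For anyone wanting to look at those trailing bytes without a full hex dump, a small sketch using only tail and od is shown here. The function name and the default of 64 bytes are arbitrary choices for the example.

```shell
# tail_hex FILE [COUNT]
# Hex-dumps the last COUNT bytes of FILE (default 64), one byte per column.
tail_hex() {
    tail -c "${2:-64}" "$1" | od -A x -t x1
}
```

For example, `tail_hex c719e62dd2b2891f/blob-list/xs.f` prints the end of that cache file in hex.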

The sftp server is running in a Proxmox LXC, where it has been doing so successfully for months. I rebooted the entire Proxmox server since the above, to no avail. Also, my wife’s laptop is successfully doing its backups there, and running kopia blob ls for her backup works as expected.

Being lazy, I used KopiaUI to create the config file, then ran kopia --config-file repository-1731327309355.config blob ls, which returned 13,001 lines of output, so it seems things are actually fine with the repository.

I then logged in on the sftp server and saw that there were new files and directories in the backup’s relative root:

root@backup-home:/mnt/backup/home/b3ta/bk/beethoven# ls -la 
total 44
drwxr-xr-x   6 b3ta b3ta  4096 Nov 11 14:19 .
drwxr-xr-x   4 b3ta b3ta  4096 Nov  9 08:01 ..
-rw-r--r--   1 b3ta b3ta    43 Nov  9 07:56 .shards
drwxr-xr-x 524 b3ta b3ta 12288 Nov  9 07:44 B
-rw-------   1 b3ta b3ta    30 Nov 11 14:19 kopia.blobcfg.f
-rw-------   1 b3ta b3ta  1101 Nov 11 14:19 kopia.repository.f
drwx------   3 b3ta b3ta  4096 Nov 11 14:19 q2e
drwx------   3 b3ta b3ta  4096 Nov 11 14:19 s52
drwx------   3 b3ta b3ta  4096 Nov 11 14:19 xn0
root@backup-home:/mnt/backup/home/b3ta/bk/beethoven# 

After this, running blob ls using the previous repo file (now renamed) also worked, which seems very strange indeed. Renaming the repo file back to what it was and then trying the blob ls once more continued to work…

… BUT after restarting the GUI, even though it now loads the previous repo definition, my entire snapshot history is gone from the interface, as are the policies I defined. Running snapshot list comes up empty, as does policy list (just the default global is returned).

As this is a test installation it’s not the end of the world, but if it were live it would be a disaster. The backups seem to be on the sftp server, but how to access them?

As a final test, I ran repository status using the two files, with interesting results:

b3ta@beethoven:~/.config/kopia$ kopia --config-file repository-1731327309355.config repository status
Config file:         /home/b3ta/.config/kopia/repository-1731327309355.config

Description:         My Repository
Hostname:            beethoven
Username:            b3ta
Read-only:           false
Format blob cache:   15m0s

Storage type:        filesystem
Storage capacity:    1 TB
Storage available:   393.8 GB
Storage config:      {
                       "path": "/home/b3ta/mnt/kopia-home",
                       "dirShards": null
                     }

Unique ID:           8fca6fd74be5649bfce5cbc62764d765e74dd5aa460f62de978202ecd4124014
Hash:                BLAKE2B-256-128
Encryption:          AES256-GCM-HMAC-SHA256
Splitter:            DYNAMIC-4M-BUZHASH
Format version:      2
Content compression: true
Password changes:    true
Max pack length:     21 MB
Index Format:        v2

Epoch Manager:       enabled
Current Epoch: 0

Epoch refresh frequency: 20m0s
Epoch advance on:        20 blobs or 10.5 MB, minimum 24h0m0s
Epoch cleanup margin:    4h0m0s
Epoch checkpoint every:  7 epochs
b3ta@beethoven:~/.config/kopia$ kopia --config-file repository-home.config.INACTIVE repository status
Enter password to open repository: 

WARN invalid list cache HMAC for xr, ignoring
WARN invalid list cache HMAC for xe, ignoring
WARN invalid list cache HMAC for xs, ignoring
Config file:         /home/b3ta/.config/kopia/repository-home.config.INACTIVE

Description:         home
Hostname:            beethoven
Username:            b3ta
Read-only:           false
Format blob cache:   15m0s

Storage type:        sftp
Storage capacity:    1 TB
Storage available:   403.8 GB
Storage config:      {
                       "path": "/mnt/backup/home/b3ta/bk/beethoven",
                       "host": "172.16.10.3",
                       "port": 21212,
                       "username": "b3ta",
                       "password": "*****************************",
                       "knownHostsFile": "/home/b3ta/.ssh/known_hosts",
                       "externalSSH": false,
                       "dirShards": null
                     }

Unique ID:           8fca6fd74be5649bfce5cbc62764d765e74dd5aa460f62de978202ecd4124014
Hash:                BLAKE2B-256-128
Encryption:          AES256-GCM-HMAC-SHA256
Splitter:            DYNAMIC-4M-BUZHASH
Format version:      2
Content compression: true
Password changes:    true
Max pack length:     21 MB
Index Format:        v2

Epoch Manager:       enabled
Current Epoch: 0

Epoch refresh frequency: 20m0s
Epoch advance on:        20 blobs or 10.5 MB, minimum 24h0m0s
Epoch cleanup margin:    4h0m0s
Epoch checkpoint every:  7 epochs
b3ta@beethoven:~/.config/kopia$ 

===

That leaves the question of why this happened.

As stated in the OP, I am suspicious about the length of the --before-snapshot-root-action full file name, as things went wrong when I added the action and that name is very long. Not being able to run any commands against this repository and not knowing where or how the script name is stored, I am unable to remove it from the Kopia data structures and try with a much shorter name.

Furthermore, the fact that the three identified JSON formatted files had what looks like random garbage appended to them makes me wonder about this even more.

Well… shouldn’t the before action be configured here:

/home/b3ta/.config/kopia/repository-home.config.before-snapshot-root-action

Also, Kopia does encrypt almost everything, so the JSON file you noticed could be encrypted. If you have successfully connected to that repo locally on your server, which I took from your post you had, then the repo should be fine. You could always run a kopia snapshot verify to check that.

You should also be able to get rid of the before actions, since they are included in the policies. When push comes to shove, you could just delete the policy with the before-action and recreate it.
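A sketch of what that could look like once the repository opens again. It assumes the policy target is /home/b3ta as in the original command, and, to the best of my knowledge, that passing an empty value to --before-snapshot-root-action clears the action; if that does not hold on your version, kopia policy delete followed by a fresh policy set is the heavier alternative. The function name is made up for the example.

```shell
# clear_root_action CONFIG TARGET
# Shows the current policy for TARGET, then clears its before-snapshot
# root action by setting it to an empty value (assumed to remove it).
clear_root_action() {
    kopia --config-file "$1" policy show "$2" &&
    kopia --config-file "$1" policy set "$2" --before-snapshot-root-action ""
}
```

For example: `clear_root_action repository-home.config /home/b3ta`.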

Finally, go ahead and make sure to clean your local Kopia cache. There have been instances where BLOBs that had been deleted from the repo showed up in the cache and Kopia tripped over them. Usually Kopia removes its cache on quit, but there’s no harm in checking.
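Put together, the cache cleanup and the verification suggested above might be run like this. It is only a sketch: the function name is illustrative, and it assumes the repository can be opened with the given config file.

```shell
# refresh_cache_and_verify CONFIG
# Clears the local cache for the repository described by CONFIG,
# then verifies its snapshots. Stops at the first failing step.
refresh_cache_and_verify() {
    kopia --config-file "$1" cache clear &&
    kopia --config-file "$1" snapshot verify
}
```

For example: `refresh_cache_and_verify repository-home.config`.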