Recovering from errors

Where can I find documentation on how to recover from errors, or at least show some information on the affected data?

I’m getting this error message when running maintenance:

2022-03-21T17:12:25.522694Z ContentInfo("4c50f57a81264768a65890663bc8c7f6") - error content not found
2022-03-21T17:12:25.522942Z error processing elasticsearch/nodes/0/indices/531RLnciSzeHZabvH3oW0Q/0/translog/translog.ckp: error verifying 4c50f57a81264768a65890663bc8c7f6: error getting content info for 4c50f57a81264768a65890663bc8c7f6: content not found
user@host:~$ kopia show 4c50f57a81264768a65890663bc8c7f6
ERROR error opening object 4c50f57a81264768a65890663bc8c7f6: content 4c50f57a81264768a65890663bc8c7f6 not found: object not found

Kopia version: 0.10.6
Backend: S3 (Wasabi)
OS: Fedora Linux 35
Kernel: 5.16.14-200

kopia content verify --full

You can explore the kopia diff command too, but my guess is that since this is the Elasticsearch transaction log, it changes dynamically during the backup. The best approach is to back up either from a filesystem snapshot (LVM, ZFS, FreeBSD's UFS…) to freeze state before the backup, or to stop Elasticsearch (and databases in general) before backing up.
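As a sketch of the filesystem-snapshot approach, the LVM variant might look like the script below. The volume group, LV names, sizes, and mount point are illustrative assumptions, not taken from this thread, and the commands are only echoed so the sequence can be reviewed before running it for real (as root).

```shell
#!/bin/sh
# Hypothetical LVM snapshot workflow: freeze the data, back up the frozen
# copy, then drop the snapshot. Names and paths below are assumptions.
set -eu

# 'run' only prints each command so the plan can be reviewed first;
# replace its body with "$@" to actually execute the commands.
run() { printf '+ %s\n' "$*"; }

run lvcreate --snapshot --size 5G --name es-snap /dev/vg0/es-data
run mount -o ro /dev/vg0/es-snap /mnt/es-snap
run kopia snapshot create /mnt/es-snap
run umount /mnt/es-snap
run lvremove -f /dev/vg0/es-snap
```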

If I understand correctly, this command tells me whether a repo is corrupted, but I already know that something is wrong. Or does it tell me which snapshots are affected?

Data from elasticsearch isn’t important for me and can be recreated easily.

Anyone else having an idea how to recover from this error? Do I need to delete the repository and start from scratch?

The issue you're experiencing is that you're running the backup over a live, working instance of Elasticsearch (ES), whose transaction log changes while you're backing up.
Just as you shouldn't back up a live MySQL or Postgres instance, you shouldn't do it with ES.
Completely exclude the live ES location in .kopiaignore, take an Elasticsearch snapshot instead, and back up ES's snapshots rather than the live ES data.
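For the exclusion part, a minimal .kopiaignore in the directory you snapshot might look like this (the Elasticsearch data path below is an example assumption; adjust it to your actual install):

```
# .kopiaignore placed in the root of the directory kopia snapshots.
# Patterns use gitignore-style syntax, relative to this directory.
# Example path; use your real ES data directory:
elasticsearch/
```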

Snapshotting ES is pretty easy: just a single curl request (after registering the repository once), like:

HOST=http://localhost:9200
repo=mybackup
snap_name=$(date '+%Y-%m-%dT%H-%M-%S')
repo_path='/path/to/backup'

# Register your repository (you only need to do this once):
curl -s -XPUT "${HOST}/_snapshot/${repo}?pretty=true" \
    -H 'Content-Type: application/json' \
    -d '{
      "type": "fs",
      "settings": {
        "location": "'"${repo_path}/${repo}"'",
        "compress": true
      }
    }'

# Create a snapshot of Elasticsearch (kind of like a database dump):
curl -XPUT \
    "${HOST}/_snapshot/${repo}/${snap_name}?wait_for_completion=true&pretty=true"

# Run kopia on ${repo_path}/${repo} only.

You seem to misunderstand the question. Kopia maintenance runs abort with the error message posted above, and I want to know how to fix/repair the repository.

user@host:~$ kopia maintenance info
Owner: user@host
Quick Cycle:
  scheduled: true
  interval: 6h0m0s
  next run: 2022-04-03 16:45:16 UTC (in 1h11m56s)
Full Cycle:
  scheduled: false
Log Retention:
  max count:       10000
  max age of logs: 720h0m0s
  max total size:  1 GiB
Recent Maintenance Runs:
  snapshot-gc:
    2022-04-02 16:45:13 UTC (13m37s) ERROR: unable to find in-use content ID: error processing snapshot root: error verifying 4c50f57a81264768a65890663bc8c7f6: error getting content info for 4c50f57a81264768a65890663bc8c7f6: content not found
    2022-03-26 16:45:11 UTC (7m11s) ERROR: unable to find in-use content ID: error processing snapshot root: error verifying 4c50f57a81264768a65890663bc8c7f6: error getting content info for 4c50f57a81264768a65890663bc8c7f6: content not found
    2022-03-24 14:22:22 UTC (3m18s) ERROR: unable to find in-use content ID: error processing snapshot root: error verifying 4c50f57a81264768a65890663bc8c7f6: error getting content info for 4c50f57a81264768a65890663bc8c7f6: content not found
    2022-03-24 14:13:06 UTC (6m23s) ERROR: unable to find in-use content ID: error processing snapshot root: error verifying 4c50f57a81264768a65890663bc8c7f6: error getting content info for 4c50f57a81264768a65890663bc8c7f6: content not found
    2022-03-24 13:28:11 UTC (9m4s) ERROR: unable to find in-use content ID: error processing snapshot root: error verifying 4c50f57a81264768a65890663bc8c7f6: error getting content info for 4c50f57a81264768a65890663bc8c7f6: content not found
  cleanup-epoch-manager:
    2022-03-16 12:23:20 UTC (3s) SUCCESS
    2022-03-15 10:46:56 UTC (2m12s) SUCCESS
    2022-01-23 17:22:37 UTC (0s) SUCCESS
  cleanup-logs:
    2022-03-16 12:23:20 UTC (0s) SUCCESS
    2022-03-15 10:46:51 UTC (4s) SUCCESS
    2022-01-23 17:22:37 UTC (0s) SUCCESS
  full-delete-blobs:
    2022-03-15 10:46:23 UTC (28s) SUCCESS
  full-drop-deleted-content:
    2022-03-16 12:23:19 UTC (0s) SUCCESS
    2022-03-15 10:46:22 UTC (0s) SUCCESS
  full-rewrite-contents:
    2022-03-16 10:06:38 UTC (2h16m40s) SUCCESS
    2022-01-23 17:22:37 UTC (0s) SUCCESS

I wonder if it would help to recover the indices using

kopia index recover

maybe even with --delete-indexes. I also think that omitting --commit would not change anything, but I haven't tried that; nor have I tried recovering the indices, so use it at your own risk. However, I am rather confident that Kopia is very reluctant to destroy its internal data structures… and rightly so.

Since you're running your repo on an S3 bucket, this will ultimately be a rather slow process, depending on the size of your repo. Maybe you want to try that locally first.

Or maybe, try to drop the content from the index…

kopia index optimize --drop-contents=DROP-CONTENTS

Since that content cannot be found anyway, you won't lose anything; at least, that's what I think. :wink:
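The two index-level suggestions above can be sketched as one reviewed sequence. This is untested against a live repo, assumes no other kopia instance is touching the repository, and only echoes each command so the plan can be checked before anything runs; the content ID is the one from the error messages in this thread.

```shell
#!/bin/sh
# Dry-run sketch of the index recovery options discussed above.
set -eu

# 'run' only prints the command; swap its body for "$@" to execute for real.
run() { printf 'would run: %s\n' "$*"; }

# Dry run first: reports how many contents could be recovered.
run kopia index recover
# If that looks sane, commit the rebuilt index entries.
run kopia index recover --commit
# Alternatively, drop the unreadable content from the index
# (content ID taken from the errors posted in this thread).
run kopia index optimize --drop-contents=4c50f57a81264768a65890663bc8c7f6
# Then re-check with a full maintenance run.
run kopia maintenance run --full
```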

Try this first:

# 1. Make sure no other kopia instance is working on this repository except this one.
kopia maintenance run --full --safety=none

# 2. If that failed:
kopia blob delete 4c50f57a81264768a65890663bc8c7f6

BTW, did you try: repository repair s3 … ?

I tried that but got the same error message.

This command did not produce any output, but I guess that’s expected because Kopia complained that this blob is missing, hence there is nothing to delete.

I ran that command without --commit and got this:
Found 6432604 contents to recover from 221632 blobs, but not committed. Re-run with --commit

I'm currently running it with --commit and will report back later, as there are still 9 hours remaining.

No, I didn't try that because I couldn't find any information about when to use this command.

The recovery completed but the maintenance run is still throwing the same error at me. :cry:

@jkowalski: Any ideas?

PS: I really appreciate your efforts to help me with this problem.

Could you show the output of

kopia snap list | grep errors

?

If this is the only affected snapshot, maybe just delete it (in case the most recent snapshots are healthy)?

Unfortunately I have lots of snapshots with errors from multiple Windows clients because of read-permission errors. But the one client with Elasticsearch on it does not show any errors in its snapshots. :person_shrugging:

Running full maintenance...
Looking for active contents...
ERROR error processing 0F8E0C64646F76CA-win10_2022-04-04_Full-00-00.mrimg: error verifying Ixeec3b41bd642f3971d4d8bb231d4fc97: unable to read index: error getting content info for xeec3b41bd642f3971d4d8bb231d4fc97: content not found
Finished full maintenance.
ERROR snapshot GC failure: error running snapshot gc: unable to find in-use content ID: error processing snapshot root: error verifying Ixeec3b41bd642f3971d4d8bb231d4fc97: unable to read index: error getting content info for xeec3b41bd642f3971d4d8bb231d4fc97: content not found

I've been getting this (same) error message for a few days now on a different repository (the repository from my original post has since been deleted) and a different server. Any ideas what is causing it and how to recover?