Seeing kopia connect crash having large number of epochs

Hi,
I have been degugging a issue where kafka data was written to PVC and used to back it up.
Built a debug kopia image and below is what i could see where makeslice was crashing due to negative memory allocation

panic: runtime error: makeslice: len out of range

goroutine 1 [running]:
github.com/kopia/kopia/internal/epoch.(*Manager).getCompleteIndexSetForCommittedState(0xc000122000, {0x17f2bc8, 0xc0004c7830}, {0x0, 0xc0
2ae0, {0xc000252a80, 0x6, 0x6}, 0xc00019e900, 0xc00019e8a0, ...}, ...)
        /home/runner/work/kopia/kopia/internal/epoch/epoch_manager.go:747 +0x3e7
github.com/kopia/kopia/internal/epoch.(*Manager).GetCompleteIndexSet(0xc000122000, {0x17f2bc8, 0xc0004c7830}, 0x40ce34)
        /home/runner/work/kopia/kopia/internal/epoch/epoch_manager.go:659 +0x1d5
github.com/kopia/kopia/repo/content.(*indexBlobManagerV1).listActiveIndexBlobs(0xc0000fc460, {0x17f2bc8, 0xc0004c7830})
        /home/runner/work/kopia/kopia/repo/content/index_blob_manager_v1.go:29 +0x69
github.com/kopia/kopia/repo/content.(*SharedManager).loadPackIndexesLocked(0xc00000a1e0, {0x17f2bc8, 0xc0004c7830})
        /home/runner/work/kopia/kopia/repo/content/committed_read_manager.go:191 +0x1d8
github.com/kopia/kopia/repo/content.NewSharedManager({0x17f2bc8, 0xc0004c7830}, {0x180cfc8, 0xc0000a6b80}, 0xc0003ec540, 0xc0001cd548, 0x
        /home/runner/work/kopia/kopia/repo/content/committed_read_manager.go:576 +0x588
github.com/kopia/kopia/repo.openWithConfig({0x17f2bc8, 0xc0004c7830}, {0x180cfc8, 0xc0000a6b80}, 0xc000079800, {0xc0003a4378, 0x8}, 0xc00
7f8, 0xc0000b50b8, {0xc0001a0640, ...})
        /home/runner/work/kopia/kopia/repo/open.go:225 +0x425
github.com/kopia/kopia/repo.openDirect({0x17f2bc8, 0xc0004c7830}, {0xc0001a0640, 0x34}, 0xc000079800, {0xc0003a4378, 0x8}, 0xc0007cb7f8)
        /home/runner/work/kopia/kopia/repo/open.go:157 +0x207
github.com/kopia/kopia/repo.Open({0x17f2bc8, 0xc0004c7830}, {0xc0001a0640, 0xc0003d9480}, {0xc0003a4378, 0x8}, 0xc0000d4420)
        /home/runner/work/kopia/kopia/repo/open.go:90 +0x2b3
github.com/kopia/kopia/repo.verifyConnect({0x17f2bc8, 0xc0004c7830}, {0xc0001a0640, 0x34}, {0xc0003a4378, 0x7})
        /home/runner/work/kopia/kopia/repo/connect.go:67 +0x45
github.com/kopia/kopia/repo.Connect({0x17f2bc8, 0xc0004c7830}, {0xc0001a0640, 0x34}, {0x180cfc8, 0xc00018c370}, {0xc0003a4378, 0x8}, 0xc0
9110)
DEBUG Creating cache directory '/home/dimuralidharan/.cache/kopia/6ba39cc6e09eb37a' with max size 5242880000
DEBUG kopia.repository could not be fetched from cache: cached file too old
line 29 epoch.LatestEpoch: -1
line 643 committedState - e.lastKnownState: {WriteEpoch:0 UncompactedEpochSets:map[0:[{BlobID:xn0_036faf4deaae03ebdf0e222410ae2dfa-scf630028c5cf7b9711c-c1 L
-28 08:03:22 +0000 UTC} {BlobID:xn0_15b3aca4d3f394768c50acb281456eb3-sa5a0092e32a2943711c-c1 Length:143 Timestamp:2023-05-26 08:02:40 +0000 UTC} {BlobID:xn0
eba054-s8b6fd67a438485c911d-c1 Length:334 Timestamp:2023-06-01 08:04:57 +0000 UTC} {BlobID:xn0_2e856a11373fa1395dc62f52fd3b7b5a-s48c5e94f8a594d1f11c-c1 Leng
 08:03:53 +0000 UTC} {BlobID:xn0_50848a53a69933998965d87bd5d8bf4d-s5957a6f83d1fd4b511d-c1 Length:143 Timestamp:2023-05-30 08:04:08 +0000 UTC} {BlobID:xn0_83
8bc-sce8b1834aa5d0e4911d-c1 Length:143 Timestamp:2023-06-03 08:05:43 +0000 UTC} {BlobID:xn0_9464ee7326ab4a362a6208a9eb71d7ed-s29d7c4059632869111d-c1 Length:
:05:21 +0000 UTC} {BlobID:xn0_9f0395fddecf39e422b4dc7e2ccd51d9-s8907674730f45e7d11c-c1 Length:433 Timestamp:2023-05-25 08:02:29 +0000 UTC} {BlobID:xn0_a7bf2
-s857da5ba644ea83711d-c1 Length:143 Timestamp:2023-05-31 08:04:36 +0000 UTC} {BlobID:xn0_ab2643986b4c5bda049c66eaab2980da-s102b34ad63ae4cf611c-c1 Length:143
:25 +0000 UTC} {BlobID:xn0_e07bf4458ba6208a85525e03b3149ba8-s55d88622f75f9ca011c-c1 Length:143 Timestamp:2023-05-25 08:02:17 +0000 UTC} {BlobID:xn0_fd7f3f91
b1f34a70596d0cd11c-c1 Length:143 Timestamp:2023-05-27 08:03:04 +0000 UTC}] 1:[]] LongestRangeCheckpointSets:[0xc000547dd0 0xc000547bc0 0xc000547890 0xc00054
b30] SingleEpochCompactionSets:map[] EpochStartTime:map[] DeletionWatermark:0001-01-01 00:00:00 +0000 UTC ValidUntil:2023-07-31 11:47:07.39209877 -0700 PDT}

line 651 GetCompleteIndexSet - cs: {WriteEpoch:0 UncompactedEpochSets:map[0:[{BlobID:xn0_036faf4deaae03ebdf0e222410ae2dfa-scf630028c5cf7b9711c-c1 Length:143
:22 +0000 UTC} {BlobID:xn0_15b3aca4d3f394768c50acb281456eb3-sa5a0092e32a2943711c-c1 Length:143 Timestamp:2023-05-26 08:02:40 +0000 UTC} {BlobID:xn0_1e916a3f
b6fd67a438485c911d-c1 Length:334 Timestamp:2023-06-01 08:04:57 +0000 UTC} {BlobID:xn0_2e856a11373fa1395dc62f52fd3b7b5a-s48c5e94f8a594d1f11c-c1 Length:143 Ti
 +0000 UTC} {BlobID:xn0_50848a53a69933998965d87bd5d8bf4d-s5957a6f83d1fd4b511d-c1 Length:143 Timestamp:2023-05-30 08:04:08 +0000 UTC} {BlobID:xn0_83141457eb2
1834aa5d0e4911d-c1 Length:143 Timestamp:2023-06-03 08:05:43 +0000 UTC} {BlobID:xn0_9464ee7326ab4a362a6208a9eb71d7ed-s29d7c4059632869111d-c1 Length:143 Times
000 UTC} {BlobID:xn0_9f0395fddecf39e422b4dc7e2ccd51d9-s8907674730f45e7d11c-c1 Length:433 Timestamp:2023-05-25 08:02:29 +0000 UTC} {BlobID:xn0_a7bf28cd4644fd
a644ea83711d-c1 Length:143 Timestamp:2023-05-31 08:04:36 +0000 UTC} {BlobID:xn0_ab2643986b4c5bda049c66eaab2980da-s102b34ad63ae4cf611c-c1 Length:143 Timestam
 UTC} {BlobID:xn0_e07bf4458ba6208a85525e03b3149ba8-s55d88622f75f9ca011c-c1 Length:143 Timestamp:2023-05-25 08:02:17 +0000 UTC} {BlobID:xn0_fd7f3f911e5b44c80
96d0cd11c-c1 Length:143 Timestamp:2023-05-27 08:03:04 +0000 UTC}] 1:[]] LongestRangeCheckpointSets:[0xc000547dd0 0xc000547bc0 0xc000547890 0xc0005479b0 0xc0
leEpochCompactionSets:map[] EpochStartTime:map[] DeletionWatermark:0001-01-01 00:00:00 +0000 UTC ValidUntil:2023-07-31 11:47:07.39209877 -0700 PDT}
line 645 cs.WriteEpoch: 0
line 732 getCompleteIndexSetForCommittedState - minEpoch: 0, maxEpoch: 1
line 736 getCompleteIndexSetForCommittedState - RangeMetadata: &{MinEpoch:0 MaxEpoch:7 Blobs:[{BlobID:xr0_7_48b4f54ecba6c53d442c3aaf034ca680-s1eb4089941cae5
p:2023-06-04 00:03:35 +0000 UTC}]}
line 736 getCompleteIndexSetForCommittedState - RangeMetadata: &{MinEpoch:8 MaxEpoch:15 Blobs:[{BlobID:xr8_15_74c84ae03c567246a7f5746dac29935c-s5832616961c7
amp:2023-06-13 18:04:20 +0000 UTC}]}
line 736 getCompleteIndexSetForCommittedState - RangeMetadata: &{MinEpoch:16 MaxEpoch:23 Blobs:[{BlobID:xr16_23_ece5b240ee12fb5d256ccd27e4281e1c-sa2e9a2a246
stamp:2023-06-25 12:03:36 +0000 UTC}]}
line 736 getCompleteIndexSetForCommittedState - RangeMetadata: &{MinEpoch:24 MaxEpoch:31 Blobs:[{BlobID:xr24_31_3b8b80bbb606942f9a47066eb8bb6d02-s4c48c8e1d3
stamp:2023-07-07 00:03:36 +0000 UTC}]}
line 736 getCompleteIndexSetForCommittedState - RangeMetadata: &{MinEpoch:32 MaxEpoch:39 Blobs:[{BlobID:xr32_39_e0f1bbb97f664907205edb9d5d1b4b06-s97582263bf
stamp:2023-07-17 18:03:42 +0000 UTC}]}
line 736 getCompleteIndexSetForCommittedState - RangeMetadata: &{MinEpoch:40 MaxEpoch:47 Blobs:[{BlobID:xr40_47_6a1613269fd796d04e198b5f6f351a92-s5c9
stamp:2023-07-26 10:03:42 +0000 UTC}]}
cnt: -46panic: runtime error: makeslice: len out of range

Here we can see that cnt = -46 with which seems to come up with large Epochs.
As i fully don’t understand the Epcoh (would be happy to learn about it), had few queries

  1. Is having large Epochs ok?
  2. When does this Epoch become large?
  3. I tried few normal backups, where i didn’t see Epochs increasing
  4. Is there any workaround to fix this? None of the kopia commands are working as connection itslef is failing.

[Note: In the setup where the failure was seen for some reason kopia maintenance was failing. Not sure if this adds up to the above issue]

1 Like

I see something similar here:

# kopia repository connect from-config --token eyJ2ZXJzaW9u[xxxxxxxxxxxxxxxx ]
panic: runtime error: makeslice: len out of range

goroutine 1 [running]:
github.com/kopia/kopia/internal/epoch.(*Manager).getCompleteIndexSetForCommittedState(0xc0000da600, {0x1cf8518, 0xc00014f3e0}, {0x0, 0xc0005a4660, {0xc00012c420, 0x4, 0x4}, 0xc0003c3680, 0xc0003c3620, ...}, ...)
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/internal/epoch/epoch_manager.go:911 +0x3c5
github.com/kopia/kopia/internal/epoch.(*Manager).GetCompleteIndexSet(0xc0000da600, {0x1cf8518, 0xc00014f3e0}, 0x7?)
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/internal/epoch/epoch_manager.go:750 +0x1d5
github.com/kopia/kopia/repo/content/indexblob.(*ManagerV1).ListActiveIndexBlobs(0xc000538b00, {0x1cf8518?, 0xc00014f3e0?})
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/repo/content/indexblob/index_blob_manager_v1.go:39 +0x54
github.com/kopia/kopia/repo/content.(*SharedManager).loadPackIndexesLocked(0xc00045cc00, {0x1cf8518, 0xc00014f3e0})
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/repo/content/committed_read_manager.go:222 +0x27f
github.com/kopia/kopia/repo/content.NewSharedManager({0x1cf8518?, 0xc00014f3e0}, {0x1d00a80?, 0xc0005352f0}, {0x1d01890?, 0xc000694480}, 0xc004725170, 0x0?, 0x0?)
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/repo/content/committed_read_manager.go:631 +0x645
github.com/kopia/kopia/repo.openWithConfig({0x1cf8518, 0xc00014f3e0}, {0x1d00d58, 0xc00023c080}, {{0xc000461ebc, 0x4}, {0xc000461ed0, 0x4}, 0x0, 0x0, ...}, ...)
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/repo/open.go:325 +0x825
github.com/kopia/kopia/repo.openDirect({0x1cf8518, 0xc00014f3e0}, {0xc000419e00, 0x25}, 0xc0004ec5b0, {0xc0003a3af0, 0xb}, 0xc00063f5a0)
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/repo/open.go:227 +0x334
github.com/kopia/kopia/repo.Open({0x1cf8518?, 0xc0000d3b60?}, {0xc000419e00, 0x25}, {0xc0003a3af0, 0xb}, 0x0)
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/repo/open.go:132 +0x465
github.com/kopia/kopia/repo.verifyConnect({0x1cf8518, 0xc0000d3b60}, {0xc000419e00, 0x25}, {0xc0003a3af0?, 0xc000166090?})
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/repo/connect.go:69 +0x56
github.com/kopia/kopia/repo.Connect({0x1cf8518, 0xc0000d3b60}, {0xc000419e00, 0x25}, {0x1d00d58, 0xc00035dfa0}, {0xc0003a3af0, 0xb}, 0xc00063f9f8?)
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/repo/connect.go:64 +0x4df
github.com/kopia/kopia/cli.(*App).runConnectCommandWithStorageAndPassword(0xc000510500, {0x1cf8518, 0xc0000d3b60}, 0xc000512088, {0x1d00d58, 0xc00035dfa0}, {0xc0003a3af0, 0xb})
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/cli/command_repository_connect.go:123 +0x205
github.com/kopia/kopia/cli.(*App).runConnectCommandWithStorage(0x0?, {0x1cf8518, 0xc0000d3b60}, 0x0?, {0x1d00d58, 0xc00035dfa0})
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/cli/command_repository_connect.go:117 +0x9e
github.com/kopia/kopia/cli.(*commandRepositoryConnect).setup.func1.1({0x1cf8518, 0xc0000d3b60})
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/cli/command_repository_connect.go:42 +0xa9
github.com/kopia/kopia/cli.(*App).runAppWithContext.func1({0x1cf8518, 0xc0000d3b00}, 0xc0000d3b00?, 0xc000510500, 0xc0000d3aa0)
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/cli/app.go:498 +0x191
github.com/kopia/kopia/cli.(*App).runAppWithContext(0xc000510500, 0x40c8ad?, 0x5?)
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/cli/app.go:499 +0x1c5
github.com/kopia/kopia/cli.(*commandRepositoryConnect).setup.func1(0xc00039e510)
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/cli/command_repository_connect.go:35 +0xe4
github.com/alecthomas/kingpin/v2.(*actionMixin).applyActions(...)
	/root/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.3.2/actions.go:28
github.com/alecthomas/kingpin/v2.(*Application).applyActions(0xc000438400?, 0xc00039e510)
	/root/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.3.2/app.go:560 +0xec
github.com/alecthomas/kingpin/v2.(*Application).execute(0xc000438400, 0xc00039e510?, {0xc000158a40, 0x3, 0x4})
	/root/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.3.2/app.go:398 +0x66
github.com/alecthomas/kingpin/v2.(*Application).Parse(0xc000438400, {0xc0000401f0?, 0xc000438400?, 0x165da35?})
	/root/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.3.2/app.go:230 +0x14f
main.main()
	/root/go/pkg/mod/github.com/kopia/kopia@v0.13.0/main.go:77 +0x1dc

I think of filing an Issue on github. This breaks my backups on that machine …

Is this issue the same?

I see this issue both on a gentoo box (with dev-lang/go-1.20.7) and a debian-12.1 box.

Yes it seems to be same issue. One hunch i have is if maintenace job is not running for long time, does this cause this?

Might be. Although I can’t run maintenance now as I am unable to connect to the repository in the first place.

In my case it’s a sftp repository.

Tried to reconnect with token or explicit command-line options. Failures.
Tried latest development build from github. fails.

Help appreciated. Maybe this should be filed under “Support” instead.
thanks

Approaching this one again. I see the same error from various machines, so it seems to be something around the repo itself (AFAI understand).

Tried to upgrade Go on the machine hosting the repository. No change.

So far I haven’t seen any feedback here or on github. Does anyone else hit this?

@pkumarn could you solve this somehow?

I assume there should be a way to access the repository even if no maintenance has been run for quite a while. And there are way bigger repos than mine, so I can’t imagine that the number of snapshots could break things here.

I was not able to fix the corrupted repo, only option was to let that go and take new backups :frowning:

@pkumarn thanks for the feedback. I wonder if the repo was corrupt or if there is some bug in the software. I wait a bit longer for any helpful feedback from the devs before I start from scratch …

Well it worked on the initial setup for me with the latest build - then it stopped connecting after rebooting the repository server once. And here you are having these issues on multiple clients … not sure if I’d have the patience to stick trying … that again.

And there are way bigger repos than mine, so I can’t imagine that the number of snapshots could break things here.

This puzzles me too … but you can easily throw any amount of movie collections towards something like Kopia - so that itself is no real flag as how reliable it is in reality.

For example my use case does introduce a Server solution like the Repository Server - they could all be using Rclone and S3 → that would offload a big chunk of stability on the Server side towards those solutions (until you have to restore e.g.).

By technical theory it all sounds like Kopia could be something great - the code and tests didn’t look to bad too … and likely early experience wasn’t to bad too … maybe there’s a lack of attention or others just don’t bother?

When I moved to test it (kopia 0.14.1) I immediately uncovered bad (=horrible) documentation, even on command-line, broken connect, non-streamlined commandline-arguments some weird design decisions and maybe an overly complex system that just isn’t stable enought - speaking from experience that implies a lack of real world testing and practical sanity in summary - before versions are released.

(e.g. removing http support - given that every single reverse proxy does usually build on it by layering - one very simple example that works great in practice is for example restics rest-server)

That is ALL written without doing even a SINGLE snapshot - having no data at a fresh repository.

All in all if feels unpractical - and given I read about corruption and suffered the same connect errors without doing any snapshot / backup at all - that’s something really really off-turning (and I’m usually really subborn in trying to get shit running).

What I’m speaking about are these issues - all mentioning that they can’t even connect until they fix it on the file-system layer:

Maybe Kopia just isn’t there yet … not going into depth how the filesystem-corruption can happen e.g. bitrot - just that it seems really hacky and off not being able to work with a system that’s built to fight such issues by design (one would think).

Yet I’m still here … maybe these are just random errors … who knows. I’ll still follow a bit to see the experiences from others, and maybe try again …

1 Like