Hi,
I have been degugging a issue where kafka data was written to PVC and used to back it up.
Built a debug kopia image and below is what i could see where makeslice was crashing due to negative memory allocation
panic: runtime error: makeslice: len out of range
goroutine 1 [running]:
github.com/kopia/kopia/internal/epoch.(*Manager).getCompleteIndexSetForCommittedState(0xc000122000, {0x17f2bc8, 0xc0004c7830}, {0x0, 0xc0
2ae0, {0xc000252a80, 0x6, 0x6}, 0xc00019e900, 0xc00019e8a0, ...}, ...)
/home/runner/work/kopia/kopia/internal/epoch/epoch_manager.go:747 +0x3e7
github.com/kopia/kopia/internal/epoch.(*Manager).GetCompleteIndexSet(0xc000122000, {0x17f2bc8, 0xc0004c7830}, 0x40ce34)
/home/runner/work/kopia/kopia/internal/epoch/epoch_manager.go:659 +0x1d5
github.com/kopia/kopia/repo/content.(*indexBlobManagerV1).listActiveIndexBlobs(0xc0000fc460, {0x17f2bc8, 0xc0004c7830})
/home/runner/work/kopia/kopia/repo/content/index_blob_manager_v1.go:29 +0x69
github.com/kopia/kopia/repo/content.(*SharedManager).loadPackIndexesLocked(0xc00000a1e0, {0x17f2bc8, 0xc0004c7830})
/home/runner/work/kopia/kopia/repo/content/committed_read_manager.go:191 +0x1d8
github.com/kopia/kopia/repo/content.NewSharedManager({0x17f2bc8, 0xc0004c7830}, {0x180cfc8, 0xc0000a6b80}, 0xc0003ec540, 0xc0001cd548, 0x
/home/runner/work/kopia/kopia/repo/content/committed_read_manager.go:576 +0x588
github.com/kopia/kopia/repo.openWithConfig({0x17f2bc8, 0xc0004c7830}, {0x180cfc8, 0xc0000a6b80}, 0xc000079800, {0xc0003a4378, 0x8}, 0xc00
7f8, 0xc0000b50b8, {0xc0001a0640, ...})
/home/runner/work/kopia/kopia/repo/open.go:225 +0x425
github.com/kopia/kopia/repo.openDirect({0x17f2bc8, 0xc0004c7830}, {0xc0001a0640, 0x34}, 0xc000079800, {0xc0003a4378, 0x8}, 0xc0007cb7f8)
/home/runner/work/kopia/kopia/repo/open.go:157 +0x207
github.com/kopia/kopia/repo.Open({0x17f2bc8, 0xc0004c7830}, {0xc0001a0640, 0xc0003d9480}, {0xc0003a4378, 0x8}, 0xc0000d4420)
/home/runner/work/kopia/kopia/repo/open.go:90 +0x2b3
github.com/kopia/kopia/repo.verifyConnect({0x17f2bc8, 0xc0004c7830}, {0xc0001a0640, 0x34}, {0xc0003a4378, 0x7})
/home/runner/work/kopia/kopia/repo/connect.go:67 +0x45
github.com/kopia/kopia/repo.Connect({0x17f2bc8, 0xc0004c7830}, {0xc0001a0640, 0x34}, {0x180cfc8, 0xc00018c370}, {0xc0003a4378, 0x8}, 0xc0
9110)
DEBUG Creating cache directory '/home/dimuralidharan/.cache/kopia/6ba39cc6e09eb37a' with max size 5242880000
DEBUG kopia.repository could not be fetched from cache: cached file too old
line 29 epoch.LatestEpoch: -1
line 643 committedState - e.lastKnownState: {WriteEpoch:0 UncompactedEpochSets:map[0:[{BlobID:xn0_036faf4deaae03ebdf0e222410ae2dfa-scf630028c5cf7b9711c-c1 L
-28 08:03:22 +0000 UTC} {BlobID:xn0_15b3aca4d3f394768c50acb281456eb3-sa5a0092e32a2943711c-c1 Length:143 Timestamp:2023-05-26 08:02:40 +0000 UTC} {BlobID:xn0
eba054-s8b6fd67a438485c911d-c1 Length:334 Timestamp:2023-06-01 08:04:57 +0000 UTC} {BlobID:xn0_2e856a11373fa1395dc62f52fd3b7b5a-s48c5e94f8a594d1f11c-c1 Leng
08:03:53 +0000 UTC} {BlobID:xn0_50848a53a69933998965d87bd5d8bf4d-s5957a6f83d1fd4b511d-c1 Length:143 Timestamp:2023-05-30 08:04:08 +0000 UTC} {BlobID:xn0_83
8bc-sce8b1834aa5d0e4911d-c1 Length:143 Timestamp:2023-06-03 08:05:43 +0000 UTC} {BlobID:xn0_9464ee7326ab4a362a6208a9eb71d7ed-s29d7c4059632869111d-c1 Length:
:05:21 +0000 UTC} {BlobID:xn0_9f0395fddecf39e422b4dc7e2ccd51d9-s8907674730f45e7d11c-c1 Length:433 Timestamp:2023-05-25 08:02:29 +0000 UTC} {BlobID:xn0_a7bf2
-s857da5ba644ea83711d-c1 Length:143 Timestamp:2023-05-31 08:04:36 +0000 UTC} {BlobID:xn0_ab2643986b4c5bda049c66eaab2980da-s102b34ad63ae4cf611c-c1 Length:143
:25 +0000 UTC} {BlobID:xn0_e07bf4458ba6208a85525e03b3149ba8-s55d88622f75f9ca011c-c1 Length:143 Timestamp:2023-05-25 08:02:17 +0000 UTC} {BlobID:xn0_fd7f3f91
b1f34a70596d0cd11c-c1 Length:143 Timestamp:2023-05-27 08:03:04 +0000 UTC}] 1:[]] LongestRangeCheckpointSets:[0xc000547dd0 0xc000547bc0 0xc000547890 0xc00054
b30] SingleEpochCompactionSets:map[] EpochStartTime:map[] DeletionWatermark:0001-01-01 00:00:00 +0000 UTC ValidUntil:2023-07-31 11:47:07.39209877 -0700 PDT}
line 651 GetCompleteIndexSet - cs: {WriteEpoch:0 UncompactedEpochSets:map[0:[{BlobID:xn0_036faf4deaae03ebdf0e222410ae2dfa-scf630028c5cf7b9711c-c1 Length:143
:22 +0000 UTC} {BlobID:xn0_15b3aca4d3f394768c50acb281456eb3-sa5a0092e32a2943711c-c1 Length:143 Timestamp:2023-05-26 08:02:40 +0000 UTC} {BlobID:xn0_1e916a3f
b6fd67a438485c911d-c1 Length:334 Timestamp:2023-06-01 08:04:57 +0000 UTC} {BlobID:xn0_2e856a11373fa1395dc62f52fd3b7b5a-s48c5e94f8a594d1f11c-c1 Length:143 Ti
+0000 UTC} {BlobID:xn0_50848a53a69933998965d87bd5d8bf4d-s5957a6f83d1fd4b511d-c1 Length:143 Timestamp:2023-05-30 08:04:08 +0000 UTC} {BlobID:xn0_83141457eb2
1834aa5d0e4911d-c1 Length:143 Timestamp:2023-06-03 08:05:43 +0000 UTC} {BlobID:xn0_9464ee7326ab4a362a6208a9eb71d7ed-s29d7c4059632869111d-c1 Length:143 Times
000 UTC} {BlobID:xn0_9f0395fddecf39e422b4dc7e2ccd51d9-s8907674730f45e7d11c-c1 Length:433 Timestamp:2023-05-25 08:02:29 +0000 UTC} {BlobID:xn0_a7bf28cd4644fd
a644ea83711d-c1 Length:143 Timestamp:2023-05-31 08:04:36 +0000 UTC} {BlobID:xn0_ab2643986b4c5bda049c66eaab2980da-s102b34ad63ae4cf611c-c1 Length:143 Timestam
UTC} {BlobID:xn0_e07bf4458ba6208a85525e03b3149ba8-s55d88622f75f9ca011c-c1 Length:143 Timestamp:2023-05-25 08:02:17 +0000 UTC} {BlobID:xn0_fd7f3f911e5b44c80
96d0cd11c-c1 Length:143 Timestamp:2023-05-27 08:03:04 +0000 UTC}] 1:[]] LongestRangeCheckpointSets:[0xc000547dd0 0xc000547bc0 0xc000547890 0xc0005479b0 0xc0
leEpochCompactionSets:map[] EpochStartTime:map[] DeletionWatermark:0001-01-01 00:00:00 +0000 UTC ValidUntil:2023-07-31 11:47:07.39209877 -0700 PDT}
line 645 cs.WriteEpoch: 0
line 732 getCompleteIndexSetForCommittedState - minEpoch: 0, maxEpoch: 1
line 736 getCompleteIndexSetForCommittedState - RangeMetadata: &{MinEpoch:0 MaxEpoch:7 Blobs:[{BlobID:xr0_7_48b4f54ecba6c53d442c3aaf034ca680-s1eb4089941cae5
p:2023-06-04 00:03:35 +0000 UTC}]}
line 736 getCompleteIndexSetForCommittedState - RangeMetadata: &{MinEpoch:8 MaxEpoch:15 Blobs:[{BlobID:xr8_15_74c84ae03c567246a7f5746dac29935c-s5832616961c7
amp:2023-06-13 18:04:20 +0000 UTC}]}
line 736 getCompleteIndexSetForCommittedState - RangeMetadata: &{MinEpoch:16 MaxEpoch:23 Blobs:[{BlobID:xr16_23_ece5b240ee12fb5d256ccd27e4281e1c-sa2e9a2a246
stamp:2023-06-25 12:03:36 +0000 UTC}]}
line 736 getCompleteIndexSetForCommittedState - RangeMetadata: &{MinEpoch:24 MaxEpoch:31 Blobs:[{BlobID:xr24_31_3b8b80bbb606942f9a47066eb8bb6d02-s4c48c8e1d3
stamp:2023-07-07 00:03:36 +0000 UTC}]}
line 736 getCompleteIndexSetForCommittedState - RangeMetadata: &{MinEpoch:32 MaxEpoch:39 Blobs:[{BlobID:xr32_39_e0f1bbb97f664907205edb9d5d1b4b06-s97582263bf
stamp:2023-07-17 18:03:42 +0000 UTC}]}
line 736 getCompleteIndexSetForCommittedState - RangeMetadata: &{MinEpoch:40 MaxEpoch:47 Blobs:[{BlobID:xr40_47_6a1613269fd796d04e198b5f6f351a92-s5c9
stamp:2023-07-26 10:03:42 +0000 UTC}]}
cnt: -46panic: runtime error: makeslice: len out of range
Here we can see that cnt = -46 with which seems to come up with large Epochs.
As i fully don’t understand the Epcoh (would be happy to learn about it), had few queries
- Is having large Epochs ok?
- When does this Epoch become large?
- I tried few normal backups, where i didn’t see Epochs increasing
- Is there any workaround to fix this? None of the kopia commands are working as connection itslef is failing.
[Note: In the setup where the failure was seen for some reason kopia maintenance was failing. Not sure if this adds up to the above issue]