Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/metamorphic/crossversion/crossversion_test: TestMetaCrossVersion failed #2325

Closed
cockroach-teamcity opened this issue Feb 14, 2023 · 50 comments

Comments

@cockroach-teamcity
Copy link
Member

internal/metamorphic/crossversion/crossversion_test.TestMetaCrossVersion failed with artifacts on refs/heads/master @ 5d8daed0ac88:

                --- PASS: TestMeta/execution/random-025 (1.63s)
                --- FAIL: TestMeta/execution/standard-021 (1.76s)
                --- PASS: TestMeta/execution/random-020 (1.74s)
                --- FAIL: TestMeta/execution/random-005 (0.84s)
                --- PASS: TestMeta/execution/random-019 (2.09s)
                --- FAIL: TestMeta/execution/random-013 (1.09s)
                --- PASS: TestMeta/execution/random-018 (2.42s)
                --- PASS: TestMeta/execution/random-017 (2.60s)
                --- PASS: TestMeta/execution/random-016 (2.33s)
                --- PASS: TestMeta/execution/random-022 (4.21s)
                --- PASS: TestMeta/execution/random-014 (2.83s)
                --- FAIL: TestMeta/execution/random-009 (1.38s)
                --- FAIL: TestMeta/execution/random-008 (1.20s)
                --- PASS: TestMeta/execution/random-012 (2.51s)
                --- FAIL: TestMeta/execution/random-003 (0.80s)
                --- PASS: TestMeta/execution/random-010 (2.88s)
                --- FAIL: TestMeta/execution/standard-024 (0.10s)
                --- PASS: TestMeta/execution/random-000 (1.56s)
                --- FAIL: TestMeta/execution/random-001 (0.62s)
                --- PASS: TestMeta/execution/random-011 (3.42s)
                --- FAIL: TestMeta/execution/standard-025 (0.57s)
                --- FAIL: TestMeta/execution/standard-019 (0.43s)
                --- FAIL: TestMeta/execution/standard-011 (0.72s)
                --- FAIL: TestMeta/execution/standard-020 (0.59s)
                --- FAIL: TestMeta/execution/standard-016 (0.32s)
                --- FAIL: TestMeta/execution/standard-018 (0.50s)
                --- FAIL: TestMeta/execution/standard-017 (0.51s)
                --- FAIL: TestMeta/execution/standard-014 (0.29s)
                --- PASS: TestMeta/execution/random-007 (3.72s)
                --- PASS: TestMeta/execution/random-004 (2.83s)
                --- FAIL: TestMeta/execution/standard-012 (0.30s)
                --- FAIL: TestMeta/execution/standard-013 (0.42s)
                --- FAIL: TestMeta/execution/standard-015 (0.79s)
                --- PASS: TestMeta/execution/random-002 (2.75s)
                --- FAIL: TestMeta/execution/standard-010 (0.31s)
                --- FAIL: TestMeta/execution/standard-006 (0.48s)
                --- FAIL: TestMeta/execution/standard-007 (0.48s)
                --- FAIL: TestMeta/execution/standard-008 (0.64s)
                --- FAIL: TestMeta/execution/standard-009 (0.93s)
                --- FAIL: TestMeta/execution/standard-004 (0.64s)
                --- FAIL: TestMeta/execution/standard-003 (0.89s)
                --- FAIL: TestMeta/execution/standard-005 (0.85s)
                --- PASS: TestMeta/execution/random-006 (4.45s)
                --- FAIL: TestMeta/execution/standard-001 (0.73s)
                --- FAIL: TestMeta/execution/standard-002 (0.88s)
                --- FAIL: TestMeta/execution/standard-023 (0.90s)
                --- FAIL: TestMeta/execution/standard-022 (1.08s)
        FAIL
        
        --- FAIL: TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=16534736066321149689-opts=random-003(sha=a30d64b3-seed=10483783647614281427-opts=random-001(sha=4ddacdaf-seed=6474672136251743794-opts=random-016(<empty>))) (194.41s)
Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

@jbowens
Copy link
Collaborator

jbowens commented Feb 14, 2023

        === CONT  TestMeta/execution/standard-000
            meta_test.go:391: 
                ===== SEED =====
                12303769912327526879
                ===== ERR =====
                exit status 1
                ===== OUT =====
                file 001170 (type 2) unknown to the provider           

Potentially fallout from the new objstorage interface?

@bananabrick
Copy link
Contributor

bananabrick commented Feb 14, 2023

Pretty weird:

We try and delete a file 001170 made obsolete by a rewrite compaction as part of Job 5 and it succeeds:

// INFO: [JOB 5] compacting(rewrite) L6 [001170] (781 B) + L6 [] (0 B)
[05:19:17] :	 [Step 1/1]                 // INFO: [JOB 6] compacting(default) L0 [001554 001589 001588 001586 001590 001639] (5.7 K) + L4 [] (0 B)
[05:19:17] :	 [Step 1/1]                 // INFO: [JOB 4] sstable deleted 001549
[05:19:17] :	 [Step 1/1]                 // INFO: [JOB 5] compacting: sstable created 001649
[05:19:17] :	 [Step 1/1]                 // INFO: [JOB 5] compacted(rewrite) L6 [001170] (781 B) + L6 [] (0 B) -> L6 [001649] (953 B), in 0.0s (0.0s total), output rate 2.3 M/s
[05:19:17] :	 [Step 1/1]                 // INFO: [JOB 5] sstable deleted 001170

But then later, as part of Job 1, we try and delete the file again as part of Job 1. Looks like the delete part of Job 1 was delayed? But the details on Job1 are missing from the logs.

// INFO: [JOB 1] sstable delete error 001170: file 001170 (type 2) unknown to the provider

It looks like we're trying to work with an obsolete sstable.

@bananabrick
Copy link
Contributor

bananabrick commented Feb 14, 2023

Looks like a jobID of 1 is only used during Open. So, we try to delete the file during open, but then later schedule a rewrite compaction for the same file?

@bananabrick
Copy link
Contributor

In Open, we ratchet the format major versions(which can schedule rewrite compactions). And then we delete obsolete files at the end of the Open function(after the ratchet).

I think I understand what's going on here. Not a major bug.

@jbowens
Copy link
Collaborator

jbowens commented Feb 14, 2023

Possibly related, we backported 163fdf6 to 22.2. If it ends up being an issue introduced by the reordering within Open in 163fdf6, we'll need to backport something to 22.2

@bananabrick
Copy link
Contributor

No that change is fine.

I believe the bug is an interaction between 1. 7a29c5b, and 2. https://github.com/cockroachdb/pebble/blame/master/open.go#L254, and 3. https://github.com/cockroachdb/pebble/blame/master/open.go#L472.

1 is tracking the list of files in the directory in memory. It does not allow duplicate file deletions, see: https://github.com/cockroachdb/pebble/blob/master/objstorage/provider.go#L288. If a file is deleted, then it's removed from that map, and cannot be deleted again.

In 2, we list the files at the beginning of Open. Then, in 3, we use that same list to scan the directory for obsolete files. This is invalid. In between 2 and 3, we could have flushes occur due to WAL replay, and rewrite compactions occur due to format major version ratcheting. By using the list from 2 for 3, we accidentally mark files already deleted as obsolete. This has always been the behaviour, but previously we wouldn't error out when trying to delete already deleted files. See: https://github.com/cockroachdb/pebble/blob/master/compaction.go#L3395.

I think the fix is just to relist the files before scanning for obsolete. What do you think?

@jbowens
Copy link
Collaborator

jbowens commented Feb 14, 2023

Good find, that makes sense.

I think the fix is just to relist the files before scanning for obsolete. What do you think?

I'm a little wary of racing with concurrent compactions even if we relist the files. The ratcheting of the format major version can schedule compactions. It's possible that compactions may be running in the background after the ratcheting completes. Those concurrent compactions may also delete obsolete files. Before 163fdf6, we wouldn't schedule new compactions until the very end of Open: https://github.com/cockroachdb/pebble/blame/master/open.go#L485-L486

Should we delete Listed obsolete files before ratcheting the format major version?

@bananabrick
Copy link
Contributor

bananabrick commented Feb 14, 2023

Should we delete Listed obsolete files before ratcheting the format major version?

Yea, I think that makes sense. But we'd also have to delay writing the Options file it looks like.

On the other hand, we have some contradictory behaviour here: https://github.com/cockroachdb/pebble/blob/master/objstorage/provider.go#L239 and here https://github.com/cockroachdb/pebble/blob/master/objstorage/provider.go#L216. The caller of Provider.Remove checks for IsNotExistError, and returns if that's the case. But we'll never return a not exists error, because instead we'll return errors.Newf("file %s (type %d) unknown to the provider", errors.Safe(fileNum), errors.Safe(fileType)).

Should we keep the old behaviour, and allow deletion of files which have already been deleted. What do you think? cc @RaduBerinde

@cockroach-teamcity
Copy link
Member Author

internal/metamorphic/crossversion/crossversion_test.TestMetaCrossVersion failed with artifacts on refs/heads/master @ 5d8daed0ac88:

                --- FAIL: TestMeta/execution/standard-021 (1.43s)
                --- FAIL: TestMeta/execution/standard-014 (0.53s)
                --- FAIL: TestMeta/execution/standard-016 (1.00s)
                --- PASS: TestMeta/execution/random-004 (3.82s)
                --- FAIL: TestMeta/execution/standard-007 (0.51s)
                --- FAIL: TestMeta/execution/standard-004 (0.12s)
                --- FAIL: TestMeta/execution/standard-008 (0.20s)
                --- FAIL: TestMeta/execution/standard-010 (0.26s)
                --- FAIL: TestMeta/execution/standard-005 (0.10s)
                --- FAIL: TestMeta/execution/standard-002 (0.07s)
                --- FAIL: TestMeta/execution/standard-006 (0.20s)
                --- FAIL: TestMeta/execution/standard-009 (0.38s)
                --- FAIL: TestMeta/execution/standard-011 (0.56s)
                --- FAIL: TestMeta/execution/standard-012 (0.78s)
                --- FAIL: TestMeta/execution/standard-001 (0.28s)
                --- FAIL: TestMeta/execution/standard-003 (0.35s)
                --- FAIL: TestMeta/execution/standard-013 (0.32s)
                --- PASS: TestMeta/execution/random-008 (6.99s)
                --- PASS: TestMeta/execution/random-007 (1.97s)
        FAIL
        
        --- FAIL: TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=15777962831401829224-opts=random-012(sha=a30d64b3-seed=10819384999717991909-opts=random-009(sha=4ddacdaf-seed=2794607553930378993-opts=standard-015(<empty>))) (131.29s)
=== RUN   TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=15777962831401829224-opts=random-015(sha=a30d64b3-seed=10819384999717991909-opts=random-009(sha=4ddacdaf-seed=2794607553930378993-opts=standard-015(<empty>)))
=== PAUSE TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=15777962831401829224-opts=random-015(sha=a30d64b3-seed=10819384999717991909-opts=random-009(sha=4ddacdaf-seed=2794607553930378993-opts=standard-015(<empty>)))
=== CONT  TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=15777962831401829224-opts=random-015(sha=a30d64b3-seed=10819384999717991909-opts=random-009(sha=4ddacdaf-seed=2794607553930378993-opts=standard-015(<empty>)))
=== CONT  TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=15777962831401829224-opts=random-015(sha=a30d64b3-seed=10819384999717991909-opts=random-009(sha=4ddacdaf-seed=2794607553930378993-opts=standard-015(<empty>)))
    crossversion_test.go:233:   Running test with version 5d8daed0 with initial state sha=744ea7cc-seed=15777962831401829224-opts=random-015(sha=a30d64b3-seed=10819384999717991909-opts=random-009(sha=4ddacdaf-seed=2794607553930378993-opts=standard-015(<empty>))).
        --- PASS: TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=15777962831401829224-opts=random-015(sha=a30d64b3-seed=10819384999717991909-opts=random-009(sha=4ddacdaf-seed=2794607553930378993-opts=standard-015(<empty>))) (131.29s)
panic: runtime error: index out of range [253] with length 250 [recovered]
	panic: runtime error: index out of range [253] with length 250

goroutine 34 [running]:
testing.tRunner.func1.2({0xc2c200, 0xc000032078})
	GOROOT/src/testing/testing.go:1396 +0x24e
testing.tRunner.func1()
	GOROOT/src/testing/testing.go:1399 +0x39f
panic({0xc2c200, 0xc000032078})
	GOROOT/src/runtime/panic.go:884 +0x212
github.com/cockroachdb/pebble/internal/metamorphic.reorderHistory({0xc0002b3500, 0xfa, 0x0?})
	github.com/cockroachdb/pebble/internal/metamorphic/external/com_github_cockroachdb_pebble/internal/metamorphic/history.go:158 +0x10a
github.com/cockroachdb/pebble/internal/metamorphic.CompareHistories(0x7ffdb17ee82d?, {0xc001fb0000, 0x1d4, 0x8?})
	github.com/cockroachdb/pebble/internal/metamorphic/external/com_github_cockroachdb_pebble/internal/metamorphic/history.go:133 +0xec
internal/metamorphic/crossversion/crossversion_test.runCrossVersion({0xd74cb0, 0xc000132000}, 0xc0002a0d00, {0xc0002f23c0, 0x56}, {0xc0002fe000, 0x4, 0xdbf775?}, 0xf?, 0xa)
	internal/metamorphic/crossversion/crossversion_test/external/com_github_cockroachdb_pebble/internal/metamorphic/crossversion/crossversion_test.go:175 +0x425
internal/metamorphic/crossversion/crossversion_test.TestMetaCrossVersion(0xc0002a0d00)
	internal/metamorphic/crossversion/crossversion_test/external/com_github_cockroachdb_pebble/internal/metamorphic/crossversion/crossversion_test.go:109 +0x1e5
testing.tRunner(0xc0002a0d00, 0xcbf260)
	GOROOT/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
	GOROOT/src/testing/testing.go:1493 +0x35f
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@RaduBerinde
Copy link
Member

Should we keep the old behaviour, and allow deletion of files which have already been deleted. What do you think? cc @RaduBerinde

Yes, I think we should fix that.

Also, I have a TODO in scanObsoleteFiles to retrieve the list of objects from the provider (and ignore any SSTs from the file list). I think we should do that instead of relisting. @bananabrick let me know if you want to take a crack at that, otherwise just reassign the issue to me.

@bananabrick
Copy link
Contributor

bananabrick commented Feb 16, 2023

@RaduBerinde We'd have to relist anyway, I think. The write ahead logs(which are associated with memtables) could also become obsolete(due to flushes during WAL replay). And are you saying we should allow duplicate file deletions(like we currently do), or we shouldn't?

And I can finish this, since either way the fix will be small.

@RaduBerinde
Copy link
Member

Only tables go through objstorage, WALs would be unaffected by my changes. I do agree that objstorage should return not-exist. But in any case, we also need to get the list of tables from the objstorage provider (which by itself would also fix this).

RaduBerinde added a commit to RaduBerinde/pebble that referenced this issue Feb 16, 2023
Return a NotExistError when the object is not in our list. This is
important for Remove, where such an error is ignored.

Informs cockroachdb#2325
@RaduBerinde
Copy link
Member

I opened #2335 to return an error for which IsNotExistError is true. Not sure how to confirm that it fixes this failure, I couldn't repro with stressmeta.

By the way, the failure from yesterday looks different: #2325 (comment)

@cockroach-teamcity
Copy link
Member Author

internal/metamorphic/crossversion/crossversion_test.TestMetaCrossVersion failed with artifacts on refs/heads/master @ 9b1142a5070e:

                --- FAIL: TestMeta/execution/standard-022 (1.10s)
                --- FAIL: TestMeta/execution/standard-019 (0.67s)
                --- FAIL: TestMeta/execution/standard-016 (0.72s)
                --- FAIL: TestMeta/execution/standard-021 (1.32s)
                --- FAIL: TestMeta/execution/standard-017 (0.91s)
                --- FAIL: TestMeta/execution/standard-018 (1.01s)
                --- FAIL: TestMeta/execution/standard-013 (0.45s)
                --- FAIL: TestMeta/execution/standard-014 (0.57s)
                --- PASS: TestMeta/execution/random-000 (1.74s)
                --- FAIL: TestMeta/execution/standard-015 (0.75s)
                --- FAIL: TestMeta/execution/standard-011 (0.62s)
                --- FAIL: TestMeta/execution/standard-012 (0.80s)
                --- FAIL: TestMeta/execution/standard-010 (0.68s)
                --- FAIL: TestMeta/execution/standard-007 (0.58s)
                --- FAIL: TestMeta/execution/standard-009 (0.72s)
                --- FAIL: TestMeta/execution/standard-008 (0.62s)
                --- FAIL: TestMeta/execution/standard-005 (0.56s)
                --- FAIL: TestMeta/execution/standard-006 (0.86s)
                --- FAIL: TestMeta/execution/standard-004 (0.67s)
                --- FAIL: TestMeta/execution/standard-002 (0.52s)
                --- FAIL: TestMeta/execution/standard-003 (0.59s)
                --- FAIL: TestMeta/execution/standard-001 (0.51s)
                --- FAIL: TestMeta/execution/random-013 (0.73s)
                --- FAIL: TestMeta/execution/random-021 (0.52s)
                --- FAIL: TestMeta/execution/random-025 (0.91s)
                --- FAIL: TestMeta/execution/random-024 (0.86s)
                --- FAIL: TestMeta/execution/random-023 (0.59s)
                --- FAIL: TestMeta/execution/random-019 (0.89s)
                --- PASS: TestMeta/execution/random-016 (3.48s)
                --- FAIL: TestMeta/execution/random-012 (0.08s)
                --- FAIL: TestMeta/execution/random-011 (0.25s)
                --- PASS: TestMeta/execution/random-015 (4.77s)
                --- PASS: TestMeta/execution/random-020 (5.44s)
                --- PASS: TestMeta/execution/random-008 (2.35s)
                --- PASS: TestMeta/execution/random-007 (6.89s)
                --- PASS: TestMeta/execution/random-010 (4.34s)
                --- PASS: TestMeta/execution/random-017 (8.69s)
                --- FAIL: TestMeta/execution/random-002 (0.56s)
                --- PASS: TestMeta/execution/random-009 (4.76s)
                --- PASS: TestMeta/execution/random-004 (4.39s)
                --- PASS: TestMeta/execution/random-005 (4.19s)
                --- PASS: TestMeta/execution/random-018 (12.65s)
                --- PASS: TestMeta/execution/random-001 (3.17s)
                --- PASS: TestMeta/execution/random-006 (5.37s)
                --- PASS: TestMeta/execution/random-022 (15.95s)
                --- PASS: TestMeta/execution/random-003 (19.30s)
                --- PASS: TestMeta/execution/random-014 (29.65s)
        FAIL
        
        --- FAIL: TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=13947763229733006390-opts=random-011(sha=a30d64b3-seed=4179718345482882521-opts=random-005(sha=4ddacdaf-seed=4394513786269987793-opts=standard-005(<empty>))) (333.26s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@bananabrick
Copy link
Contributor

Oh true, WAL deletion should be unaffected. Yea, I think just allowing deletion of duplicate file should fix this. I'll take a look at the other failure and determine what happened there.

RaduBerinde added a commit that referenced this issue Feb 16, 2023
Return a NotExistError when the object is not in our list. This is
important for Remove, where such an error is ignored.

Informs #2325
@cockroach-teamcity
Copy link
Member Author

internal/metamorphic/crossversion/crossversion_test.TestMetaCrossVersion failed with artifacts on refs/heads/master @ ebc640d19e50:

                --- FAIL: TestMeta/execution/random-016 (1.57s)
                --- FAIL: TestMeta/execution/random-018 (3.44s)
                --- PASS: TestMeta/execution/random-020 (6.40s)
                --- PASS: TestMeta/execution/random-025 (6.63s)
                --- PASS: TestMeta/execution/random-000 (6.66s)
                --- PASS: TestMeta/execution/random-015 (4.24s)
                --- FAIL: TestMeta/execution/random-011 (2.40s)
                --- PASS: TestMeta/execution/random-012 (3.32s)
                --- PASS: TestMeta/execution/random-017 (6.97s)
                --- PASS: TestMeta/execution/random-010 (4.47s)
                --- PASS: TestMeta/execution/random-014 (6.69s)
                --- PASS: TestMeta/execution/random-013 (5.64s)
                --- PASS: TestMeta/execution/random-021 (13.19s)
                --- PASS: TestMeta/execution/random-007 (4.44s)
                --- PASS: TestMeta/execution/random-003 (2.79s)
                --- PASS: TestMeta/execution/random-006 (4.41s)
                --- PASS: TestMeta/execution/random-002 (1.46s)
                --- PASS: TestMeta/execution/random-008 (6.01s)
                --- FAIL: TestMeta/execution/standard-013 (0.50s)
                --- FAIL: TestMeta/execution/standard-024 (0.52s)
                --- FAIL: TestMeta/execution/standard-025 (0.56s)
                --- FAIL: TestMeta/execution/standard-023 (0.55s)
                --- FAIL: TestMeta/execution/standard-020 (0.50s)
                --- PASS: TestMeta/execution/random-004 (4.17s)
                --- PASS: TestMeta/execution/random-005 (4.55s)
                --- FAIL: TestMeta/execution/standard-022 (0.62s)
                --- FAIL: TestMeta/execution/standard-017 (0.29s)
                --- FAIL: TestMeta/execution/standard-016 (0.32s)
                --- FAIL: TestMeta/execution/standard-019 (0.47s)
                --- FAIL: TestMeta/execution/standard-018 (0.45s)
                --- FAIL: TestMeta/execution/standard-015 (0.53s)
                --- FAIL: TestMeta/execution/standard-014 (0.45s)
                --- FAIL: TestMeta/execution/standard-021 (1.26s)
                --- FAIL: TestMeta/execution/standard-010 (0.28s)
                --- FAIL: TestMeta/execution/standard-011 (0.36s)
                --- FAIL: TestMeta/execution/standard-007 (0.53s)
                --- FAIL: TestMeta/execution/standard-012 (0.50s)
                --- FAIL: TestMeta/execution/standard-006 (0.21s)
                --- FAIL: TestMeta/execution/standard-005 (0.20s)
                --- FAIL: TestMeta/execution/standard-004 (0.25s)
                --- FAIL: TestMeta/execution/standard-008 (0.34s)
                --- FAIL: TestMeta/execution/standard-009 (0.38s)
                --- FAIL: TestMeta/execution/standard-002 (0.28s)
                --- FAIL: TestMeta/execution/standard-001 (0.22s)
                --- PASS: TestMeta/execution/random-001 (2.84s)
                --- FAIL: TestMeta/execution/standard-003 (0.43s)
                --- PASS: TestMeta/execution/random-009 (9.33s)
        FAIL
        
        --- FAIL: TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=555731903540347248-opts=random-005(sha=a30d64b3-seed=4256056906429200930-opts=random-007(sha=4ddacdaf-seed=15416615420081063067-opts=standard-020(<empty>))) (200.47s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@bananabrick
Copy link
Contributor

bananabrick commented Feb 22, 2023

Yes, I checked the second failure, which looked different, but it's the same error in the logs:

meta_test.go:391:
[05:13:36 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=3180&linesState=3180)                  ===== SEED =====
[05:13:36 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=3181&linesState=3181)                  18183598825266961503
[05:13:36 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=3182&linesState=3182)                  ===== ERR =====
[05:13:36 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=3183&linesState=3183)                  exit status 1
[05:13:36 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=3184&linesState=3184)                  ===== OUT =====
[05:13:36 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=3185&linesState=3185)                  file 000472 (type 2) unknown to the provider
[05:13:36 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=3186&linesState=3186)  
[05:13:36 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=3187&linesState=3187)                  ===== OPTIONS =====
[05:13:36 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=3188&linesState=3188)                  [Version]
[05:13:36 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=3189&linesState=3189)                    pebble_version=0.1
05:13:36

But at the end of the logs I also see:

goroutine 34 [running]:
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99684&linesState=99684)  testing.tRunner.func1.2({0xc2c200, 0xc000032078})
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99685&linesState=99685)    GOROOT/src/testing/testing.go:1396 +0x24e
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99686&linesState=99686)  testing.tRunner.func1()
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99687&linesState=99687)    GOROOT/src/testing/testing.go:1399 +0x39f
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99688&linesState=99688)  panic({0xc2c200, 0xc000032078})
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99689&linesState=99689)    GOROOT/src/runtime/panic.go:884 +0x212
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99690&linesState=99690)  github.com/cockroachdb/pebble/internal/metamorphic.reorderHistory({0xc0002b3500, 0xfa, 0x0?})
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99691&linesState=99691)    github.com/cockroachdb/pebble/internal/metamorphic/external/com_github_cockroachdb_pebble/internal/metamorphic/history.go:158 +0x10a
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99692&linesState=99692)  github.com/cockroachdb/pebble/internal/metamorphic.CompareHistories(0x7ffdb17ee82d?, {0xc001fb0000, 0x1d4, 0x8?})
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99693&linesState=99693)    github.com/cockroachdb/pebble/internal/metamorphic/external/com_github_cockroachdb_pebble/internal/metamorphic/history.go:133 +0xec
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99694&linesState=99694)  internal/metamorphic/crossversion/crossversion_test.runCrossVersion({0xd74cb0, 0xc000132000}, 0xc0002a0d00, {0xc0002f23c0, 0x56}, {0xc0002fe000, 0x4, 0xdbf775?}, 0xf?, 0xa)
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99695&linesState=99695)    internal/metamorphic/crossversion/crossversion_test/external/com_github_cockroachdb_pebble/internal/metamorphic/crossversion/crossversion_test.go:175 +0x425
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99696&linesState=99696)  internal/metamorphic/crossversion/crossversion_test.TestMetaCrossVersion(0xc0002a0d00)
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99697&linesState=99697)    internal/metamorphic/crossversion/crossversion_test/external/com_github_cockroachdb_pebble/internal/metamorphic/crossversion/crossversion_test.go:109 +0x1e5
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99698&linesState=99698)  testing.tRunner(0xc0002a0d00, 0xcbf260)
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99699&linesState=99699)    GOROOT/src/testing/testing.go:1446 +0x10b
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99700&linesState=99700)  created by testing.(*T).Run
[05:13:40 ](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_NightlyPebbleMetamorphicCrossversion/8710412?buildTab=log&focusLine=99701&linesState=99701)    GOROOT/src/testing/testing.go:1493 +0x35f

We hit a panic while trying to reorderHistory in metamorphic/history.go. I'm guessing that function isn't gracefully handling the case where the history file is incomplete? I'll have to look into why we wrote an incomplete history file(it might be incomplete because the test already failed), which I won't get to today or tomorrow.

The actual bug in the code is fixed, but we can leave this open until I investigate the problem in the test harness.

@cockroach-teamcity
Copy link
Member Author

internal/metamorphic/crossversion/crossversion_test.TestMetaCrossVersion failed with artifacts on refs/heads/master @ efd802f16697:

                --- PASS: TestMeta/execution/random-024 (0.75s)
                --- PASS: TestMeta/execution/standard-024 (0.71s)
                --- PASS: TestMeta/execution/standard-025 (0.75s)
                --- PASS: TestMeta/execution/standard-022 (0.58s)
                --- PASS: TestMeta/execution/standard-020 (0.46s)
                --- PASS: TestMeta/execution/standard-023 (1.01s)
                --- PASS: TestMeta/execution/random-023 (1.53s)
                --- PASS: TestMeta/execution/standard-017 (0.48s)
                --- PASS: TestMeta/execution/standard-019 (0.57s)
                --- PASS: TestMeta/execution/standard-018 (0.69s)
                --- PASS: TestMeta/execution/random-025 (1.78s)
                --- PASS: TestMeta/execution/standard-016 (0.67s)
                --- PASS: TestMeta/execution/standard-021 (1.30s)
                --- PASS: TestMeta/execution/standard-015 (0.71s)
                --- PASS: TestMeta/execution/random-006 (0.61s)
                --- PASS: TestMeta/execution/standard-014 (0.62s)
                --- PASS: TestMeta/execution/random-010 (0.62s)
                --- PASS: TestMeta/execution/random-011 (0.91s)
                --- PASS: TestMeta/execution/standard-007 (0.47s)
                --- PASS: TestMeta/execution/standard-012 (0.64s)
                --- PASS: TestMeta/execution/random-007 (0.85s)
                --- PASS: TestMeta/execution/standard-009 (0.61s)
                --- PASS: TestMeta/execution/random-009 (1.38s)
                --- PASS: TestMeta/execution/standard-011 (1.03s)
                --- PASS: TestMeta/execution/random-008 (1.37s)
                --- PASS: TestMeta/execution/standard-010 (0.92s)
                --- PASS: TestMeta/execution/standard-004 (0.72s)
                --- PASS: TestMeta/execution/standard-006 (0.68s)
                --- PASS: TestMeta/execution/standard-008 (0.93s)
                --- PASS: TestMeta/execution/standard-005 (0.46s)
                --- FAIL: TestMeta/execution/random-018 (0.41s)
                --- PASS: TestMeta/execution/standard-003 (0.55s)
                --- PASS: TestMeta/execution/standard-001 (0.55s)
                --- PASS: TestMeta/execution/random-016 (0.20s)
                --- PASS: TestMeta/execution/standard-002 (0.82s)
                --- PASS: TestMeta/execution/random-020 (0.66s)
                --- PASS: TestMeta/execution/random-022 (0.76s)
                --- PASS: TestMeta/execution/random-003 (0.55s)
                --- PASS: TestMeta/execution/random-021 (0.97s)
                --- PASS: TestMeta/execution/random-005 (0.64s)
                --- PASS: TestMeta/execution/random-004 (0.64s)
                --- PASS: TestMeta/execution/random-017 (1.00s)
                --- PASS: TestMeta/execution/random-015 (0.77s)
                --- PASS: TestMeta/execution/random-014 (0.73s)
                --- PASS: TestMeta/execution/random-002 (0.76s)
                --- PASS: TestMeta/execution/random-001 (0.68s)
                --- PASS: TestMeta/execution/random-019 (1.67s)
        FAIL
        
        --- FAIL: TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=4901702894300824656-opts=random-011(sha=a30d64b3-seed=117942389438917464-opts=random-003(sha=4ddacdaf-seed=2922322098842544477-opts=random-000(<empty>))) (126.14s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@bananabrick
Copy link
Contributor

#2325 (comment)

^ This is a new failure.

@bananabrick
Copy link
Contributor

bananabrick commented Feb 22, 2023

There's a similar looking error, but instead of deleting a file twice, we're trying to open a deleted file:

                // INFO: [JOB 4] compacted(default) L0 [000352 000356 000357 000358 000362 000360] (7.8 K) + L6 [000369 000370 000371 000372 000373 000374 000375 000376 000377 000378 000379 000380 000381] (13 K) -> L6 [000385 000386 000387 000388 000389 000390 000391] (7.5 K), in 0.0s (0.0s total), output rate 314 K/s
                // FATAL: file 000385 (type 2) unknown to the objstorage provider: file does not exist
                // FATAL: checker failed with error: file 000385 (type 2) unknown to the objstorage provider: file does not exist
                // INFO: [JOB 9] compacting(default) L0 [000359] (1.1 K) + L6 [000385 000386 000387 000388] (4.1 K)
                // FATAL: file 000385 (type 2) unknown to the objstorage provider: file does not exist
                // INFO: [JOB 9] compaction(default) to L6 error: pebble: could not open table 000385: file 000385 (type 2) unknown to the objstorage provider: file does not exist
        === CONT  TestMeta/execution/random-016
        ````

@cockroach-teamcity
Copy link
Member Author

internal/metamorphic/crossversion/crossversion_test.TestMetaCrossVersion failed with artifacts on refs/heads/master @ 79eb94777b28:

                --- PASS: TestMeta/execution/standard-014 (4.45s)
                --- PASS: TestMeta/execution/random-007 (8.91s)
                --- PASS: TestMeta/execution/random-010 (5.72s)
                --- PASS: TestMeta/execution/standard-007 (3.64s)
                --- PASS: TestMeta/execution/standard-012 (4.48s)
                --- PASS: TestMeta/execution/standard-010 (4.60s)
                --- PASS: TestMeta/execution/standard-015 (15.21s)
                --- PASS: TestMeta/execution/standard-011 (4.96s)
                --- PASS: TestMeta/execution/standard-009 (4.44s)
                --- PASS: TestMeta/execution/random-009 (14.59s)
                --- PASS: TestMeta/execution/random-012 (16.05s)
                --- PASS: TestMeta/execution/standard-008 (5.88s)
                --- PASS: TestMeta/execution/random-006 (5.92s)
                --- FAIL: TestMeta/execution/random-002 (0.97s)
                --- PASS: TestMeta/execution/random-013 (24.43s)
                --- PASS: TestMeta/execution/standard-004 (3.51s)
                --- PASS: TestMeta/execution/standard-006 (3.76s)
                --- PASS: TestMeta/execution/random-003 (4.88s)
                --- PASS: TestMeta/execution/random-005 (8.37s)
                --- PASS: TestMeta/execution/standard-005 (3.85s)
                --- PASS: TestMeta/execution/random-020 (4.91s)
                --- PASS: TestMeta/execution/random-008 (10.85s)
                --- PASS: TestMeta/execution/standard-019 (3.06s)
                --- PASS: TestMeta/execution/random-023 (7.87s)
                --- PASS: TestMeta/execution/random-022 (6.05s)
                --- PASS: TestMeta/execution/random-024 (9.15s)
                --- FAIL: TestMeta/execution/random-019 (1.54s)
                --- PASS: TestMeta/execution/random-021 (6.86s)
                --- PASS: TestMeta/execution/random-018 (3.77s)
                --- PASS: TestMeta/execution/standard-025 (3.09s)
                --- PASS: TestMeta/execution/random-015 (3.39s)
                --- PASS: TestMeta/execution/standard-022 (2.45s)
                --- PASS: TestMeta/execution/standard-023 (3.01s)
                --- PASS: TestMeta/execution/random-017 (8.64s)
                --- PASS: TestMeta/execution/random-004 (27.72s)
                --- PASS: TestMeta/execution/standard-024 (2.42s)
                --- PASS: TestMeta/execution/standard-016 (2.20s)
                --- PASS: TestMeta/execution/random-025 (18.89s)
                --- PASS: TestMeta/execution/standard-018 (2.03s)
                --- PASS: TestMeta/execution/standard-002 (1.35s)
                --- PASS: TestMeta/execution/random-016 (5.84s)
                --- PASS: TestMeta/execution/random-014 (2.05s)
                --- PASS: TestMeta/execution/standard-003 (1.60s)
                --- PASS: TestMeta/execution/random-001 (12.15s)
                --- PASS: TestMeta/execution/standard-001 (1.45s)
                --- PASS: TestMeta/execution/standard-021 (5.02s)
                --- PASS: TestMeta/execution/random-011 (43.12s)
        FAIL
        
        --- FAIL: TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=16859323255072942675-opts=random-000(sha=a30d64b3-seed=1322207626770058944-opts=random-009(sha=4ddacdaf-seed=692738856008268077-opts=standard-017(<empty>))) (603.05s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@bananabrick
Copy link
Contributor

Haven't had a chance to investigate the new failure. Will look at it once I respond to the review on: #2352.

@cockroach-teamcity
Copy link
Member Author

internal/metamorphic/crossversion/crossversion_test.TestMetaCrossVersion failed with artifacts on refs/heads/master @ bc4c9afe47a5:

                --- PASS: TestMeta/execution/standard-003 (0.66s)
                --- PASS: TestMeta/execution/random-000 (0.69s)
                --- PASS: TestMeta/execution/standard-002 (0.49s)
                --- PASS: TestMeta/execution/standard-009 (0.74s)
                --- PASS: TestMeta/execution/standard-006 (0.71s)
                --- PASS: TestMeta/execution/standard-001 (0.88s)
                --- PASS: TestMeta/execution/random-025 (1.47s)
                --- PASS: TestMeta/execution/random-024 (0.85s)
                --- PASS: TestMeta/execution/random-020 (1.01s)
                --- PASS: TestMeta/execution/random-019 (1.00s)
                --- PASS: TestMeta/execution/random-018 (0.89s)
                --- PASS: TestMeta/execution/random-017 (1.08s)
                --- PASS: TestMeta/execution/random-015 (0.68s)
                --- FAIL: TestMeta/execution/random-016 (1.01s)
                --- PASS: TestMeta/execution/standard-020 (0.85s)
                --- PASS: TestMeta/execution/random-014 (1.12s)
                --- PASS: TestMeta/execution/standard-025 (0.61s)
                --- PASS: TestMeta/execution/standard-023 (0.38s)
                --- PASS: TestMeta/execution/random-023 (3.23s)
                --- PASS: TestMeta/execution/standard-024 (0.62s)
                --- PASS: TestMeta/execution/standard-022 (0.46s)
                --- PASS: TestMeta/execution/random-022 (3.15s)
                --- PASS: TestMeta/execution/random-013 (3.42s)
                --- PASS: TestMeta/execution/standard-005 (0.27s)
                --- PASS: TestMeta/execution/standard-012 (0.27s)
                --- PASS: TestMeta/execution/standard-017 (0.40s)
                --- PASS: TestMeta/execution/standard-019 (0.39s)
                --- PASS: TestMeta/execution/standard-018 (0.36s)
                --- PASS: TestMeta/execution/random-007 (0.38s)
                --- PASS: TestMeta/execution/random-009 (0.22s)
                --- PASS: TestMeta/execution/random-008 (0.32s)
                --- PASS: TestMeta/execution/random-012 (0.68s)
                --- PASS: TestMeta/execution/standard-011 (0.33s)
                --- PASS: TestMeta/execution/standard-008 (0.30s)
                --- PASS: TestMeta/execution/random-021 (3.73s)
                --- PASS: TestMeta/execution/standard-016 (0.22s)
                --- PASS: TestMeta/execution/random-011 (0.92s)
                --- PASS: TestMeta/execution/standard-021 (1.53s)
                --- PASS: TestMeta/execution/random-006 (0.32s)
                --- PASS: TestMeta/execution/standard-015 (0.48s)
                --- PASS: TestMeta/execution/random-004 (0.44s)
                --- PASS: TestMeta/execution/random-005 (0.37s)
                --- PASS: TestMeta/execution/standard-014 (0.31s)
                --- PASS: TestMeta/execution/random-003 (0.23s)
                --- PASS: TestMeta/execution/random-002 (0.33s)
                --- PASS: TestMeta/execution/random-010 (1.12s)
                --- PASS: TestMeta/execution/random-001 (0.27s)
        FAIL
        
        --- FAIL: TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=13698007444407272884-opts=random-007(sha=a30d64b3-seed=13745565025753919460-opts=random-007(sha=4ddacdaf-seed=16389930109088582515-opts=random-007(<empty>))) (79.16s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@jbowens
Copy link
Collaborator

jbowens commented Feb 24, 2023

Another missing sstable. This is concerning. We should identify this soon. @bananabrick you still want to take it, or you can punt it to me or maybe @RaduBerinde given the objstorage context.

        === CONT  TestMeta/execution/random-016
            meta_test.go:391: 
                ===== SEED =====
                4671948031657383085
                ===== ERR =====
                exit status 1
                ===== OUT =====
                Seed: 0
                dbRestartOp: L5: 001988: file 001988 (type 2) unknown to the objstorage provider: file does not exist

@bananabrick
Copy link
Contributor

@jbowens I'm looking now.

@RaduBerinde
Copy link
Member

Probably caused by #2356, but not sure why.

@bananabrick
Copy link
Contributor

bananabrick commented Feb 24, 2023

@RaduBerinde Yea, it started failing when #2347 was merged.

Posting what I gathered after a first look at the logs.

[07:47:57] :	 [Step 1/1]                 // INFO: [JOB 4] compacting(default) L0 [000352 000356 000357 000358 000362 000360] (7.8 K) + L6 [000369 000370 000371 000372 000373 000374 000375 000376 000377 000378 000379 000380 000381] (13 K)
[07:47:57] :	 [Step 1/1]                 // INFO: [JOB 4] compacting: sstable created 000385

We see an sstable created message for sstable 385 as part of an L0->L6 compaction: // INFO: [JOB 4] compacting: sstable created 000385

 // INFO: [JOB 4] compacted(default) L0 [000352 000356 000357 000358 000362 000360] (7.8 K) + L6 [000369 000370 000371 000372 000373 000374 000375 000376 000377 000378 000379 000380 000381] (13 K) -> L6 [000385 000386 000387 000388 000389 000390 000391] (7.5 K), in 0.0s (0.0s total), output rate 314 K/s

After that we delete sstable 385 as part of the scan/delete obsolete files step in Open. I know this because the file was deleted with job id = 1.

After that another compaction job 9, tries to read file 385 from L6, which is where the error comes from.

// INFO: [JOB 9] compacting(default) L0 [000359] (1.1 K) + L6 [000385 000386 000387 000388] (4.1 K)

We know that file 385 must exist in L6 as job 4 created it, but the scan obsolete step is probably incorrectly marking that file as obsolete. This lines up with the changes in #2347. I don't know why we're incorrectly marking a file as obsolete though.

@RaduBerinde
Copy link
Member

OOC, how did you get to those logs? I've been looking around through the artifacts but I'm a little lost.

@bananabrick
Copy link
Contributor

You can click on failed in the failure message, and then click on Download log from the team city page.

@bananabrick
Copy link
Contributor

@RaduBerinde It wasn't #2356 when the failure started, but #2347.

@RaduBerinde
Copy link
Member

In the latest failure log I see:

[06:33:59] :	 [Step 1/1]                 // INFO: [JOB 1] sstable deleted 001988

then

[06:33:59] :	 [Step 1/1]                 // INFO: [JOB 13] compacting(move) L3 [001988] (2.2 K) + L4 [] (0 B)

We probably need to List the provider before gathering the live nums?

@bananabrick
Copy link
Contributor

This might be an existing race which just surfaced because of #2347. We're probably not marking file 385 as live in the scanObsoleteFiles step. It didn't surface previously because we weren't using up to date contents of the directory while scanning for obsolete files prior to #2347.

@RaduBerinde
Copy link
Member

We probably need to List the provider before gathering the live nums?

Oh.. but we are stopping compactions and waiting for any to complete, so there shouldn't be any changes to the live set or the files. And this is during Open so nothing else is going on.

@bananabrick
Copy link
Contributor

We probably need to List the provider before gathering the live nums?

The live file nums are determined through the latest version: https://github.com/cockroachdb/pebble/blob/master/version_set.go. So, 385 theoretically should be added to the liveFileNums

@bananabrick
Copy link
Contributor

Might be easier to reproduce this first. I'll try and do that.

@RaduBerinde
Copy link
Member

RaduBerinde commented Feb 24, 2023

Shouldn't this be a loop?

	// Wait for any ongoing compaction to complete before continuing.
	if d.mu.compact.compactingCount > 0 || d.mu.compact.flushing {
		d.mu.compact.cond.Wait()
	}

In the log I'm looking at, there are two compactions going on right before the delete.

@bananabrick
Copy link
Contributor

bananabrick commented Feb 24, 2023

Oh. I think that's right. Racing with compactions perfectly explains the bug. We're reacquiring the lock, but not ensuring that the condition is met.

If there's two compactions going on, then one of them will Broadcast to the condition variable, and we'll proceed with the scan step even when the compactingCount > 0.

@bananabrick
Copy link
Contributor

This makes sense to me and will probably fix the bug, but we should be a bit more principled and try and reproduce the bug to confirm for sure.

@cockroach-teamcity
Copy link
Member Author

internal/metamorphic/crossversion/crossversion_test.TestMetaCrossVersion failed with artifacts on refs/heads/master @ fccb83b60d5c:

                --- PASS: TestMeta/execution/random-019 (5.88s)
                --- PASS: TestMeta/execution/random-012 (6.23s)
                --- PASS: TestMeta/execution/random-008 (2.92s)
                --- PASS: TestMeta/execution/random-009 (3.95s)
                --- PASS: TestMeta/execution/random-006 (4.52s)
                --- PASS: TestMeta/execution/random-003 (4.96s)
                --- PASS: TestMeta/execution/random-005 (5.56s)
                --- PASS: TestMeta/execution/random-025 (11.84s)
                --- PASS: TestMeta/execution/random-002 (5.65s)
                --- PASS: TestMeta/execution/random-007 (7.68s)
                --- PASS: TestMeta/execution/standard-025 (2.26s)
                --- PASS: TestMeta/execution/random-000 (2.52s)
                --- PASS: TestMeta/execution/random-001 (4.98s)
                --- PASS: TestMeta/execution/random-011 (15.51s)
                --- PASS: TestMeta/execution/standard-024 (3.79s)
                --- PASS: TestMeta/execution/standard-022 (2.25s)
                --- PASS: TestMeta/execution/standard-020 (2.18s)
                --- PASS: TestMeta/execution/standard-023 (3.08s)
                --- PASS: TestMeta/execution/random-016 (3.03s)
                --- PASS: TestMeta/execution/random-021 (3.26s)
                --- PASS: TestMeta/execution/random-014 (4.00s)
                --- PASS: TestMeta/execution/random-018 (4.32s)
                --- FAIL: TestMeta/execution/random-013 (1.13s)
                --- PASS: TestMeta/execution/random-017 (4.76s)
                --- PASS: TestMeta/execution/random-015 (4.73s)
                --- PASS: TestMeta/execution/standard-010 (2.51s)
                --- PASS: TestMeta/execution/standard-021 (8.97s)
                --- PASS: TestMeta/execution/standard-017 (2.46s)
                --- PASS: TestMeta/execution/standard-018 (2.53s)
                --- PASS: TestMeta/execution/random-020 (3.55s)
                --- PASS: TestMeta/execution/standard-016 (3.04s)
                --- PASS: TestMeta/execution/random-004 (17.88s)
                --- PASS: TestMeta/execution/standard-011 (1.01s)
                --- PASS: TestMeta/execution/standard-014 (1.91s)
                --- PASS: TestMeta/execution/standard-013 (2.20s)
                --- PASS: TestMeta/execution/standard-012 (2.23s)
                --- PASS: TestMeta/execution/standard-009 (1.13s)
                --- PASS: TestMeta/execution/standard-005 (1.68s)
                --- PASS: TestMeta/execution/standard-006 (0.66s)
                --- PASS: TestMeta/execution/standard-007 (1.31s)
                --- PASS: TestMeta/execution/random-022 (7.57s)
                --- PASS: TestMeta/execution/standard-008 (1.77s)
                --- PASS: TestMeta/execution/standard-003 (0.87s)
                --- PASS: TestMeta/execution/standard-002 (0.68s)
                --- PASS: TestMeta/execution/standard-004 (0.88s)
                --- PASS: TestMeta/execution/standard-015 (3.85s)
                --- PASS: TestMeta/execution/standard-001 (0.54s)
        FAIL
        
        --- FAIL: TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=2360281332144119816-opts=random-006(sha=a30d64b3-seed=13384159670710577582-opts=random-005(sha=4ddacdaf-seed=13752137610437225990-opts=standard-004(<empty>))) (295.49s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

internal/metamorphic/crossversion/crossversion_test.TestMetaCrossVersion failed with artifacts on refs/heads/master @ fccb83b60d5c:

                --- PASS: TestMeta/execution/random-025 (3.01s)
                --- PASS: TestMeta/execution/random-000 (3.90s)
                --- PASS: TestMeta/execution/random-023 (3.96s)
                --- PASS: TestMeta/execution/random-010 (1.90s)
                --- PASS: TestMeta/execution/random-012 (2.38s)
                --- PASS: TestMeta/execution/random-007 (2.59s)
                --- PASS: TestMeta/execution/random-009 (1.91s)
                --- PASS: TestMeta/execution/random-011 (2.54s)
                --- PASS: TestMeta/execution/standard-013 (1.61s)
                --- PASS: TestMeta/execution/random-019 (2.24s)
                --- PASS: TestMeta/execution/standard-024 (2.22s)
                --- PASS: TestMeta/execution/standard-025 (2.55s)
                --- PASS: TestMeta/execution/random-016 (2.98s)
                --- PASS: TestMeta/execution/standard-023 (1.70s)
                --- PASS: TestMeta/execution/standard-022 (1.84s)
                --- PASS: TestMeta/execution/standard-019 (1.28s)
                --- PASS: TestMeta/execution/standard-020 (1.59s)
                --- PASS: TestMeta/execution/random-008 (5.20s)
                --- PASS: TestMeta/execution/standard-018 (1.82s)
                --- PASS: TestMeta/execution/standard-016 (1.56s)
                --- FAIL: TestMeta/execution/random-018 (0.67s)
                --- PASS: TestMeta/execution/standard-017 (1.89s)
                --- PASS: TestMeta/execution/standard-014 (1.72s)
                --- PASS: TestMeta/execution/random-021 (10.80s)
                --- PASS: TestMeta/execution/random-002 (2.30s)
                --- PASS: TestMeta/execution/random-005 (2.66s)
                --- PASS: TestMeta/execution/standard-015 (4.60s)
                --- PASS: TestMeta/execution/standard-021 (7.50s)
                --- PASS: TestMeta/execution/random-003 (2.49s)
                --- PASS: TestMeta/execution/random-001 (1.54s)
                --- PASS: TestMeta/execution/standard-007 (2.14s)
                --- PASS: TestMeta/execution/standard-010 (0.87s)
                --- PASS: TestMeta/execution/random-014 (2.75s)
                --- PASS: TestMeta/execution/standard-011 (1.18s)
                --- PASS: TestMeta/execution/standard-012 (1.31s)
                --- PASS: TestMeta/execution/random-015 (12.57s)
                --- PASS: TestMeta/execution/standard-004 (0.88s)
                --- PASS: TestMeta/execution/standard-008 (0.91s)
                --- PASS: TestMeta/execution/random-006 (6.09s)
                --- PASS: TestMeta/execution/standard-006 (0.85s)
                --- PASS: TestMeta/execution/standard-009 (1.12s)
                --- PASS: TestMeta/execution/standard-005 (1.01s)
                --- PASS: TestMeta/execution/random-004 (6.65s)
                --- PASS: TestMeta/execution/standard-003 (0.53s)
                --- PASS: TestMeta/execution/standard-001 (0.62s)
                --- PASS: TestMeta/execution/standard-002 (0.85s)
                --- PASS: TestMeta/execution/random-017 (0.77s)
        FAIL
        
        --- FAIL: TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=3267641216938305178-opts=random-000(sha=a30d64b3-seed=2188273758598246212-opts=random-012(sha=4ddacdaf-seed=8255326271271415457-opts=random-004(<empty>))) (170.46s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

internal/metamorphic/crossversion/crossversion_test.TestMetaCrossVersion failed with artifacts on refs/heads/master @ 02d779ffbc46:

                --- PASS: TestMeta/execution/random-025 (9.46s)
                --- PASS: TestMeta/execution/random-007 (10.38s)
                --- PASS: TestMeta/execution/random-009 (5.92s)
                --- PASS: TestMeta/execution/random-010 (7.31s)
                --- PASS: TestMeta/execution/random-008 (6.93s)
                --- PASS: TestMeta/execution/random-011 (10.86s)
                --- PASS: TestMeta/execution/random-006 (5.81s)
                --- PASS: TestMeta/execution/random-004 (7.13s)
                --- PASS: TestMeta/execution/random-002 (6.15s)
                --- PASS: TestMeta/execution/random-005 (7.22s)
                --- PASS: TestMeta/execution/standard-013 (4.66s)
                --- PASS: TestMeta/execution/standard-025 (5.35s)
                --- PASS: TestMeta/execution/random-023 (21.97s)
                --- PASS: TestMeta/execution/random-003 (6.82s)
                --- PASS: TestMeta/execution/standard-024 (6.75s)
                --- PASS: TestMeta/execution/standard-023 (5.40s)
                --- PASS: TestMeta/execution/standard-022 (4.87s)
                --- PASS: TestMeta/execution/standard-019 (3.72s)
                --- PASS: TestMeta/execution/standard-020 (4.46s)
                --- PASS: TestMeta/execution/standard-018 (4.04s)
                --- PASS: TestMeta/execution/standard-017 (5.15s)
                --- PASS: TestMeta/execution/standard-014 (4.42s)
                --- PASS: TestMeta/execution/standard-016 (5.06s)
                --- PASS: TestMeta/execution/random-020 (30.14s)
                --- PASS: TestMeta/execution/standard-007 (4.19s)
                --- PASS: TestMeta/execution/standard-012 (5.58s)
                --- PASS: TestMeta/execution/standard-011 (4.22s)
                --- PASS: TestMeta/execution/standard-009 (3.98s)
                --- PASS: TestMeta/execution/standard-010 (4.70s)
                --- PASS: TestMeta/execution/standard-004 (6.38s)
                --- PASS: TestMeta/execution/standard-006 (4.67s)
                --- PASS: TestMeta/execution/standard-005 (4.22s)
                --- PASS: TestMeta/execution/standard-021 (17.35s)
                --- FAIL: TestMeta/execution/random-016 (1.42s)
                --- PASS: TestMeta/execution/random-017 (5.84s)
                --- PASS: TestMeta/execution/standard-008 (10.27s)
                --- PASS: TestMeta/execution/standard-015 (15.53s)
                --- PASS: TestMeta/execution/random-001 (3.74s)
                --- PASS: TestMeta/execution/standard-002 (2.48s)
                --- PASS: TestMeta/execution/standard-003 (2.13s)
                --- PASS: TestMeta/execution/standard-001 (1.90s)
                --- PASS: TestMeta/execution/random-014 (4.62s)
                --- PASS: TestMeta/execution/random-013 (1.58s)
                --- PASS: TestMeta/execution/random-018 (7.60s)
                --- PASS: TestMeta/execution/random-015 (9.24s)
                --- PASS: TestMeta/execution/random-019 (11.18s)
                --- PASS: TestMeta/execution/random-000 (4.98s)
        FAIL
        
        --- FAIL: TestMetaCrossVersion/execution-master/sha=744ea7cc-seed=14154224165785949611-opts=random-001(sha=a30d64b3-seed=4070043905243588670-opts=random-006(sha=4ddacdaf-seed=2639450340510573884-opts=random-009(<empty>))) (509.87s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@bananabrick
Copy link
Contributor

I'm going to close this out as this will most likely be fixed by #2360.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants