Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: acceptance/bank/cluster-recovery failed #60828

Closed
cockroach-teamcity opened this issue Feb 20, 2021 · 16 comments
Closed

roachtest: acceptance/bank/cluster-recovery failed #60828

cockroach-teamcity opened this issue Feb 20, 2021 · 16 comments
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.

Comments

@cockroach-teamcity
Copy link
Member

(roachtest).acceptance/bank/cluster-recovery failed on master@3851e17ac987cbd7bdc8944274c4a5705a6c2f63:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/bank/cluster-recovery/run_1
	bank.go:197,bank.go:430,acceptance.go:112,test_runner.go:767: the bank is not in good order, total value: 998389, expected: 998001

More

Artifacts: /acceptance/bank/cluster-recovery

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Feb 20, 2021
@cockroach-teamcity
Copy link
Member Author

(roachtest).acceptance/bank/cluster-recovery failed on master@8b6f3c84cc256debeeb4d4055c5f0d5c9a481213:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/bank/cluster-recovery/run_1
	bank.go:197,bank.go:430,acceptance.go:112,test_runner.go:767: the bank is not in good order, total value: 998365, expected: 998001

More

Artifacts: /acceptance/bank/cluster-recovery

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Copy link
Member Author

(roachtest).acceptance/bank/cluster-recovery failed on master@64c4aef909f4382523cd9248341ca9f4448d841a:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/bank/cluster-recovery/run_1
	bank.go:197,bank.go:430,acceptance.go:112,test_runner.go:767: the bank is not in good order, total value: 998324, expected: 998001

More

Artifacts: /acceptance/bank/cluster-recovery

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Copy link
Member Author

(roachtest).acceptance/bank/cluster-recovery failed on master@d485973754bd65c2b504b47d0c22a512b6ce00e0:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/bank/cluster-recovery/run_1
	bank.go:197,bank.go:430,acceptance.go:112,test_runner.go:767: the bank is not in good order, total value: 997678, expected: 998001

More

Artifacts: /acceptance/bank/cluster-recovery

See this test on roachdash
powered by pkg/cmd/internal/issues

@ajwerner
Copy link
Contributor

This seems like a rather severe correctness problem. @nvanbenschoten any guesses?

@nvanbenschoten
Copy link
Member

I was just looking to see if I could repro. I was concerned with the timing that it was fallout from #59566, but that was not merged by the first time this failed.

@cockroach-teamcity
Copy link
Member Author

(roachtest).acceptance/bank/cluster-recovery failed on master@a2d8d2008bf567bc1bb27a7e87dff27750cef7a3:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/bank/cluster-recovery/run_1
	bank.go:197,bank.go:430,acceptance.go:112,test_runner.go:767: the bank is not in good order, total value: 998964, expected: 998001

More

Artifacts: /acceptance/bank/cluster-recovery

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Copy link
Member Author

(roachtest).acceptance/bank/cluster-recovery failed on master@6a95063b141824c8d98f248c1af87b1bd55bf435:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/bank/cluster-recovery/run_1
	bank.go:197,bank.go:430,acceptance.go:112,test_runner.go:767: the bank is not in good order, total value: 998395, expected: 998001

More

Artifacts: /acceptance/bank/cluster-recovery

See this test on roachdash
powered by pkg/cmd/internal/issues

@nvanbenschoten
Copy link
Member

500+ local iterations on my Mac across various SHAs and nothing. Then I throw this on a gceworker and hit it 7 of 10 times...

Anyway, once I got this reproducing, it was quick to bisect to d8c3eef. @sumeerbhola do you mind taking a look at this? It appears that cockroachdb/pebble@959663f is causing issues.

In the meantime, I'm going to revert d8c3eef on master.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 22, 2021
@cockroach-teamcity
Copy link
Member Author

(roachtest).acceptance/bank/cluster-recovery failed on master@6a95063b141824c8d98f248c1af87b1bd55bf435:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/bank/cluster-recovery/run_1
	bank.go:197,bank.go:430,acceptance.go:112,test_runner.go:767: the bank is not in good order, total value: 998127, expected: 998001

More

Artifacts: /acceptance/bank/cluster-recovery

See this test on roachdash
powered by pkg/cmd/internal/issues

craig bot pushed a commit that referenced this issue Feb 22, 2021
58422: changefeedccl: support primary key changes r=ajwerner a=ajwerner

This PR does a few things. One is that when errors occur due to unsupported
schema changes during the execution of a changefeed, relatively poor handling
ensues. Ideally we'd allow the changefeed to run its course right up to that
unsupported schema change's timestamp and then ensure that we persist the fact
that we've processed all of that data. That would permit a user to then restart
a changefeed after the unsupported change. There are some edge cases here worth
considering related to off-by-ones in the timestamp management. I probably
should go through that exercise before merging this PR.

The real feature this work is in support of is to allow for changefeeds to
successfully navigate changes to a primary index.

This PR works and support changes to the primary key of a table that also
include column set changes.

Release note (enterprise change): Support primary key changes in `CHANGEFEED`.

60825: kv/kvclient: fix ManualRefresh error handling r=ajwerner a=nvanbenschoten

Fixes #60760.
Fallout from #60567.

The refreshed BatchRequest was nil on the error path, which was
resulting in a nil-pointer exception. This commit fixes this by
passing the original BatchRequest to updateStateLocked, like the
TxnCoordSender normally does.

60908: Revert "vendor: bump pebble to 959663f8" r=nvanbenschoten a=nvanbenschoten

Informs #60828.

This reverts commit d8c3eef.

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@sumeerbhola
Copy link
Collaborator

There is a bug in the SeekLT optimization which incorrectly uses <= instead of <. Very surprising that nothing else failed apart from making this roachtest flaky.
Fix is in cockroachdb/pebble#1069

@knz
Copy link
Contributor

knz commented Feb 24, 2021

Presumably fixed by #60992. Please reopen if not.

@knz knz closed this as completed Feb 24, 2021
@knz
Copy link
Contributor

knz commented Feb 24, 2021

oh andrei says it's unrelated. KEeping open.

@knz knz reopened this Feb 24, 2021
@sumeerbhola
Copy link
Collaborator

oh andrei says it's unrelated. KEeping open.

@knz @andreimatei are you saying that this test fails despite reverting the Pebble optimization which happened in #60908? If yes, can you provide a pointer to such failures?

@nvanbenschoten
Copy link
Member

I think there was confusion in the kv slack channel. This did not fail again after revering the Pebble optimization, or at least, that's not why it was closed and re-opened.

@rafiss
Copy link
Collaborator

rafiss commented Mar 2, 2021

Who is able to resolve the confusion on this? My read is that cockroachdb/pebble#1069 fixed this and it hasn't been failing since.

@nvanbenschoten
Copy link
Member

Yeah, we can close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Projects
None yet
Development

No branches or pull requests

6 participants