Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Deflake TestReplicaReproposalWithNewLeaseIndexError #40477

Merged
merged 1 commit into from
Sep 4, 2019

Conversation

irfansharif
Copy link
Contributor

Fixes #39739. When inserting a new command into the proposal buffer, we
first reserve an index into the buffer's array and an offset from the
buffer's base lease index. If we subsequently fail to insert the
proposal, we should undo the index and offset reservation.

When we error out in (*propBuf).Insert, if we don't undo the index
reservation we expect a proposal at said index during consumption, but
don't find one.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@irfansharif
Copy link
Contributor Author

Other than the specific test, we only really need this for an error when marshaling the command footer (protoutil.MarshalToWithoutFuzzing). This feels too heavy handed for just that, but I'm not sure.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job tracking this down. Failing to insert into the buffer when we reserved an index in the array seems like an issue. Yet, I share your concern about the fix here. Subtracting from the counter does feel pretty heavy-handed. You had to be careful about only subtracting from the array index and not the lease index offset, which I suspect was because doing so would violate strict monotonicity and allow duplicate max lease index assignments. I'm also worried about this causing deadlocks in handleCounterRequestRLocked if it ever allowed zero or more than one request to hit the "The buffer is full and we were the first request to notice" path. I haven't been able to convince myself that that's actually an issue though.

I'm wondering if it would have been sufficient to just ignore empty array elements when flushing the buffer. Would that work to address the cancellation path after a proposal has acquired an array index?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

Copy link
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing a nil check when iterating over the buffer works just fine, I only didn't earlier because it seemed smellier but this ended up being worse. I've reverted to that now, PTAL (also alleviates any worries around handleCounterRequestRLocked deadlocking). I kept the tests.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: though I don't think you meant to include any of the .eg.go files.

Reviewed 3 of 3 files at r1, 1 of 9 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @irfansharif)


pkg/storage/replica_proposal_buf_test.go, line 273 at r1 (raw file):

	b.Init(&p)

	num := propBufArrayMinSize

nit: any reason to keep this variable around? It would be worth it if you defined num := propBufArrayMinSize / 2.

@irfansharif irfansharif force-pushed the proposer-buf-nil branch 2 times, most recently from 054ef0a to e94d515 Compare September 4, 2019 19:43
Copy link
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woops, TFTR.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @nvanbenschoten)

Fixes cockroachdb#39739. When inserting a new command into the proposal buffer, we
first reserve an index into the buffer's array. If we subsequently fail
to insert the proposal, we should account for this during proposal
registration.

Additionally add `TestPropBufCnt` to test basic behaviour of the counter
maintained by the proposal buffer.

Release note: None
@irfansharif
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request Sep 4, 2019
40477: storage: Deflake TestReplicaReproposalWithNewLeaseIndexError r=irfansharif a=irfansharif

Fixes #39739. When inserting a new command into the proposal buffer, we
first reserve an index into the buffer's array and an offset from the
buffer's base lease index. If we subsequently fail to insert the
proposal, we should undo the index and offset reservation.

When we error out in `(*propBuf).Insert`, if we don't undo the index
reservation we expect a proposal at said index during consumption, but
don't find one.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
@craig
Copy link
Contributor

craig bot commented Sep 4, 2019

Build succeeded

@craig craig bot merged commit a84f0f4 into cockroachdb:master Sep 4, 2019
@irfansharif irfansharif deleted the proposer-buf-nil branch November 18, 2019 18:19
@irfansharif irfansharif restored the proposer-buf-nil branch November 18, 2019 18:19
@irfansharif irfansharif deleted the proposer-buf-nil branch November 18, 2019 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

teamcity: failed test: TestReplicaReproposalWithNewLeaseIndexError
3 participants