Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apply: fix initial snapshot assertion #131503

Merged
merged 1 commit into from
Sep 27, 2024

Conversation

pav-kv
Copy link
Collaborator

@pav-kv pav-kv commented Sep 27, 2024

The broad assertion here is that all the entries composing a snapshot must have been applied by someone before (at least the sender of the snapshot has done it, otherwise it couldn’t have sent this snapshot).

And then there is an additional assertion (which failed occasionally before this PR) that there can’t be a snapshot without at least one entry applied. This is generally true, except that there are entries not registered by the "apply" stack: particularly, empty entries that raft leader appends at the start of its term. There is a comment to this extent in the asserter.go code:

// INVARIANT: all commands have a command ID. etcd/raft may commit noop
// proposals on leader changes that do not have a command ID, but we skip
// these during application.
if len(cmdID) == 0 {
fail("applied command without ID")
}

This commit modifies the snapshot assertion to correctly handle this case. It is possible that an initial snapshot sent from leader to followers contains only this dummy entry, and there were no "real" proposals applied.

Fixes the false assertion failure in #118471 (comment)
Related to #116319

@pav-kv pav-kv requested review from tbg and arulajmani September 27, 2024 12:33
@pav-kv pav-kv requested a review from a team as a code owner September 27, 2024 12:33
Copy link

blathers-crl bot commented Sep 27, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@pav-kv pav-kv added backport-24.1.x Flags PRs that need to be backported to 24.1. backport-24.2.x Flags PRs that need to be backported to 24.2 backport-23.2.x Flags PRs that need to be backported to 23.2. labels Sep 27, 2024
Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv and @tbg)


-- commits line 3 at r1:
This commit message deserves some words about the previous assumption, why it was faulty, and the fix. You already had this in Slack from earlier today, but it's worth capturing here as well.


pkg/kv/kvserver/apply/asserter.go line 392 at r1 (raw file):

	// is an initial snapshot. It's possible that the initial snapshot follows an
	// empty entry appended by the raft leader at the start of this term. Since we
	// don't register this entry as applied, r.log can be empty here.

nit: Consider adding more words to this explanation and pointing to

// INVARIANT: all commands have a command ID. etcd/raft may commit noop
// proposals on leader changes that do not have a command ID, but we skip
// these during application.
if len(cmdID) == 0 {
fail("applied command without ID")
}
here.

The broad assertion here is that all the entries composing a snapshot
must have been applied by someone before (at least the sender of the
snapshot has done it, otherwise it couldn’t have sent this snapshot).

And then there is an additional assertion (which failed occasionally
before this PR) that there can’t be a snapshot without at least one
entry applied. This is generally true, except that there are entries not
registered by the "apply" stack: particularly, empty entries that raft
leader appends at the start of its term. There is a comment to this
extent in the `asserter.go` code:

https://github.com/cockroachdb/cockroach/blob/5b0371b44009ac2ae3f58ea7ed95b290dd4e8227/pkg/kv/kvserver/apply/asserter.go#L254-L259

This commit modifies the snapshot assertion to correctly handle this
case. It is possible that an initial snapshot sent from leader to
followers contains only this dummy entry, and there were no "real"
proposals applied.

Epic: none
Release note: none
@pav-kv pav-kv force-pushed the fix-initial-snap-assert branch from 8a1aba5 to 5aeac1a Compare September 27, 2024 16:15
Copy link
Collaborator Author

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @tbg)


-- commits line 3 at r1:

Previously, arulajmani (Arul Ajmani) wrote…

This commit message deserves some words about the previous assumption, why it was faulty, and the fix. You already had this in Slack from earlier today, but it's worth capturing here as well.

Done


pkg/kv/kvserver/apply/asserter.go line 392 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: Consider adding more words to this explanation and pointing to

// INVARIANT: all commands have a command ID. etcd/raft may commit noop
// proposals on leader changes that do not have a command ID, but we skip
// these during application.
if len(cmdID) == 0 {
fail("applied command without ID")
}
here.

Done

@pav-kv
Copy link
Collaborator Author

pav-kv commented Sep 27, 2024

bors r=arulajmani

@craig craig bot merged commit 9d3d6d9 into cockroachdb:master Sep 27, 2024
22 of 23 checks passed
@pav-kv pav-kv deleted the fix-initial-snap-assert branch September 27, 2024 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.2.x Flags PRs that need to be backported to 23.2. backport-24.1.x Flags PRs that need to be backported to 24.1. backport-24.2.x Flags PRs that need to be backported to 24.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants