-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-23.2: sql: don't parallelize *any* FK and unique checks containing locking #118931
release-23.2: sql: don't parallelize *any* FK and unique checks containing locking #118931
Conversation
Thanks for opening a backport. Please check the backport criteria before merging:
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
Also, please add a brief release justification to the body of your PR to justify this |
No: The first commit is a small refactor of The second commit is behind the The third commit is a rename of some constants.
There is a small risk that the first commit (the refactor) breaks something that depends on the flags in
Due to
If we don't backport, there are two workarounds to allow affected UPDATE statements to run:
I believe it is worth the small risk of backporting the refactor in the first commit to make sure customers have a good experience with read committed isolation in 23.2.
The third commit (rename of some constants) could be skipped. I've included it to reduce future backport skew, but it doesn't have to be included. Curious to know what people think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the third commit and I think it's fine to backport. I left a few optional comments that you can consider in the future - no need to change anything in this PR.
Reviewed 15 of 15 files at r1, 2 of 2 files at r2, 7 of 7 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2, @nvanbenschoten, and @yuzefovich)
pkg/sql/distsql_running.go
line 2103 at r2 (raw file):
return false } cp := cascadePlan.(*planComponents)
🤔 this comment could probably use an update:
cockroach/pkg/sql/opt/exec/factory.go
Line 34 in 7ad0187
// Plan represents the plan for a query (currently maps to sql.planTop). |
pkg/sql/opt/exec/factory.go
line 41 at r1 (raw file):
// PlanFlags tracks various properties of the built plan. type PlanFlags uint32
nit: Maybe a better name is PlanProperties
or PlanProps
? It won't match the sql.planFlags
naming, though.
pkg/sql/opt/exec/factory.go
line 80 at r1 (raw file):
func (pf PlanFlags) IsSet(flag PlanFlags) bool { return (pf & flag) != 0
super nit: I don't think we're terribly consistent with this, but I think it's safer to do return pf&flag == flag
. That way, if flag
has multiple bits set, IsSet
will only return true if pf
has all of those bits set rather than returning true if it has one bit set.
6dfcf75
to
741c335
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @nvanbenschoten, and @yuzefovich)
pkg/sql/opt/exec/factory.go
line 80 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
super nit: I don't think we're terribly consistent with this, but I think it's safer to do
return pf&flag == flag
. That way, ifflag
has multiple bits set,IsSet
will only return true ifpf
has all of those bits set rather than returning true if it has one bit set.
Good call! This inspired me to perform an audit of all such helper functions: #119035
Execbuild produces a `sql.planComponents` containing the main query plan, plus plans for any subqueries, checks, and cascades. In the normal SQL code path (`sql.(*optPlanningCtx).runExecBuilder`) we copy this planComponents into a `sql.planTop`, which then is decorated with extra information gathered from the Builder. However, there are other users of execbuild that only work with the planComponents, and don't have a planTop. In the next commit, one such user, `PlanAndRunCascadesAndChecks`, needs to see some of the plan flags set when building FK check plans, but doesn't have access to the Builder which gathered those plans. To solve this, this commit moves plan flags from the planTop to planComponents, and then refactors the execbuilder and exec factory to set some of these flags when creating the planComponents. Informs: cockroachdb#118720 Epic: None Release note: None
In cockroachdb#111896 we disallowed parallel execution of FK and unique checks that contain locking, to avoid usage of LeafTxns. But cockroachdb#111896 only considered checks built during planning of the main query, and overlooked checks built during planning of FK cascades. This commit also considers checks built during planning of FK cascades. Fixes: cockroachdb#118720 Epic: None Release note (bug fix): This commit fixes an internal error with message like: "LeafTxn ... incompatible with locking request" that occurs when performing an update under read committed isolation which cascades to a table with multiple other foreign keys.
"Default" locking here simply means no locking. I find this convention of saying "non-default key locking" a little verbose and confusing. Epic: None Release note: None
741c335
to
d788ffa
Compare
119035: sql, sem, opt, explain, memo, kv: audit bit-flag-checking helpers r=DrewKimball,mgartner a=michae2 `@mgartner's` [comment](#118931 (review)) on #118931 inspired me to audit all the other helper functions that check whether a flag is set in a bitfield. I might have found a couple bugs. See individual commits for details. Epic: None Release note: None 133270: raft: deflake non-determinisctic raft node tests r=iskettaneh a=iskettaneh We sporadically see that some raft node_test tests fail due to the leader not being stable. This commit should reduce the chances of that happening by increasing the election timeout to 250ms (instead of 50ms). I couldn't reproduce the bug locally with this change. If the bug still happens, we can try to force leadership to make it more deterministic. Fixes: #132992, #131676, #132205, #133048 Release note: None Co-authored-by: Michael Erickson <michae2@cockroachlabs.com> Co-authored-by: Ibrahim Kettaneh <ibrahim.kettaneh@cockroachlabs.com>
Backport 3/3 commits from #118722.
/cc @cockroachdb/release
sql, execbuilder: move plan flags from planTop to planComponents
Execbuild produces a
sql.planComponents
containing the main queryplan, plus plans for any subqueries, checks, and cascades. In the normal
SQL code path (
sql.(*optPlanningCtx).runExecBuilder
) we copy thisplanComponents into a
sql.planTop
, which then is decorated with extrainformation gathered from the Builder.
However, there are other users of execbuild that only work with the
planComponents, and don't have a planTop. In the next commit, one such
user,
PlanAndRunCascadesAndChecks
, needs to see some of the plan flagsset when building FK check plans, but doesn't have access to the Builder
which gathered those plans.
To solve this, this commit moves plan flags from the planTop to
planComponents, and then refactors the execbuilder and exec factory to
set some of these flags when creating the planComponents.
Informs: #118720
Epic: None
Release note: None
sql: don't parallelize any FK and unique checks containing locking
In #111896 we disallowed parallel execution of FK and unique checks that
contain locking, to avoid usage of LeafTxns. But #111896 only considered
checks built during planning of the main query, and overlooked checks
built during planning of FK cascades.
This commit also considers checks built during planning of FK cascades.
Fixes: #118720
Epic: None
Release note (bug fix): This commit fixes an internal error with message
like: "LeafTxn ... incompatible with locking request" that occurs when
performing an update under read committed isolation which cascades to a
table with multiple other foreign keys.
sql: rename *ContainsNonDefaultKeyLocking to *ContainsLocking
"Default" locking here simply means no locking. I find this convention
of saying "non-default key locking" a little verbose and confusing.
Epic: None
Release note: None
Release justification: fixes a bug that prevents some update statements from being executed under read committed isolation.