
sql: fix internal executor when it encounters a retry error #101477

Merged
merged 5 commits into cockroachdb:master from fix-rewind on Apr 19, 2023

Conversation

@yuzefovich (Member) commented on Apr 13, 2023

This PR contains several commits that fix a long-standing bug in the internal executor that could make it double count some things (rows or metadata) if an internal retry error occurs. In particular, since at least 21.1 (when we introduced the "streaming" internal executor in #59330), if the IE encounters a retry error after it has communicated some results to the client, it would proceed to retry the corresponding command as if the incomplete execution and the retry error had never happened. In other words, it was possible for some rows to be double counted (either included directly in the result set multiple times or counted multiple times in the "rows affected" number). This PR fixes the problem by returning the retry error to the client in cases when some actual rows have already been communicated, and by resetting the number of "rows affected" when "rewinding the stmt buffer" in order to retry the statement transparently.

Fixes: #98558.

@yuzefovich added the backport-23.1.x (Flags PRs that need to be backported to 23.1) and backport-23.1.0 labels on Apr 13, 2023
@cockroach-teamcity (Member)

This change is Reviewable

@yuzefovich force-pushed the fix-rewind branch 7 times, most recently from 8f90a3d to f183f6d on April 14, 2023 01:49
@yuzefovich changed the title from "sql: fix internal executor when it retries" to "sql: fix internal executor when it encounters a retry error" on Apr 14, 2023
@yuzefovich marked this pull request as ready for review on April 14, 2023 01:52
@yuzefovich requested a review from a team as a code owner on April 14, 2023 01:52
@yuzefovich requested a review from a team on April 14, 2023 01:52
@yuzefovich requested review from a team as code owners on April 14, 2023 01:52
@yuzefovich requested review from miretskiy, michae2, rafiss, cucaroach and knz and removed request for a team, miretskiy and michae2 on April 14, 2023 01:52
@yuzefovich (Member, Author)

@rafiss @knz I think the two of you have the most context around this code, so I'm asking for a review from both of you; @cucaroach I think you might have looked into this stuff recently, so I'd welcome your review too. Extra scrutiny is much appreciated given the trickiness and ubiquity of the internal executor.

@yuzefovich (Member, Author)

Oh, I think @ZhouXing19 also has exposure to this code - I'd welcome your review too!

@knz (Contributor) commented on Apr 14, 2023

Amazing work. I'll review this shortly.

@knz (Contributor) left a comment

For the 2nd commit, could I interest you in a test (something like the one from the 3rd commit) which checks that the "rows affected" count is at the expected value even when a retry has occurred?

I haven't fully reviewed the changes to sql/internal.go yet because I find them intimidating :)

Reviewed 4 of 4 files at r1, 7 of 7 files at r2, 3 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach, @rafiss, @yuzefovich, and @ZhouXing19)


-- commits line 60 at r3:
For columns I'm not sure -- a retry could potentially bring a txn over a schema change that alters the number of columns?


-- commits line 76 at r4:
What happens if Exec shares a kv.Txn with the caller? In that case, if there's a retry error, it's not correct to just retry inside the internal executor, because the caller may have performed some other SQL using the same kv.Txn prior to calling Exec, and all of that needs to be retried too.

I think we are only allowed to retry things inside internalexecutor if the caller specified a nil kv.Txn to start with. Where is this checked?
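
For context, here is a minimal Go sketch of the constraint being described, using hypothetical names (`kvTxn`, `execWithOptionalRetry`, `errRetry`) rather than the real internal executor API: transparent retries are only safe when the executor owns the transaction, i.e. when the caller passed a nil txn.

```go
package main

import (
	"errors"
	"fmt"
)

// kvTxn is a toy stand-in for a caller-provided transaction (kv.Txn in the
// real code); errRetry stands in for CockroachDB's retryable txn error.
type kvTxn struct{}

var errRetry = errors.New("restart transaction")

// execWithOptionalRetry sketches the constraint from the review comment: the
// executor may only transparently retry when it owns the transaction (the
// caller passed a nil txn). If the caller supplied its own txn, the retry
// error must be propagated, because the caller may have done other work on
// that txn that also needs to be retried.
func execWithOptionalRetry(outerTxn *kvTxn, runStmt func() error) error {
	for {
		err := runStmt()
		if err == nil || !errors.Is(err, errRetry) {
			return err
		}
		if outerTxn != nil {
			return err // only the caller can retry the whole unit of work
		}
		// We own the txn, so retrying just this statement is safe.
	}
}

func main() {
	attempts := 0
	run := func() error {
		attempts++
		if attempts < 3 {
			return errRetry
		}
		return nil
	}
	fmt.Println(execWithOptionalRetry(nil, run), attempts) // <nil> 3
}
```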

@yuzefovich force-pushed the fix-rewind branch 2 times, most recently from f67fec5 to b6b091c on April 19, 2023 00:18
@yuzefovich (Member, Author) left a comment

That's a fair point. I can't say that I fully grokked how all this machinery works when I implemented the "streaming" internal executor a couple of years ago. Now that I've spent a significant amount of time debugging this code, I have a decent mental model, so I just added another commit (now the second one) which attempts to outline how all the moving pieces fit together, and then with each subsequent commit I tried to update the corresponding comment accordingly.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @knz, @rafiss, and @ZhouXing19)


-- commits line 60 at r3:

Previously, knz (Raphael 'kena' Poss) wrote…

Can we get some help from Faizan or someone else on the schema team to get a working test?

Do you think this is a blocking question? I'm not saying we shouldn't introduce a test like this, but it seems like an edge case that we can leave a TODO for.

@knz (Contributor) left a comment

This new doc is very good. Thank you.

Reviewed 1 of 1 files at r13, 1 of 10 files at r14, 2 of 3 files at r15, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @rafiss, @yuzefovich, and @ZhouXing19)


-- commits line 60 at r3:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Do you think this is a blocking question? I'm not saying we shouldn't introduce a test like this, but it seems like an edge case that we can leave a TODO for.

No it is not blocking this PR, but we still need this test to happen.


pkg/sql/internal.go line 818 at r13 (raw file):

// It's also worth noting that execInternal doesn't return until the
// connExecutor reaches the execution engine (i.e. until after the query
// planning has been performed). This is needed in order to avoid concurrent

I don't understand why this fully prevents concurrent access to the kv.Txn. If more rows are returned to the caller via the iterator, aren't those rows the product of a currently-used kv.Txn? What if there's another call to IE "under" the iterator?


pkg/sql/internal.go line 820 at r13 (raw file):

// planning has been performed). This is needed in order to avoid concurrent
// access to the txn.
// TODO(yuzefovich): currently, this statement is not entirely true if the retry

I'm curious - what is not true about it?


pkg/sql/internal.go line 833 at r13 (raw file):

// internalClientComm implements the ClientLock interface in such a fashion as
// if any command can be transparently retried.
// TODO(yuzefovich): fix this.

Maybe worth filing issues and linking them from the comment.

This commit changes the `RestrictedCommandResult` interface a bit to
make `SetRowsAffected` just set the number of rows affected, rather than
increment it. This method is supposed to be called only once, so this
change seems reasonable on its own, but it also has an additional
benefit for the follow-up commit. In particular, the follow-up commit
will preserve the internal executor's ability to automatically retry
statements of non-ROWS stmt type.

Note that this commit on its own fixes the behavior for the "rows
affected" statements (the only known occurrence of the bug that is
generally fixed in the following commit).

Release note: None
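
A minimal sketch (with hypothetical names; the real `streamingCommandResult` is richer) of the set-versus-increment difference this commit message describes and why it matters across an internal retry:

```go
package main

import "fmt"

// resultSketch is a toy stand-in for a command result; field and method
// names here are illustrative only.
type resultSketch struct {
	rowsAffected int
}

// Before: the method added to the running total, so calling it again after
// an internal retry double-counted the rows.
func (r *resultSketch) incRowsAffected(n int) { r.rowsAffected += n }

// After: the method overwrites the total, so a retried statement that
// reports its count again leaves the result correct.
func (r *resultSketch) setRowsAffected(n int) { r.rowsAffected = n }

func main() {
	before, after := &resultSketch{}, &resultSketch{}
	for attempt := 0; attempt < 2; attempt++ { // second attempt = internal retry
		before.incRowsAffected(10)
		after.setRowsAffected(10)
	}
	fmt.Println(before.rowsAffected, after.rowsAffected) // 20 10
}
```
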
This commit fixes a bug with the internal executor when it encounters an
internal retry. Previously, the implementation of the "rewind
capability" was such that we always assumed that we could rewind the
StmtBuf for the IE at any point; however, that is not generally true. In
particular, if we communicated anything to the client of the IE's
connExecutor (`rowsIterator` in this context), then we cannot rewind the
current command we're evaluating. In theory, we could have pushed some
rows through the iterator only to encounter the internal retry later,
which would then lead to re-executing the command from the start (in
other words, the rows that we'd pushed before the retry would be
double-pushed); in practice, however, we haven't actually seen this (at
least yet). What we have seen is the number of rows affected being
double-counted. This particular reproduction was already fixed in the
previous commit, but this commit fixes the problem more generally.

This commit makes it so that for every `streamingCommandResult` we now
track whether it has already communicated something to the client that
prevents the result from being rewound, and we then use that tracking
mechanism to correctly implement the rewind capability. We have three
possible types of data that we communicate:
- rows
- number of rows affected
- column schema.

If the retry happens after some rows have been communicated, we're out
of luck - there is no way we can retry the stmt internally, so from now
on we will return a retry error to the client.

If the retry happens after "rows affected", then given the adjustment in
the previous commit we can proceed transparently.

In order to avoid propagating the retry error up when it occurs after
having received the column schema but before pushing out any rows, this
commit adjusts the behavior to always keep the latest column schema;
thus, we can still proceed transparently in this case.

This bug has been present since at least 21.1 when
`streamingCommandResult` was introduced. However, since we now might
return a retry error in some cases, this could lead to test failures or
flakes, or even to errors in some internal CRDB operations that execute
statements of ROWS type (if there is no appropriate retry logic), so
I intend to backport this only to 23.1. There is also no release note
since the only failure we've seen is the double-counted "rows
affected" number, the likelihood of which has significantly increased
due to the jobs system refactor (i.e. mostly 23.1 is affected, AFAIK).

Additionally, this commit makes it so that we correctly block the
`execInternal` call until the first actual, non-metadata result is seen
(this behavior is needed to correctly synchronize access to the txn
before the stmt is given to the execution engine).

Release note: None
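
A minimal sketch of the tracking described above, using hypothetical names (`resultSketch`, `onInternalRetry`) rather than the actual `streamingCommandResult` fields: only communicated rows make a command non-rewindable, so only they force the retry error to be surfaced.

```go
package main

import (
	"errors"
	"fmt"
)

// dataKind mirrors the three kinds of data listed in the commit message.
type dataKind int

const (
	kindRows dataKind = iota
	kindRowsAffected
	kindColumns
)

// resultSketch is a toy model of the per-command tracking; names are
// illustrative, not the real streamingCommandResult fields.
type resultSketch struct {
	sentRows bool // once true, the command cannot be rewound
}

func (r *resultSketch) push(k dataKind) {
	if k == kindRows {
		r.sentRows = true
	}
	// Rows-affected can be reset (previous commit) and the latest column
	// schema is simply kept, so neither blocks a rewind.
}

var errRetry = errors.New("restart transaction")

// onInternalRetry decides what the rewind capability can do: retry
// transparently if nothing irreversible was sent, otherwise surface the
// retry error to the caller.
func onInternalRetry(r *resultSketch) error {
	if r.sentRows {
		return errRetry
	}
	return nil // safe to rewind and re-execute the command
}

func main() {
	r := &resultSketch{}
	r.push(kindColumns)
	fmt.Println(onInternalRetry(r)) // <nil>: columns alone don't prevent a rewind
	r.push(kindRows)
	fmt.Println(onInternalRetry(r)) // restart transaction
}
```
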
This commit teaches the internal executor to avoid propagating retry
errors to the client when executing `Exec{Ex}` methods. In those
methods, we only return the number of rows affected; thus, in order to
preserve the ability to transparently retry all statements (even if they
are of ROWS statement type), we simply need to be able to reset the rows
affected number on the `rowsIterator` if a result is being discarded
(which indicates that a retry error has been encountered and the rewind
mechanism is at play).

This structure relies on the assumption that we push at most one command
into the StmtBuf that results in "rows affected", which is true at the
time of writing, and I don't foresee that changing, at least for now.
(This assumption would be incorrect only if we were to allow executing
multiple statements as part of a single method invocation on the
internal executor, but I don't think that's going to happen - we only
might do so in a few tests.)

Release note: None
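
A minimal sketch of the reset-on-discard idea, with hypothetical names (`iteratorSketch`, `onRow`, `discard`) standing in for the real `rowsIterator` machinery:

```go
package main

import "fmt"

// iteratorSketch is a toy stand-in for the rowsIterator that backs Exec{Ex};
// for a ROWS statement the iterator only keeps a running count, not the rows.
type iteratorSketch struct {
	rowsAffected int
}

func (it *iteratorSketch) onRow()   { it.rowsAffected++ }
func (it *iteratorSketch) discard() { it.rowsAffected = 0 } // result thrown away on rewind

func main() {
	it := &iteratorSketch{}

	// First attempt streams two rows, then hits an internal retry; the rewind
	// mechanism discards the partial result, so the count must be reset.
	it.onRow()
	it.onRow()
	it.discard()

	// The retried attempt streams all three rows; without the reset the final
	// count would have been 5 instead of 3.
	for i := 0; i < 3; i++ {
		it.onRow()
	}
	fmt.Println(it.rowsAffected) // 3
}
```
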
@yuzefovich (Member, Author) left a comment

Thanks for the reviews everyone!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @knz, @rafiss, and @ZhouXing19)


-- commits line 60 at r3:

Previously, knz (Raphael 'kena' Poss) wrote…

No it is not blocking this PR, but we still need this test to happen.

Left a TODO for now.


pkg/sql/internal.go line 818 at r13 (raw file):
It prevents concurrent access between the two goroutines at play - the rowsIterator goroutine and the "internal" connExecutor goroutine. If the "synchronous" ieResultChannel is used, only one goroutine is active at any point in time. When more rows are returned to the caller, the caller was blocked on the ieResultChannel while those rows were being produced.

What if there's another call to IE "under" the iterator?

This question is spot on - we actually put this synchronization in place due to a bug with "nested" IE usage (#62415). I expanded the comment.
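
A minimal sketch of what a "synchronous" result channel buys us, using plain unbuffered Go channels and hypothetical names (`syncResults`, `produce`); the real ieResultChannel is more involved, but the alternation is the point:

```go
package main

import "fmt"

// syncResults sketches a "synchronous" result channel: unbuffered channels
// force the producer (the internal connExecutor goroutine in the real code)
// and the consumer (the rowsIterator) to alternate, so at most one of them
// is running, and touching the txn, at any moment.
type syncResults struct {
	data   chan string   // unbuffered: a send blocks until the consumer receives
	resume chan struct{} // consumer signals it is ready for more work
}

func newSyncResults() *syncResults {
	return &syncResults{data: make(chan string), resume: make(chan struct{})}
}

// produce plays the role of the internal connExecutor goroutine.
func (s *syncResults) produce(rows []string) {
	for _, r := range rows {
		s.data <- r // blocks here until the consumer takes the row
		<-s.resume  // wait until the consumer asks for the next row
	}
	close(s.data)
}

func main() {
	s := newSyncResults()
	go s.produce([]string{"row1", "row2", "row3"})

	// Consumer side (the rowsIterator): it is blocked in the receive while
	// the producer is running, and vice versa.
	for row := range s.data {
		fmt.Println(row)
		s.resume <- struct{}{}
	}
}
```
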


pkg/sql/internal.go line 820 at r13 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I'm curious - what is not true about it?

The problem was that if the retry occurred after the column schema was sent but before any rows were communicated, this call wouldn't block until the control flow was in the execution engine (when re-evaluating the stmt after the retry). In other words, if we had a sequence like

  • send columns
  • encounter a retry
  • send columns (1)
  • go into execution engine (2)

the blocking was done only up to (1), but it should have been up to (2). This is fixed in the fourth commit by looping until we see non-column data.
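
A minimal sketch of the "loop until non-column data" fix, with hypothetical names (`resultItem`, `firstNonColumn`) rather than the real execInternal internals:

```go
package main

import "fmt"

// resultItem is a toy tag for what the internal connExecutor pushes to the
// iterator; the fields are illustrative, not the real ieIteratorResult.
type resultItem struct {
	cols bool // column schema metadata
	row  bool // an actual data row
	err  error
}

// firstNonColumn sketches the fix described above: instead of returning as
// soon as anything arrives, keep consuming while we only see column schemas
// (which can be re-sent after an internal retry), and hand control back only
// once a row, an error, or end-of-stream shows up - i.e. once the statement
// has actually reached the execution engine.
func firstNonColumn(items <-chan resultItem) (resultItem, bool) {
	for it := range items {
		if it.cols {
			continue // a retry may re-send columns; keep waiting
		}
		return it, true
	}
	return resultItem{}, false
}

func main() {
	items := make(chan resultItem, 4)
	items <- resultItem{cols: true} // columns before the retry
	items <- resultItem{cols: true} // columns re-sent after the retry
	items <- resultItem{row: true}  // produced by the execution engine
	close(items)

	it, ok := firstNonColumn(items)
	fmt.Println(ok, it.row) // true true
}
```
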


pkg/sql/internal.go line 833 at r13 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Maybe worth filing issues and linking them from the comment.

This is fixed in the fourth commit too - this TODO refers to the general bug this PR is fixing.

craig bot (Contributor) commented on Apr 19, 2023

Build failed:

@yuzefovich (Member, Author)

Known flake #101519.

bors r+

craig bot (Contributor) commented on Apr 19, 2023

Build succeeded:

craig bot merged commit 3042c4f into cockroachdb:master on Apr 19, 2023
@yuzefovich deleted the fix-rewind branch on April 19, 2023 18:06
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Nov 14, 2023
This commit introduces a limit on the number of times the
InternalExecutor machinery can retry errors in `Exec{Ex}` methods. That
logic was introduced in c09860b in
order to reduce the impact of fixes in other commits in cockroachdb#101477.
However, there was no limit on the number of retries, and we recently hit
stack overflows twice in our own testing, seemingly in this code
path. Thus, this commit adds a limit, 5 by default.

The limit is stored in the session data, but it has no meaning in the
user's session, so this commit also adds a corresponding cluster setting
to be able to change the default used in the internal executor (just in
case we need it).

Note that I'm not sure exactly what bug, if any, can lead to the stack
overflow. One seemingly-unlikely theory is that there is no bug, meaning
that we were simply retrying forever because the stmts were pushed by
higher priority txns every time. Still, this change seems beneficial on
its own and should prevent stack overflows even if we don't fully
understand the root cause.

There is no release note given we've only seen this twice in our own
testing and it involved cluster-to-cluster streaming.

Release note: None
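
A minimal sketch of the bounded-retry idea, with hypothetical names (`execWithBoundedRetries`, `errRetry`); the default of 5 and the cluster setting are the CockroachDB specifics described above, while `maxRetries` here is just a parameter of the sketch:

```go
package main

import (
	"errors"
	"fmt"
)

var errRetry = errors.New("restart transaction")

// execWithBoundedRetries sketches the safeguard from this commit: keep
// transparently retrying retryable errors, but give up (and return the
// error) once a configurable cap is hit, so a statement that never stops
// being pushed cannot retry forever.
func execWithBoundedRetries(maxRetries int, runStmt func() error) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = runStmt(); err == nil || !errors.Is(err, errRetry) {
			return err
		}
	}
	return err // retry budget exhausted; surface the retryable error
}

func main() {
	attempts := 0
	alwaysRetry := func() error { attempts++; return errRetry }
	fmt.Println(execWithBoundedRetries(5, alwaysRetry), attempts) // restart transaction 6
}
```
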
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Nov 15, 2023
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Nov 15, 2023
craig bot pushed a commit that referenced this pull request Nov 16, 2023
114398: sql: introduce limit on number of retries in IntExec.Exec{Ex} methods r=yuzefovich a=yuzefovich

This commit introduces a limit on the number of times the InternalExecutor machinery can retry errors in `Exec{Ex}` methods. That logic was introduced in c09860b in order to reduce the impact of fixes in other commits in #101477. However, there was no limit on the number of retries, and we recently hit stack overflows twice in our own testing, seemingly in this code path. Thus, this commit adds a limit, 5 by default.

Note that I'm not sure exactly what bug, if any, can lead to the stack overflow. One seemingly-unlikely theory is that there is no bug, meaning that we were simply retrying forever because the stmts were pushed by higher priority txns every time. Still, this change seems beneficial on its own and should prevent stack overflows even if we don't fully understand the root cause.

An additional improvement is that we now track the depth of recursion, and once it exceeds 1000, we'll return an error. This should prevent the stack overflows due to other reasons.

There is no release note given we've only seen this twice in our own testing and it involved cluster-to-cluster streaming.

Fixes: #109197.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
blathers-crl bot pushed a commit that referenced this pull request Nov 16, 2023