sql: fix internal executor when it encounters a retry error #101477
Conversation
8f90a3d to f183f6d
@rafiss @knz I think the two of you have the most context around this code, so I'm asking for review from both of you; @cucaroach I think you might have looked into this stuff recently, so I'd welcome your review too. Extra scrutiny is much appreciated given the trickiness and ubiquity of the internal executor.
Oh, I think @ZhouXing19 also has exposure - I'd welcome your review too!
Amazing work. I'll review this shortly. |
For the 2nd commit, could I interest you in a test (something like the one from the 3rd commit) which checks that the rows affected count is at the expected value even when a retry has occurred?
I haven't fully reviewed the changes to sql/internal.go yet because I find them intimidating :)
Reviewed 4 of 4 files at r1, 7 of 7 files at r2, 3 of 4 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach, @rafiss, @yuzefovich, and @ZhouXing19)
-- commits
line 60 at r3:
For columns I'm not sure -- a retry could potentially bring a txn over a schema change that alters the number of columns?
-- commits
line 76 at r4:
What happens if Exec shares a kv.Txn with the caller? In that case, if there's a retry error, it's not correct to just retry inside the internal executor, because the caller may have performed some other SQL using the same kv.Txn prior to calling Exec, and all of that needs to be retried too.
I think we are only allowed to retry things inside the internal executor if the caller specified a nil kv.Txn to start with. Where is this checked?
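To make the invariant knz is describing concrete, here is a minimal Go sketch; the names `txnHandle` and `execWithOptionalRetry` are made up for illustration and are not the actual pkg/sql code.

```go
package main

import (
	"errors"
	"fmt"
)

// txnHandle stands in for kv.Txn in this sketch; the real type lives in
// pkg/kv and is not reproduced here.
type txnHandle struct{}

var errRetry = errors.New("restart transaction")

// execWithOptionalRetry illustrates the invariant raised above: the
// internal executor may only retry a statement transparently when it
// owns the transaction (the caller passed a nil txn). If the caller
// supplied its own txn, earlier work on that txn would also need to be
// retried, so the retry error must be surfaced instead.
func execWithOptionalRetry(callerTxn *txnHandle, run func() error) error {
	for {
		err := run()
		if err == nil {
			return nil
		}
		if callerTxn == nil && errors.Is(err, errRetry) {
			// The IE owns the txn, so retrying internally is safe.
			continue
		}
		// Caller-owned txn (or a non-retryable error): propagate.
		return err
	}
}

func main() {
	attempts := 0
	err := execWithOptionalRetry(nil /* callerTxn */, func() error {
		attempts++
		if attempts < 2 {
			return errRetry
		}
		return nil
	})
	fmt.Println(attempts, err) // 2 <nil>
}
```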
f67fec5 to b6b091c
That's a fair point. I can't say that I fully grokked how all this machinery works when I implemented the "streaming" internal executor a couple of years ago. Now that I've spent a significant amount of time debugging this code, I have a decent mental model, so I just added another commit (now second) which attempts to outline how all the moving pieces fit together, and then with each subsequent commit I tried to update the corresponding comment accordingly.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @knz, @rafiss, and @ZhouXing19)
Previously, knz (Raphael 'kena' Poss) wrote…
Can we get some help from Faizan or someone else in the schema team to get a working test?
Do you think this is a blocking question? I'm not saying we shouldn't introduce a test like this, but it seems like an edge case that we can leave a TODO for.
This new doc is very good. Thank you.
Reviewed 1 of 1 files at r13, 1 of 10 files at r14, 2 of 3 files at r15, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @rafiss, @yuzefovich, and @ZhouXing19)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Do you think this is a blocking question? I'm not saying we shouldn't introduce a test like this, but it seems like an edge case that we can leave a TODO for.
No it is not blocking this PR, but we still need this test to happen.
pkg/sql/internal.go
line 818 at r13 (raw file):
// It's also worth noting that execInternal doesn't return until the
// connExecutor reaches the execution engine (i.e. until after the query
// planning has been performed). This is needed in order to avoid concurrent
I don't understand why this fully prevents concurrent access to the kv.Txn. If more rows are returned to the caller via the iterator, aren't those rows the product of a currently-used kv.Txn? What if there's another call to IE "under" the iterator?
pkg/sql/internal.go
line 820 at r13 (raw file):
// planning has been performed). This is needed in order to avoid concurrent
// access to the txn.
// TODO(yuzefovich): currently, this statement is not entirely true if the retry
I'm curious - what is not true about it?
pkg/sql/internal.go
line 833 at r13 (raw file):
// internalClientComm implements the ClientLock interface in such a fashion as
// if any command can be transparently retried.
// TODO(yuzefovich): fix this.
Maybe worth filing issues and linking them from the comment.
Release note: None
This commit changes the `RestrictedCommandResult` interface a bit to make `SetRowsAffected` just set the number of rows affected, rather than increment it. This method is supposed to be called only once, so this change seems reasonable on its own, but it also has an additional benefit for the follow-up commit. In particular, in the follow-up commit we will preserve the ability to automatically retry statements of non-ROWS stmt type by the internal executor.

Note that this commit on its own fixes the behavior for the "rows affected" statements (the only known occurrence of the bug that is generally fixed in the following commit).

Release note: None
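A minimal sketch of the behavioral change, assuming a trimmed-down interface; the real `RestrictedCommandResult` has many more methods, and `result` here is purely illustrative:

```go
package main

import "fmt"

// restrictedCommandResult sketches only the slice of the interface that
// this commit touches.
type restrictedCommandResult interface {
	SetRowsAffected(n int)
	RowsAffected() int
}

type result struct{ rowsAffected int }

// Before this commit the method effectively incremented the counter
// (r.rowsAffected += n), so re-running the command after an internal
// retry double counted. Now it overwrites the value, so a retried
// statement simply sets the final count again.
func (r *result) SetRowsAffected(n int) { r.rowsAffected = n }
func (r *result) RowsAffected() int     { return r.rowsAffected }

func main() {
	var res restrictedCommandResult = &result{}
	res.SetRowsAffected(5)          // first attempt, after which a retry occurs
	res.SetRowsAffected(5)          // re-execution after the retry
	fmt.Println(res.RowsAffected()) // 5, not 10
}
```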
This commit fixes a bug with the internal executor when it encounters an internal retry. Previously, the implementation of the "rewind capability" was such that we always assumed that we can rewind the StmtBuf for the IE at any point; however, that is not generally true. In particular, if we communicated anything to the client of the IE's connExecutor (`rowsIterator` in this context), then we cannot rewind the current command we're evaluating. In theory, we could have pushed some rows through the iterator only to encounter the internal retry later, which would then lead to re-executing the command from the start (in other words, the rows that we've pushed before the retry would be double-pushed); in practice, however, we haven't actually seen this (at least yet). What we have seen is the number of rows affected being double counted. This particular reproduction was already fixed in the previous commit, but this commit fixes the problem more generally.

This commit makes it so that for every `streamingCommandResult` we track whether it has already communicated something to the client (so that the result can no longer be rewound), and then we use that tracking mechanism to correctly implement the rewind capability. We have three possible types of data that we communicate:
- rows
- number of rows affected
- column schema.

If the retry happens after some rows have been communicated, we're out of luck - there is no way we can retry the stmt internally, so from now on we will return a retry error to the client. If the retry happens after "rows affected", then given the adjustment in the previous commit we can proceed transparently. In order to avoid propagating the retry error up when it occurs after having received the column schema but before pushing out any rows, this commit adjusts the behavior to always keep the latest column schema; thus, we can still proceed transparently in this case.

This bug has been present since at least 21.1 when `streamingCommandResult` was introduced. However, since we now might return a retry error in some cases, this could lead to test failures or flakes, or even to errors in some internal CRDB operations that execute statements of ROWS type (if there is no appropriate retry logic), so I intend to only backport this to 23.1. There is also no release note since the only failure we've seen is about the double counted "rows affected" number, the likelihood of which has significantly increased due to the jobs system refactor (i.e. mostly 23.1 is affected AFAIK).

Additionally, this commit makes it so that we correctly block the `execInternal` call until the first actual, non-metadata result is seen (this behavior is needed to correctly synchronize access to the txn before the stmt is given to the execution engine).

Release note: None
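As a rough illustration of the tracking mechanism described above (the field and method names below are invented for the sketch and do not match the actual `streamingCommandResult`):

```go
package main

import "fmt"

// streamingResult models a single result: what kind of data it carries
// and whether any of it has already been pushed to the client.
type streamingResult struct {
	kind         string // "rows", "rowsAffected", or "cols"
	communicated bool
}

func (r *streamingResult) push() { r.communicated = true }

// canRewind mirrors the rule described above: rows already sent to the
// client cannot be taken back; a "rows affected" count can be reset
// (previous commit); the latest column schema is simply kept, so
// columns never block a rewind.
func (r *streamingResult) canRewind() bool {
	return !r.communicated || r.kind != "rows"
}

func main() {
	rows := &streamingResult{kind: "rows"}
	rows.push()
	fmt.Println(rows.canRewind()) // false: the retry error goes to the client

	affected := &streamingResult{kind: "rowsAffected"}
	affected.push()
	fmt.Println(affected.canRewind()) // true: the count is reset and re-set
}
```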
This commit teaches the internal executor to avoid propagating the retry errors to the client when executing `Exec{Ex}` methods. In those methods, we only return the number of rows affected, thus, in order to preserve the ability to transparently retry all statements (even if they are of ROWS statement type), we simply need to be able to reset the rows affected number on the `rowsIterator` if a result is being discarded (which indicates that a retry error has been encountered and the rewind mechanism is at play).

This structure relies on the assumption that we push at most one command into the StmtBuf that results in "rows affected" which is true at the moment of writing, and I don't foresee that changing, at least for now. (This assumption would be incorrect only if we were to allow executing multiple statements as part of a single method invocation on the internal executor, but I don't think that's going to happen - we only might do so in a few tests.)

Release note: None
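A small sketch of the reset-on-discard idea, under the stated assumption that at most one "rows affected" command is in flight; names are illustrative rather than the actual `rowsIterator` fields:

```go
package main

import "fmt"

// rowsIter models the piece of rowsIterator state relevant here: the
// accumulated "rows affected" count.
type rowsIter struct{ rowsAffected int }

func (it *rowsIter) addRowsAffected(n int) { it.rowsAffected += n }

// onResultDiscarded is the hook invoked when a retry error causes the
// stmt buffer to be rewound; since Exec{Ex} pushes at most one command
// that produces "rows affected", resetting to zero is sufficient.
func (it *rowsIter) onResultDiscarded() { it.rowsAffected = 0 }

func main() {
	it := &rowsIter{}
	it.addRowsAffected(3)        // partial execution before the retry
	it.onResultDiscarded()       // retry error encountered, command rewound
	it.addRowsAffected(3)        // successful re-execution
	fmt.Println(it.rowsAffected) // 3, not 6
}
```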
Thanks for the reviews everyone!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @knz, @rafiss, and @ZhouXing19)
Previously, knz (Raphael 'kena' Poss) wrote…
No it is not blocking this PR, but we still need this test to happen.
Left a TODO for now.
pkg/sql/internal.go
line 818 at r13 (raw file):
It prevents concurrent access between the two goroutines at play - the `rowsIterator` goroutine and the "internal" connExecutor goroutine. If the "synchronous" `ieResultChannel` is used, only one goroutine will be active at any point in time. When more rows are returned to the caller, the caller was blocked on `ieResultChannel` while the rows were being produced.
What if there's another call to IE "under" the iterator?
This question is spot on - we actually put this synchronization in place due to a bug with "nested" IE (#62415). I expanded the comment.
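To make the "only one goroutine is active at a time" point concrete, here is a toy hand-off in plain Go; this is not the actual `ieResultChannel` implementation, just the shape of the synchronization:

```go
package main

import "fmt"

// This sketches the "synchronous" hand-off: the producer (the internal
// connExecutor goroutine) may only run while the consumer (the
// rowsIterator) is blocked waiting for the next result, so the two
// goroutines never touch the shared txn concurrently.
type syncResults struct {
	dataCh chan int      // carries one result at a time
	waitCh chan struct{} // consumer signals "produce the next one"
}

func main() {
	ch := syncResults{dataCh: make(chan int), waitCh: make(chan struct{})}

	// Producer: blocks on waitCh before computing each result, so it is
	// idle whenever the consumer is processing the previous row.
	go func() {
		for i := 0; i < 3; i++ {
			<-ch.waitCh
			ch.dataCh <- i
		}
		<-ch.waitCh
		close(ch.dataCh)
	}()

	// Consumer: asks for a row, then is the only active goroutine while
	// it works with the row it received.
	for {
		ch.waitCh <- struct{}{}
		row, ok := <-ch.dataCh
		if !ok {
			break
		}
		fmt.Println("row", row)
	}
}
```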
pkg/sql/internal.go
line 820 at r13 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I'm curious - what is not true about it?
The problem was that if the retry occurs after the column schema was sent but before any rows were communicated, this call wouldn't block until the control flow is in the execution engine (when reevaluating the stmt after the retry). In other words, if we had a sequence like
- send columns
- encounter a retry
- send columns (1)
- go into execution engine (2)
the blocking was done only up until (1), but it should be up until (2). This is fixed in the fourth commit by looping until we see non-column data.
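A toy version of that loop, assuming a channel that delivers either column metadata or actual data; the real code operates on the IE's result channel and is more involved:

```go
package main

import "fmt"

// ieResult sketches what execInternal can observe: either column
// metadata or an actual result (rows, rows affected, or an error).
type ieResult struct {
	cols []string // non-nil means "column schema"
	row  []int    // non-nil means actual data
}

// waitUntilExecution keeps reading until the first non-column result,
// keeping only the latest schema. This is the looping described above:
// after a retry the columns may be re-sent, so blocking on just the
// first message (1) is not enough; we must block until (2).
func waitUntilExecution(results <-chan ieResult) (cols []string, first ieResult) {
	for r := range results {
		if r.cols != nil {
			cols = r.cols // remember the latest schema and keep waiting
			continue
		}
		return cols, r
	}
	return cols, ieResult{}
}

func main() {
	ch := make(chan ieResult, 3)
	ch <- ieResult{cols: []string{"x"}} // columns before the retry
	ch <- ieResult{cols: []string{"x"}} // columns re-sent after the retry
	ch <- ieResult{row: []int{42}}      // first real row: execution reached
	close(ch)
	cols, first := waitUntilExecution(ch)
	fmt.Println(cols, first.row) // [x] [42]
}
```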
pkg/sql/internal.go
line 833 at r13 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Maybe worth filing issues and linking them from the comment.
This is fixed in the fourth commit too - this TODO is the general bug this PR is fixing.
Build failed.
Known flake #101519. bors r+
Build succeeded.
This commit introduces a limit on the number of times the InternalExecutor machinery can retry errors in `Exec{Ex}` methods. That logic was introduced in c09860b in order to reduce the impact of fixes in other commits in cockroachdb#101477. However, there is no limit on the number of retries, and we recently hit the stack overflow twice in our own testing, seemingly in this code path. Thus, this commit adds a limit, 5 by default. It is stored in the session data, but it actually has no meaning in the user's session, so this commit also adds a corresponding cluster setting to be able to change the default used in the internal executor (just in case we need it).

Note that I'm not sure exactly what bug, if any, can lead to the stack overflow. One seemingly-unlikely theory is that there is no bug, meaning that we were simply retrying forever because the stmts were pushed by higher priority txns every time. Still, this change seems beneficial on its own and should prevent stack overflows even if we don't fully understand the root cause.

There is no release note given we've only seen this twice in our own testing and it involved cluster-to-cluster streaming.

Release note: None
This commit introduces a limit on the number of times the InternalExecutor machinery can retry errors in `Exec{Ex}` methods. That logic was introduced in c09860b in order to reduce the impact of fixes in other commits in cockroachdb#101477. However, there is no limit on the number of retries, and we recently hit the stack overflow twice in our own testing, seemingly in this code path. Thus, this commit adds a limit, 5 by default. It is stored in the session data, but it actually has no meaning in the user's session, so this commit also adds a corresponding cluster setting to be able to change the default used in the internal executor (just in case we need it).

Note that I'm not sure exactly what bug, if any, can lead to the stack overflow. One seemingly-unlikely theory is that there is no bug, meaning that we were simply retrying forever because the stmts were pushed by higher priority txns every time. Still, this change seems beneficial on its own and should prevent stack overflows even if we don't fully understand the root cause.

An additional improvement is that we now track the depth of recursion, and once it exceeds 1000, we'll return an error. This should prevent stack overflows due to other reasons.

There is no release note given we've only seen this twice in our own testing and it involved cluster-to-cluster streaming.

Release note: None
This commit introduces a limit on the number of times the InternalExecutor machinery can retry errors in `Exec{Ex}` methods. That logic was introduced in c09860b in order to reduce the impact of fixes in other commits in cockroachdb#101477. However, there is no limit on the number of retries, and we recently hit the stack overflow twice in our own testing, seemingly in this code path. Thus, this commit adds a limit, 5 by default.

Note that I'm not sure exactly what bug, if any, can lead to the stack overflow. One seemingly-unlikely theory is that there is no bug, meaning that we were simply retrying forever because the stmts were pushed by higher priority txns every time. Still, this change seems beneficial on its own and should prevent stack overflows even if we don't fully understand the root cause.

An additional improvement is that we now track the depth of recursion, and once it exceeds 1000, we'll return an error. This should prevent stack overflows due to other reasons.

There is no release note given we've only seen this twice in our own testing and it involved cluster-to-cluster streaming.

Release note: None
114398: sql: introduce limit on number of retries in IntExec.Exec{Ex} methods r=yuzefovich a=yuzefovich

This commit introduces a limit on the number of times the InternalExecutor machinery can retry errors in `Exec{Ex}` methods. That logic was introduced in c09860b in order to reduce the impact of fixes in other commits in #101477. However, there is no limit on the number of retries, and we recently hit the stack overflow twice in our own testing, seemingly in this code path. Thus, this commit adds a limit, 5 by default.

Note that I'm not sure exactly what bug, if any, can lead to the stack overflow. One seemingly-unlikely theory is that there is no bug, meaning that we were simply retrying forever because the stmts were pushed by higher priority txns every time. Still, this change seems beneficial on its own and should prevent stack overflows even if we don't fully understand the root cause.

An additional improvement is that we now track the depth of recursion, and once it exceeds 1000, we'll return an error. This should prevent stack overflows due to other reasons.

There is no release note given we've only seen this twice in our own testing and it involved cluster-to-cluster streaming.

Fixes: #109197.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
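The essence of the limit, as a standalone sketch (the actual default comes from session data and a cluster setting, which this sketch does not model):

```go
package main

import (
	"errors"
	"fmt"
)

var errRetry = errors.New("restart transaction")

// maxRetries mirrors the default introduced by this change (5).
const maxRetries = 5

// execWithLimit retries retryable errors at most maxRetries times and
// then returns the error to the caller instead of looping (or, in the
// real code, recursing) without bound.
func execWithLimit(run func() error) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = run(); err == nil || !errors.Is(err, errRetry) {
			return err
		}
	}
	return fmt.Errorf("giving up after %d retries: %w", maxRetries, err)
}

func main() {
	calls := 0
	err := execWithLimit(func() error { calls++; return errRetry })
	fmt.Println(calls, err) // 6 giving up after 5 retries: restart transaction
}
```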
This PR contains several commits that fix a long-standing bug in the internal executor that could make it double count some things (rows or metadata) if an internal retry error occurs.

In particular, since at least 21.1 (when we introduced the "streaming" internal executor in #59330), if the IE encounters a retry error after it has communicated some results to the client, it would proceed to retry the corresponding command as if the incomplete execution and the retry error never happened. In other words, it was possible for some rows to be double "counted" (either to be directly included multiple times into the result set or indirectly counted in the "rows affected" number).

This PR fixes the problem by returning the retry error to the client in cases when some actual rows have already been communicated, and by resetting the number of "rows affected" when "rewinding the stmt buffer" in order to retry the error transparently.
Fixes: #98558.