Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: use user transaction if we have one to prepare queries #46588

Merged

Conversation

ajwerner
Copy link
Contributor

Prepartion of certain queries requires performing reads against the database.
If the user has already laid down intents, these reads may become part of a
dependency cycle. Prior to this commit, these reads would be on a different
transaction and thus this cycle would not be detected by our deadlock detection
mechanism.

This change opts to use the user's transaction for planning if there is one
and thus will properly interact with deadlock detection.

Fixes #46447.

Release justification: fixes for high-priority or high-severity bugs in
existing functionality

Release note: None

@ajwerner ajwerner requested a review from andreimatei March 25, 2020 21:00
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ajwerner ajwerner force-pushed the ajwerner/use-user-txn-for-prepare branch 2 times, most recently from 5054d19 to 8f86aab Compare March 26, 2020 00:41
@ajwerner
Copy link
Contributor Author

hold off on looking at this, the testing is busted.

@ajwerner ajwerner force-pushed the ajwerner/use-user-txn-for-prepare branch from 8f86aab to e718c4a Compare March 26, 2020 01:14
@ajwerner
Copy link
Contributor Author

Alright, fixed the tests, RFAL.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajwerner Is this something that can be and should be backported to 19.2? The schemachange workload can reproduce wedging on both 19.1 and 19.2. I'm presuming it is this same deadlock.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)

@ajwerner
Copy link
Contributor Author

Yes this and the two other high priority txn usage fixes should all be backported to 19.2 and probably 19.1. Will do that after this goes in.

There’s still one more PR inbound to deal with a different sort of deadlock uncovered by the test.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @andreimatei)


pkg/sql/conn_executor_prepare.go, line 184 at r1 (raw file):

	flags, err := ex.populatePrepared(ctx, txn, placeholderHints, p)
	if err != nil {
		// NB: if this is not a new transaction then we should let the connExecutor

what do you mean "we should"? Do we or don't we?
I see that we do, except not very well. The code below doesn't handle retriable errors:

return eventNonRetriableErr{IsCommit: fsm.False}, eventNonRetriableErrPayload{err: err}


pkg/sql/conn_executor_test.go, line 599 at r1 (raw file):

	case <-time.After(time.Millisecond):
	case <-errCh:
		t.Fatal("expected the transaction to block")

put the error in the message


pkg/sql/conn_executor_test.go, line 654 at r1 (raw file):

	// This test will create an explicit transaction that encounters an error on
	// a latter statement during planning of SHOW COLUMNS. The planning for this
	// SHOW COLUMNS	will be run in the user's transaction. The test will inject

nit: spaces


pkg/sql/conn_executor_test.go, line 690 at r1 (raw file):

	_, err = tx.Prepare("SELECT NULL FROM [SHOW COLUMNS FROM bar] LIMIT 1")
	require.Regexp(t,
		"restart transaction: TransactionRetryWithProtoRefreshError: boom", err)

As I was saying above, I suspect this is currently broken since we're not actually returning the retriable code here. If you switch the code to pgx, you can assert on the error code.
In fact, I think that if things were to function correctly, this prepare should be auto-retried (endlessly, I guess, given your filter).

@ajwerner ajwerner force-pushed the ajwerner/use-user-txn-for-prepare branch 2 times, most recently from 5a1a409 to 72d8b5e Compare March 27, 2020 19:53
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreimatei thanks for the review, it's ready for another look.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/sql/conn_executor_prepare.go, line 184 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

what do you mean "we should"? Do we or don't we?
I see that we do, except not very well. The code below doesn't handle retriable errors:

return eventNonRetriableErr{IsCommit: fsm.False}, eventNonRetriableErrPayload{err: err}

yeah makes sense. Fixed.


pkg/sql/conn_executor_test.go, line 599 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

put the error in the message

Done.


pkg/sql/conn_executor_test.go, line 654 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: spaces

Done.


pkg/sql/conn_executor_test.go, line 690 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

As I was saying above, I suspect this is currently broken since we're not actually returning the retriable code here. If you switch the code to pgx, you can assert on the error code.
In fact, I think that if things were to function correctly, this prepare should be auto-retried (endlessly, I guess, given your filter).

We got back a response from inserting a row, I don't think it should auto-retry (and it doesn't).

Prepartion of certain queries requires performing reads against the database.
If the user has already laid down intents, these reads may become part of a
dependency cycle. Prior to this commit, these reads would be on a different
transaction and thus this cycle would not be detected by our deadlock detection
mechanism.

This change opts to use the user's transaction for planning if there is one
and thus will properly interact with deadlock detection.

Fixes cockroachdb#46447.

Release justification: fixes for high-priority or high-severity bugs in
existing functionality

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/use-user-txn-for-prepare branch from 72d8b5e to 0b945bf Compare March 27, 2020 20:01
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)

@ajwerner
Copy link
Contributor Author

TFTR!

bors r=andreimatei

@craig
Copy link
Contributor

craig bot commented Mar 31, 2020

Build succeeded

@craig craig bot merged commit 1320e13 into cockroachdb:master Mar 31, 2020
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Apr 1, 2020
In cockroachdb#46588 a bug was introduced when a retriable error was encountered while
using a new transaction for preparing. Prior to that commit, all error were
treated as not retriable. This was sort of a bummer. Retriable errors can
occur due to read within uncertainty. Before this PR, those retriable errors
would make their way to the client. Now we'll handle those retry errors
internally underneath `connExecutor.prepare`

Fixes cockroachdb#43251

Release note: None
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Apr 1, 2020
In cockroachdb#46588 a bug was introduced when a retriable error was encountered while
using a new transaction for preparing. Prior to that commit, all error were
treated as not retriable. This was sort of a bummer. Retriable errors can
occur due to read within uncertainty. Before this PR, those retriable errors
would make their way to the client. Now we'll handle those retry errors
internally underneath `connExecutor.prepare`

Fixes cockroachdb#43251

Release note: None
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Apr 1, 2020
In cockroachdb#46588 a bug was introduced when a retriable error was encountered while
using a new transaction for preparing. Prior to that commit, all error were
treated as not retriable. This was sort of a bummer. Retriable errors can
occur due to read within uncertainty. Before this PR, those retriable errors
would make their way to the client. Now we'll handle those retry errors
internally underneath `connExecutor.prepare`

Fixes cockroachdb#43251

Release note: None
craig bot pushed a commit that referenced this pull request Apr 1, 2020
46829: sql: deal with retriable errors when using a new txn r=ajwerner a=ajwerner

In #46588 a bug was introduced when a retriable error was encountered while
using a new transaction for preparing. Prior to that commit, all error were
treated as not retriable. This was sort of a bummer. Retriable errors can
occur due to read within uncertainty. Before this PR, those retriable errors
would make their way to the client. Now we'll handle those retry errors
internally underneath `connExecutor.prepare`

Fixes #43251

Release note: None

46832: sqlmigrations: prevent schema change noise upon cluster creation r=lucy-zhang a=knz

First commit from  #46829.
Informs #46757.

The system.comments is now created without write permission to
public. No need to re-do that change on every new cluster.

Release note: None

46854: ui: (fix) Tooltip component styling and props r=dhartunian a=koorosh

This fix is rework of previous changes related to PR: #46557

Prior, to customize tooltip width for two particular cases,
changes were made in shared component and affected all tooltips
across project (tooltip width was set to 500px). 
For example, tooltips for Diagnostics badges were 500px wide:

<img width="1213" alt="Screenshot 2020-04-01 at 17 58 55" src="https://user-images.githubusercontent.com/3106437/78152553-a3068b00-7442-11ea-9575-f582cacc5ca4.png">

Instead of this, current changes keep component styles without
local changes and extend them with specific classes (via `overlayClassName`
prop). It allows to apply changes in specific places (Statements and Jobs
tables).

<img width="706" alt="Screenshot 2020-04-01 at 17 52 20" src="https://user-images.githubusercontent.com/3106437/78151930-cda41400-7441-11ea-8684-1eacd68f6934.png">

Next fix: the order of destructing props in `components/tooltip/tooltip.tsx`.
`{...props}` supplied as a last prop to `<AntTooltip />` component and it
overrides all previous props which have to be preserved. To fix this, ...props
was moved as first prop.

And last fix: Tooltips for Diagnostics Status Badge was set to be visible always
and with some random conditions tooltips appeared and were displayed instantly.
To fix this, `visible` prop was removed to trigger tooltip visibility only on
mouse hover.
And to position Diagnostics Status Badge tooltip more elegantly - it is positioned
to `bottomLeft` side, because this badge is displayed in the last columns and there
is not enough place on the right side for tooltip.

<img width="1198" alt="Screenshot 2020-04-01 at 17 51 35" src="https://user-images.githubusercontent.com/3106437/78151950-d4cb2200-7441-11ea-9b6d-e04a7f0d246f.png">

Release note (bug fix): Tooltips for statement diagnostics were shown always
instead of only on hover

Release justification: bug fixes and low-risk updates to new functionality

46871: vendor: Bump pebble to 74d69792cb150369fb7a799be862524ad2b39d51 r=itsbilal a=itsbilal

Pulls in these changes:
 - *: use errors.Errorf in replace of fmt.Errorf
 - *: use github.com/cockroachdb/errors
 - vendor: add cockroachdb/errors and its dependencies
 - *: add FileNum type
 - db: delete orphaned temporary files in Open
 - version_set: Handle case where RocksDB doesn't bump minUnflushedLogNum
 - db: modify Options.DebugCheck to be a function
 - internal/manifest: Relax SeqNum overlap invariant in L0
 - internal/metamorphic: fix ingest_using_apply implementation
 - *: use Go 1.13 error wrapping
 - internal/metamorphic: randomize MaxConcurrentCompactions
 - internal/metamorphic: enable run comparison by default
 - internal/metamorphic: temporarily disable ingest_using_apply
 - db: document that Close may not be called twice
 - db: use require.* functions for test checks
 - *: use require.NoError everywhere
 - internal/errorfs: move to new package

Release note: None

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Andrii Vorobiov <and.vorobiov@gmail.com>
Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Apr 2, 2020
In cockroachdb#46588 a bug was introduced when a retriable error was encountered while
using a new transaction for preparing. Prior to that commit, all error were
treated as not retriable. This was sort of a bummer. Retriable errors can
occur due to read within uncertainty. Before this PR, those retriable errors
would make their way to the client. Now we'll handle those retry errors
internally underneath `connExecutor.prepare`

Fixes cockroachdb#43251

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: deadlock due to planning query for SHOW TABLE
4 participants