Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: implement XA transaction SQL statements #129448

Merged
merged 4 commits into from
Dec 18, 2024

Conversation

nvanbenschoten
Copy link
Member

@nvanbenschoten nvanbenschoten commented Aug 21, 2024

Informs #22329.
Closes #137325.

This PR implements the statement handling for PREPARE TRANSACTION <gid>, COMMIT PREPARED <gid>, and ROLLBACK PREPARED <gid>, for parity with Postgres support. The implementation is based on the Postgres documentation and the Postgres source code.

The PR then implements the pg_catalog.pg_prepared_xacts system catalog table, which reads from system.prepared_transactions. This adds the remaining PG compatibility, so that XA in CockroachDB looks near-identical to XA in PostgreSQL.

Finally, the PR fixes the pgjdbc blocklist to reflect the current state of support for XA transactions in CockroachDB. With the newly added support for two-phase commit transactions, all tests in the XADataSourceTest test suite now pass except for mappingOfConstraintViolations, which is blocked by #31632 (DEFERRABLE foreign keys).

Release note (sql change): XA transaction support allows CockroachDB to participate in distributed transaction with other resources (e.g. databases, message queues, etc) using a two phase commit protocol.

Copy link

blathers-crl bot commented Aug 21, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work! i reviewed the final 3 commits of this PR.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @nvanbenschoten)


pkg/sql/two_phase_commit.go line 227 at r13 (raw file):
nit: in most of our user facing docs we use the term "admin role" instead of "superuser" so let's make this say something like

Must have the admin role or be the same user as the one that prepared the transaction.


pkg/sql/two_phase_commit.go line 49 at r13 (raw file):

	if prepareErr != nil {
		if rbErr := ex.state.mu.txn.Rollback(ctx); rbErr != nil {
			log.Warningf(ctx, "txn rollback failed: err=%s", rbErr)

just confirming; is it correct to not short-circuit here? the state machine will advance to the NoTxn state, but I'm not sure what that means if the transaction never rolled back.


pkg/sql/two_phase_commit.go line 66 at r13 (raw file):

	// TODO(nvanbenschoten): why are these needed here (and in the equivalent
	// functions for commit and rollback)? Shouldn't they be handled by
	// connExecutor.resetExtraTxnState?

i'm curious what happened when these closes weren't added here?


pkg/sql/two_phase_commit.go line 81 at r13 (raw file):

	// which would prevent it from being prepared.
	if ex.extraTxnState.descCollection.HasUncommittedDescriptors() {
		return errors.Errorf("cannot prepare transaction: uncommitted descriptors")

nit: could the error say something more user facing, like "cannot prepare a transaction that has already performed schema changes"

and should it have an error code like InvalidTransactionState (25000)?


pkg/sql/two_phase_commit.go line 86 at r13 (raw file):

	// TODO(nvanbenschoten): add a Txn.Key method to get the transaction's key,
	// instead of using TestingCloneTxn.
	txnProto := ex.state.mu.txn.TestingCloneTxn()

nit: just curious if there are any known gotchas with using this testing function here. if so, including them in the comment would be useful.


pkg/sql/two_phase_commit.go line 97 at r13 (raw file):

		ctx,
		ex.server.cfg.InternalDB.Executor(),
		nil, /* sqlTxn */

is it correct to use a separate transaction for this?

as long as it is, then the more conventional approach would be to pass in the InternalDB, and then use internalDB.DescsTxn(...) or internalDB.Txn(...) in the callee. these are useful since they take care of some setup/teardown of the transaction.


pkg/sql/two_phase_commit.go line 110 at r13 (raw file):

		// v24.3 is no longer necessary.
		if pgerror.GetPGCode(err) == pgcode.UndefinedTable {
			return pgerror.Newf(pgcode.FeatureNotSupported, "PREPARE TRANSACTION unsupported in mixed-version cluster")

nit: instead of having this check here, let's make the first step of execPrepareTransactionInOpenStateInternal be something like:

	ex.planner.EvalContext().Settings.Version.IsActive(...)

the planner should already be set by this point, since that happens in the beginning of execStmtInOpenState


pkg/sql/two_phase_commit.go line 170 at r13 (raw file):

	err = deletePreparedTransaction(
		ctx,
		ex.server.cfg.InternalDB.Executor(),

similar comment as above, let's pass in InternalDB here instead.


pkg/sql/two_phase_commit.go line 278 at r13 (raw file):

		// TODO(nvanbenschoten): Remove this logic when mixed-version support with
		// v24.3 is no longer necessary.
		if pgerror.GetPGCode(err) == pgcode.UndefinedTable {

similar comment as above; we can eagerly check for the mixed version state and short-circuit


pkg/internal/sqlsmith/sqlsmith.go line 371 at r13 (raw file):

		{2, makeCommit},
		{2, makeRollback},
		// TODO(nvanbenschoten): add two-phase commit statements.

do we already have tracking issues for the testing work that we are postponing for later?


pkg/sql/pg_catalog.go line 2426 at r14 (raw file):

	schema: vtable.PGCatalogPreparedXacts,
	populate: func(ctx context.Context, p *planner, dbContext catalog.DatabaseDescriptor, addRow func(...tree.Datum) error) error {
		rows, err := p.InternalSQLTxn().QueryBufferedEx(

this virtual table could be implemented as a view on system.prepared_transactions. those are generally easier to maintain, so we prefer that approach. take a look at pg_shdescription for example:

var pgCatalogSharedDescriptionView = virtualSchemaView{
schema: vtable.PGCatalogSharedDescription,
resultColumns: colinfo.ResultColumns{
{Name: "objoid", Typ: types.Oid},
{Name: "classoid", Typ: types.Oid},
{Name: "description", Typ: types.String},
},
comment: `shared object comments
https://www.postgresql.org/docs/9.5/catalog-pg-shdescription.html`,
}

to implement the mixed version check, the view can additionally execute the crdb_internal.is_at_least_version function.

Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! I've rebased this on master now that the KV PR has landed.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rafiss)


pkg/internal/sqlsmith/sqlsmith.go line 371 at r13 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

do we already have tracking issues for the testing work that we are postponing for later?

I opened #137636 to track this additional testing work. We can add to it if we think of anything more.


pkg/sql/pg_catalog.go line 2426 at r14 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

this virtual table could be implemented as a view on system.prepared_transactions. those are generally easier to maintain, so we prefer that approach. take a look at pg_shdescription for example:

var pgCatalogSharedDescriptionView = virtualSchemaView{
schema: vtable.PGCatalogSharedDescription,
resultColumns: colinfo.ResultColumns{
{Name: "objoid", Typ: types.Oid},
{Name: "classoid", Typ: types.Oid},
{Name: "description", Typ: types.String},
},
comment: `shared object comments
https://www.postgresql.org/docs/9.5/catalog-pg-shdescription.html`,
}

to implement the mixed version check, the view can additionally execute the crdb_internal.is_at_least_version function.

This almost works, but I'm struggling to get the handling of the non-existent table in the mixed-version case working correctly? Any suggestions for how to express that in SQL?


pkg/sql/two_phase_commit.go line 49 at r13 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

just confirming; is it correct to not short-circuit here? the state machine will advance to the NoTxn state, but I'm not sure what that means if the transaction never rolled back.

I don't think there's anything better we can do here. If the rollback fails then I guess the kv transaction is going to need to time out. Failing to advance the sql state machine to NoTxn doesn't really accomplish anything.

This is the same kind of logic that we have in connExecutor.rollbackSQLTransaction.


pkg/sql/two_phase_commit.go line 66 at r13 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i'm curious what happened when these closes weren't added here?

Nothing bad happened, IIRC. I just noticed this when copying over some of this logic from commitSQLTransactionInternal. I kept the logic for symmetry, but I don't understand why it's needed in either place. Probably not necessary to bottom out on this here.


pkg/sql/two_phase_commit.go line 81 at r13 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: could the error say something more user facing, like "cannot prepare a transaction that has already performed schema changes"

and should it have an error code like InvalidTransactionState (25000)?

Done and done.


pkg/sql/two_phase_commit.go line 86 at r13 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: just curious if there are any known gotchas with using this testing function here. if so, including them in the comment would be useful.

No gotchas. I think it's just named like this because we at one point decided that exposing the raw proto from the transaction was a "bad API" which should only be used in tests.

I addressed the TODO in a new commit.


pkg/sql/two_phase_commit.go line 97 at r13 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

is it correct to use a separate transaction for this?

as long as it is, then the more conventional approach would be to pass in the InternalDB, and then use internalDB.DescsTxn(...) or internalDB.Txn(...) in the callee. these are useful since they take care of some setup/teardown of the transaction.

These need to use a separate transaction, because we don't want the bookkeeping in the system.prepared_transactions table to be part of the prepared transaction.

The internalDB.DescsTxn(...) or internalDB.Txn(...) approaches seem like a lot more boilerplate for kicking off a single-statement, implicit transaction. Is there a benefit to using them over passing a nil transaction?

The other benefit of the approach here is that I can call it with a nil transaction (and unbound Executor) in cleanupAfterFailedPrepareTransaction and the SQL transaction (and bound Executor) in deletePreparedTxn.

Happy to go with whatever approach you think is best.


pkg/sql/two_phase_commit.go line 110 at r13 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: instead of having this check here, let's make the first step of execPrepareTransactionInOpenStateInternal be something like:

	ex.planner.EvalContext().Settings.Version.IsActive(...)

the planner should already be set by this point, since that happens in the beginning of execStmtInOpenState

Done.


pkg/sql/two_phase_commit.go line 227 at r13 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: in most of our user facing docs we use the term "admin role" instead of "superuser" so let's make this say something like

Must have the admin role or be the same user as the one that prepared the transaction.

This hint message comes straight from PG, so I figured I'd keep it identical. See https://github.com/postgres/postgres/blob/7a80e381d16c642d00ec6146ccdf1262a159c69e/src/backend/access/transam/twophase.c#L587.

Happy to change it if you feel strongly.


pkg/sql/two_phase_commit.go line 278 at r13 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

similar comment as above; we can eagerly check for the mixed version state and short-circuit

Done.

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @nvanbenschoten)


pkg/sql/pg_catalog.go line 2426 at r14 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This almost works, but I'm struggling to get the handling of the non-existent table in the mixed-version case working correctly? Any suggestions for how to express that in SQL?

ah good point. i don't know a way around resolving the table name.

but since we do have a planner here, let's have this populate function short-circuit with a p.IsActive check rather than using error codes to learn if it's a mixed version state.


pkg/sql/two_phase_commit.go line 49 at r13 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't think there's anything better we can do here. If the rollback fails then I guess the kv transaction is going to need to time out. Failing to advance the sql state machine to NoTxn doesn't really accomplish anything.

This is the same kind of logic that we have in connExecutor.rollbackSQLTransaction.

i believe there is one state transition failure case where we simply end the session. i can't seem to find it now though, so i'm fine with leaving this as is.


pkg/sql/two_phase_commit.go line 66 at r13 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Nothing bad happened, IIRC. I just noticed this when copying over some of this logic from commitSQLTransactionInternal. I kept the logic for symmetry, but I don't understand why it's needed in either place. Probably not necessary to bottom out on this here.

i see, thanks for explaining! i will take a cursory look into the history here separate from this PR.


pkg/sql/two_phase_commit.go line 97 at r13 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

These need to use a separate transaction, because we don't want the bookkeeping in the system.prepared_transactions table to be part of the prepared transaction.

The internalDB.DescsTxn(...) or internalDB.Txn(...) approaches seem like a lot more boilerplate for kicking off a single-statement, implicit transaction. Is there a benefit to using them over passing a nil transaction?

The other benefit of the approach here is that I can call it with a nil transaction (and unbound Executor) in cleanupAfterFailedPrepareTransaction and the SQL transaction (and bound Executor) in deletePreparedTxn.

Happy to go with whatever approach you think is best.

i do think we should stick with internalDB here. IMO it makes it more clear to readers that a separate transaction is going to be initiated.

it also takes care of releasing descriptor leases, which isn't super relevant right now, but it's very easy to forget about that if the code is extended more later, or if somebody looks at this as inspiration for a different function.

also IMO it's not a huge amount of boilerplate. it mostly comes down to a bit of added code indentation:

return p.execCfg.InternalDB.Txn(ctx, func(ctx context.Context, txn isql.Txn) error {
datums, err := txn.QueryRowEx(
ctx, "read-setting",
txn.KV(),
sessiondata.NodeUserSessionDataOverride,
"SELECT value FROM system.settings WHERE name = $1", key,
)


pkg/sql/two_phase_commit.go line 227 at r13 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This hint message comes straight from PG, so I figured I'd keep it identical. See https://github.com/postgres/postgres/blob/7a80e381d16c642d00ec6146ccdf1262a159c69e/src/backend/access/transam/twophase.c#L587.

Happy to change it if you feel strongly.

thanks, i'm fine with using the term superuser


pkg/sql/two_phase_commit.go line 96 at r19 (raw file):

	txnKey := txn.Key()
	if !txn.IsOpen() {
		return errors.AssertionFailedf("cannot prepare a transaction that is not open")

do we need one more check to make sure we're in an explicit transaction? if we're not in one, then it would be a user error.

Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rafiss)


pkg/sql/pg_catalog.go line 2426 at r14 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

ah good point. i don't know a way around resolving the table name.

but since we do have a planner here, let's have this populate function short-circuit with a p.IsActive check rather than using error codes to learn if it's a mixed version state.

Done.


pkg/sql/two_phase_commit.go line 97 at r13 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i do think we should stick with internalDB here. IMO it makes it more clear to readers that a separate transaction is going to be initiated.

it also takes care of releasing descriptor leases, which isn't super relevant right now, but it's very easy to forget about that if the code is extended more later, or if somebody looks at this as inspiration for a different function.

also IMO it's not a huge amount of boilerplate. it mostly comes down to a bit of added code indentation:

return p.execCfg.InternalDB.Txn(ctx, func(ctx context.Context, txn isql.Txn) error {
datums, err := txn.QueryRowEx(
ctx, "read-setting",
txn.KV(),
sessiondata.NodeUserSessionDataOverride,
"SELECT value FROM system.settings WHERE name = $1", key,
)

Got it, I think I see how you wanted this to look. The nice part is that I now only need to pass one param (an isql.Txn) to these functions, instead of two (a isql.Executor and a *kv.Txn).

I put this cleanup in a separate commit to make it easier for you to review, to confirm that I'm understanding the suggestion correctly.

If you 👍, I'll squash the commit into the first one in this PR.


pkg/sql/two_phase_commit.go line 96 at r19 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

do we need one more check to make sure we're in an explicit transaction? if we're not in one, then it would be a user error.

That check is in connExecutor.execStmtInNoTxnState. There's a test for this in two_phase_commit, if you search for there is no transaction in progress.

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @nvanbenschoten)


pkg/sql/two_phase_commit.go line 97 at r13 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Got it, I think I see how you wanted this to look. The nice part is that I now only need to pass one param (an isql.Txn) to these functions, instead of two (a isql.Executor and a *kv.Txn).

I put this cleanup in a separate commit to make it easier for you to review, to confirm that I'm understanding the suggestion correctly.

If you 👍, I'll squash the commit into the first one in this PR.

this looks great, thanks.

Informs cockroachdb#22329.

This commit implements the statement handling for `PREPARE TRANSACTION
<gid>`, `COMMIT PREPARED <gid>`, and `ROLLBACK PREPARED <gid>`, for
parity with Postgres support. The implementation is based on the
Postgres documentation and the Postgres source code.

Release note (sql change): XA transaction support allows CockroachDB to
participate in distributed transaction with other resources (e.g.
databases, message queues, etc) using a two phase commit protocol.
Addresses a TODO.

Epic: None
Release note: None
Informs cockroachdb#22329.

`pg_catalog.pg_prepared_xacts` reads from `system.prepared_transactions`.

Release note: None
Informs cockroachdb#22329.

This commit fixes the pgjdbc blocklist to reflect the current state of
support for XA transactions in CockroachDB. With the newly added support
for two-phase commit transactions, all tests in the XADataSourceTest
test suite now pass except for `mappingOfConstraintViolations`, which is
blocked by cockroachdb#31632 (DEFERRABLE foreign keys).

Release note: None
@nvanbenschoten
Copy link
Member Author

TFTR!

bors r=rafiss

@craig craig bot merged commit 694ca19 into cockroachdb:master Dec 18, 2024
21 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql/tests: TestRandomSyntaxGeneration failed [unexpected statement: *tree.RollbackPrepared]
3 participants