Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: implement feature parity with Postgres for XA transactions #22359

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

spencerkimball
Copy link
Member

New syntax for PREPARE TRANSACTION <gid>, COMMIT PREPARED <tid>,
and ROLLBACK PREPARED <tid>, for parity with Postgres support.

Added a new system.prepared_xacts virtual table as well as a
pg_catalog.pg_prepared_xacts, for compatibility.

When a transaction is prepared, we send an EndTransactionRequest
as usual, but a new flag indicates the transaction should be moved
to status PREPARED instad of COMMITTED. In state PREPARED, a
transaction record is updated to include all of its intents, and
awaits a subsequent EndTransactionRequest which will move the
transaction either to COMMITTED or ABORTED. Transaction in state
PREPARED are given an expiration of 1h by default. This setting
is controlled by storage.prepared_txn.timeout.

Release note (sql change): XA transaction support allows CockroachDB to
participate in distributed transaction with other resources (e.g.
databases, message queues, etc) using a two phase commit protocol.

@spencerkimball spencerkimball requested review from andreimatei and a team February 4, 2018 00:01
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@spencerkimball
Copy link
Member Author

This change still requires unittests; I've only tested it works manually so far.

I would like early feedback on SQL changes as this is the first time I've messed with the grammar or added a plan node. I mostly just cargo culted.

@andreimatei
Copy link
Contributor

Spencer I'm sure this is probably awesome, but unfortunately I won't be able to review in the next two weeks. I'm a bit swamped trying to finish other stuff. If you want it soon, might want to try someone else. Sorry.

@andreimatei
Copy link
Contributor

fly-by: might wanna mention #22329

Btw, are you sure we want to add this feature now? Was there anything other than that one forum post that made you do this? Seems like a pretty large change...


Review status: 0 of 48 files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from Reviewable

@spencerkimball
Copy link
Member Author

It’s not the first time the question has been asked around XA. If we want to integrate with any externally managed transactions, this is a requirement. In practice this usually means message queues. If you really look at this change, it’s tiny. The bulk is just adding the syntax and a new system table. The pull request here is 10x the actual meaningful changed lines, zero exaggeration. I will add someone else for reviewer.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One pretty please and one question: please don't make changes to executor.go, I'm trying to get rid of it. At least, make the corresponding changes to conn_fsm.go, which contains the rewritten state machine for a connection.

And the question: does this patch add any protection for prepared txns so their intents can't be pushed? I don't see it, but I guess it should?

@andreimatei
Copy link
Contributor

If you do modify conn_fsm.go, please do it on top of #22367. That adds something that'll probably affect the changes you'd be making.

Another random thought - I guess we need to fail the prepare if the txn has queued any schema changes.

@spencerkimball
Copy link
Member Author

@andreimatei it's not clear to me how to add the changes to conn_fsm.go. Why would I add anything there when all of the code I'm cargo-culting from is in executor.go? :trollface:

Also, yes, this provides protection against the intents being pushed. There's a 1h default timeout for prepared transaction records. After that they're fair game.

I'm going to have to check in with you when I get a free moment this week to see how to accommodate your suggestion.

Added unittests and a mechanism to update the prepared_xacts table upon what's called a "heuristic commit/abort". According to my reading of the X/Open specification, this applies any time the resource manager futzes with the prepared transaction before it can be committed or rolled back by the transaction manager.

@bdarnell
Copy link
Contributor

bdarnell commented Feb 5, 2018

This doesn't just need unittests - we need to test that it can work in an actual distributed transaction with another system (Ideally in an automated way, but manually might be OK as a stopgap as long as we document how to repeat it).

This opens up a lot of compatibility surface area, which our experience with ORM compatibility suggests will take a lot of tweaking. The x/open spec has a lot of rules around when the RM is allowed to forget about a transaction (even after it's been aborted), and about the error codes that are used (the "heuristic commit/abort" cases are supposed to be distinguishable from other outcomes).

The new system table containing prepared transactions could grow quite large; I think this may have the potential to be the largest thing we load up in a virtual table, which could lead to memory problems.

I'm -1 on implementing this in the near future. Demand for this feature is low (AFAICT), and the testing/support burden is high.

@spencerkimball
Copy link
Member Author

I spent some time looking through the postgres code base today to understand what kind of errors they're returning on various error conditions, like heuristic abort/commits. There are no specific errors being returned. Basically the conclusion I've reached is that the X/Open specification is mostly ignored by Postgres' implementation and that's not as big a deal as it might sound like because in practice, transaction managers merely rely on success or failure responses to the first and second commit phases. A failure on the first phase means the TM won't move to the second phase. A failure on the second phase is assumed to mean the resource manager independently finalized its local txn state and the TM will return a significant failure.

Postgres, for example, does not handle requests by the TM to forget prepared transactions. Even if we were to follow the X/Open specification exactly, it wouldn't do any good because the postgres XA drivers that I've looked at don't use any of the extra stuff.

My thought was that the prepared transactions table would in fact not get large – at least no larger than an order of magnitude more than the peak number of concurrent distributed transactions in the system. Having these things hang around is bad news and limiting their lifetimes is the main concern of transaction managers. If the TM is in recovery, it also isn't preparing new transactions, so they don't grow without bound just because a TM is offline. Maybe that's hopelessly naive.

My larger point for doing this is that it seemed to fit nicely with our existing transaction model, the whole thing is fresh in my head, and it's been requested now three times that I've seen.

I agree it needs testing with an actual transaction manager.

@spencerkimball spencerkimball force-pushed the xa-txn-support branch 2 times, most recently from 28a3135 to 03eb948 Compare February 5, 2018 21:43
New syntax for `PREPARE TRANSACTION <gid>`, `COMMIT PREPARED <tid>`,
and `ROLLBACK PREPARED <tid>`, for parity with Postgres support.

Added a new `system.prepared_xacts` virtual table as well as a
`pg_catalog.pg_prepared_xacts`, for compatibility.

When a transaction is prepared, we send an `EndTransactionRequest`
as usual, but a new flag indicates the transaction should be moved
to status `PREPARED` instad of `COMMITTED`. In state `PREPARED`, a
transaction record is updated to include all of its intents, and
awaits a subsequent `EndTransactionRequest` which will move the
transaction either to `COMMITTED` or `ABORTED`. Transactions in state
`PREPARED` are given an expiration of `1h` by default. This setting
is controlled by `storage.prepared_txn.timeout`.

Release note (sql change): XA transaction support allows CockroachDB to
participate in distributed transaction with other resources (e.g.
databases, message queues, etc) using a two phase commit protocol.
@bdarnell bdarnell mentioned this pull request Mar 25, 2018
@jordanlewis jordanlewis removed their request for review June 19, 2019 01:20
@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label Jun 19, 2019
@cockroach-teamcity
Copy link
Member

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@dhartunian dhartunian removed the request for review from a team June 12, 2020 17:43
@recanman
Copy link

recanman commented Jul 5, 2024

Any updates on further testing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
X-noremind Bots won't notify about PRs with X-noremind
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants