Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vitessdriver: add support for DistributedTx #9451

Merged
merged 7 commits into from
Jan 10, 2022

Conversation

derekperkins
Copy link
Member

@derekperkins derekperkins commented Dec 29, 2021

Description

This enables users of the Go driver to serialize a session and connect to an existing transaction while still using the database/sql driver, as presented by @dkhenry here
https://www.cncf.io/online-programs/transactional-microservices-with-vitess-coordination-without-state/

  1. A new SessionToken is introduced, a protobuf encoded vtgatepb.Session represented as base64
  2. A new "magic" query is introduced to retrieve the SessionToken: vt_session_token. This is intercepted at the Go driver layer, and is necessary to get around Go interface implementation details. The query string is not exposed to users, so it could be anything if we want. We could promote this to a vtgate construct to simplify the use case across multiple clients if we wanted to formalize the token.
  3. The Go Tx object is prevented from calling Commit or Rollback, not for technical reasons, but to try to prevent misuse. This could change in the future if we wanted, but I prefer to be more strict up front.
  4. The test only hits the fakeserver, as there are no current driver tests to do this end to end.

Checklist

  • Tests were added or are not required
  • Documentation was added or is not required

Signed-off-by: Derek Perkins <derek@nozzle.io>
@derekperkins derekperkins added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: General Changes throughout the code base labels Dec 29, 2021
@derekperkins derekperkins marked this pull request as draft December 29, 2021 23:07
Signed-off-by: Derek Perkins <derek@nozzle.io>
Signed-off-by: Derek Perkins <derek@nozzle.io>
this simplifies the session token usage so that the configuration object no longer has to store the session struct itself

Signed-off-by: Derek Perkins <derek@nozzle.io>
Copy link
Contributor

@dkhenry dkhenry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM I like it, I would wait until @deepthi or @sougou can get 👀 on it before you land to make sure we aren't violating some unknown constraint.

Signed-off-by: Derek Perkins <derek@nozzle.io>
@derekperkins derekperkins added the release notes (needs details) This PR needs to be listed in the release notes in a dedicated section (deprecation notice, etc...) label Dec 30, 2021
Copy link
Member

@frouioui frouioui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pull request looks great to me besides the failing unit test

go/vt/vitessdriver/driver.go Show resolved Hide resolved
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me.
Let's see what @sougou and @systay have to say.

go/vt/vtgate/vtgateconn/vtgateconn.go Show resolved Hide resolved
Signed-off-by: Derek Perkins <derek@nozzle.io>
Comment on lines +286 to +308
// this is designed to be run after all new work has been done in the tx, similar to
// where you would traditionally run a tx.Commit, to help prevent you from silently
// losing transactional data.
validationFunc := func() error {
var sessionToken string
sessionToken, err = SessionTokenFromTx(ctx, tx)
if err != nil {
return err
}

session, err = sessionTokenToSession(sessionToken)
if err != nil {
return err
}

if len(session.ShardSessions) > originalShardSessionCount {
return fmt.Errorf("mismatched ShardSession count: originally %d, now %d",
originalShardSessionCount, len(session.ShardSessions),
)
}

return nil
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this in our setup and found this pathological edge case where there were no errors, but transactional data just disappeared. After tracking it down, I hadn't done any transactional work before sending over the session token, so it had no shard sessions, leaving the new ones orphaned.

The idea is that this function will help to prevent that from happening. I return this func rather than making a separate call, so that users are forced to be aware of it. The pattern is similar to a context cancel function that is returned on any new context operations, so it should feel familiar to people.

I thought about doing a deep comparison on the shard sessions, but that seemed like more work than just comparing the counts without adding any extra benefits.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkhenry @frouioui @deepthi @sougou this is a decent enough change that it's worth another quick look. The core functionality hasn't changed, this is just some safeguards, but I want to make sure that I'm fully understanding it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code lgtm.

without this check, database calls could appear to succeed with no errors, but the transaction ends up abandoned, potentially leaving users in an inconsistent state

Signed-off-by: Derek Perkins <derek@nozzle.io>
@derekperkins
Copy link
Member Author

derekperkins commented Jan 10, 2022

This is working well in our infrastructure, not at particularly high scale, but enough to feel good about it. I'm very glad that the first case I tried it on exposed the shard session bug - I believe this should be fairly safe for general consumption with the safeguards in place. I'll go ahead and merge with the approvals given.

@derekperkins derekperkins merged commit b801948 into vitessio:main Jan 10, 2022
@derekperkins derekperkins deleted the distributed-tx-go-driver branch January 10, 2022 23:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: General Changes throughout the code base release notes (needs details) This PR needs to be listed in the release notes in a dedicated section (deprecation notice, etc...) Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants