WIP -- Working to support migration-scoped queries from VTGate #2606

bbeaudreault · 2017-02-28T23:29:44Z

This will be WIP for a few weeks, but I wanted to start having travis build/test it now as I do some testing internally. I'll ping for a review when ready.

The goal here is to provide basic initial support for running migrations through VTGate. For now I am using ExecuteShards. In the future we could use a different fronting endpoint and unify with schemamanager/schema_swap, but the first step is to be able to route them through VTGate at all.

I'll be adding to this PR as I encounter blockers during my internal tests.

bbeaudreault · 2017-03-02T18:10:53Z

go/vt/tabletserver/query_executor.go

@@ -295,25 +295,38 @@ func (qre *QueryExecutor) execDDL() (*sqltypes.Result, error) {
 return nil, vterrors.Errorf(vtrpcpb.Code_INVALID_ARGUMENT, "DDL is not understood")
 }

+ if qre.transactionID != 0 {


Liquibase disables autocommit so that it can handle changesets such as adding a not-null constraint with a default value. This particular changeset results in 2 queries: 1 to update the existing values, and 1 to add the constraint. These should happen in a single transaction. Because autocommit is disabled, the JDBC driver issues a BEGIN to start the transaction. This works through vtgate, but when vttablet sees the DDL it starts its own transaction. The two transactions then deadlock each other.

The downside of this change is that for these multi-query transactions we no longer call schema.Engine.Reload(). Trying to think of ways around that. Can we mark somewhere (in the session?) to reload the schema on commit?

I ended up having to fix this. I went ahead and added a marker on the TxConnection -- see ReloadSchemaOnCommit. Not sure if you have a different approach you'd prefer @sougou
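A minimal sketch of the "reload on commit" marker idea as described at this point in the review. The `txConn` type, its `reloadSchemaOnCommit` field, and the `reload` callback are hypothetical stand-ins, not the actual TxConnection or schema.Engine API:

```go
package main

import "fmt"

// txConn is a hypothetical stand-in for vttablet's TxConnection.
type txConn struct {
	// reloadSchemaOnCommit plays the role of the ReloadSchemaOnCommit marker.
	reloadSchemaOnCommit bool
	log                  []string
}

// execDDL runs a DDL inside the caller's transaction and records that the
// schema must be reloaded once that transaction commits.
func (c *txConn) execDDL(sql string) {
	c.log = append(c.log, sql)
	c.reloadSchemaOnCommit = true
}

// commit commits, then calls reload only if a DDL ran in this transaction.
func (c *txConn) commit(reload func()) {
	c.log = append(c.log, "commit")
	if c.reloadSchemaOnCommit {
		reload()
		c.reloadSchemaOnCommit = false
	}
}

func main() {
	reloads := 0
	conn := &txConn{}
	conn.execDDL("alter table todo modify list_id int not null")
	conn.commit(func() { reloads++ })
	fmt.Println(conn.log, reloads)
}
```

The marker keeps the reload off the DML-only commit path: a commit that follows no DDL never invokes the callback.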


bbeaudreault · 2017-03-02T23:18:12Z

@sougou if you can take a look when you have a chance. This is working for me now, I was able to run a number of migrations through this. I commented on one area we should specifically look at.

Another gotcha I'd like to tackle is:

Even though I'm using ExecuteShards, I had to turn strict mode off. I'd like to have an in-between mode. Everything else other than migrations should be treated as strict. We shouldn't be doing sharding or filtered replication while running a migration.

I have a separate branch for supporting ExecuteShards in the JDBC client.

I also am not totally sure what tests to add, but will look to add some.

bbeaudreault · 2017-03-03T00:42:01Z

I'm not sure what's up with travis. The endtoend tests fail, looks like they fail trying to create a json column in mariadb, with maria returning a syntax error. Running them locally works, but they've consistently failed in travis.

michael-berlin · 2017-03-03T01:49:38Z

Are you sure that's the problem?

There's a hanging transaction and then a stacktrace gets printed?

Hanging may be here?

/home/travis/gopath/src/github.com/youtube/vitess/go/vt/tabletserver/endtoend/transaction_test.go:316 +0x1ec

https://travis-ci.org/youtube/vitess/jobs/207189600

bbeaudreault · 2017-03-03T02:01:31Z

Hmm not totally sure how to grok those various stacktraces around the hang. I'm guessing it could be any one of them? I'm not able to reproduce locally, but I did make changes to the DDL transactions in query_executor.go. So will try to see if I can reproduce or narrow it down.

bbeaudreault · 2017-03-03T12:36:39Z

Not sure why I wasn't triggering the failure locally, but I pushed a fix and Travis is happy again!

michael-berlin · 2017-03-03T19:51:56Z

> Not sure why I wasn't triggering the failure locally, but I pushed a fix and Travis is happy again!

Travis usually has higher load and therefore it sometimes exposes races which don't show up on a beefy, idle workstation. Some suggestions to reproduce such a problem:

- Compile the test binary (go test -c) and run it in a while true loop in your shell. This allows you to run the test more often. I understand that's not really an option for the slow endtoend test ;(
- You could try to run go test with -race which slows down the execution.
- Do other things to increase the load on your computer e.g. compile the Linux kernel :)

sougou · 2017-03-03T20:00:09Z

go/vt/sqlparser/ast.go

+// Lowered returns a new TableName where the Name and Qualifier have been converted
+// to lower case
+func (node *TableName) Lowered() *TableName {
+ return &TableName{Name: node.Name.Lowered(), Qualifier: node.Qualifier.Lowered()}


Looks like you're using this for lower-casing view names. If so, Qualifier should not be lowered. So, you may have to also rename this function to ToViewName.

sougou · 2017-03-03T20:01:58Z

go/vt/sqlparser/ast.go

@@ -2065,6 +2071,12 @@ func (node *TableIdent) UnmarshalJSON(b []byte) error {
 return nil
 }

+// Lowered returns a new TableIdent where the backing string has been
+// converted to lowercase
+func (node TableIdent) Lowered() TableIdent {


Since this function should generally not be used from outside this package, you could make it private, or even get rid of it and do the conversion directly in ToViewName.

sougou · 2017-03-03T20:03:00Z

go/vt/tabletserver/engines/schema/schema_engine.go

@@ -329,19 +329,6 @@ func (se *Engine) TableWasCreatedOrAltered(ctx context.Context, tableName string
 return nil
 }

-// TableWasDropped must be called if a table was dropped.
-func (se *Engine) TableWasDropped(tableName sqlparser.TableIdent) {


Need to delete TableWasCreatedOrAltered also.

TableWasCreatedOrAltered is used by Reload. I can still remove from that function if you'd like. But I wasn't sure of the implications.

It looks like it plays a pretty big role in Reload

Forgot about that. This is ok then.

sougou · 2017-03-03T20:05:03Z

go/vt/tabletserver/planbuilder/plan_test.go

@@ -128,11 +128,17 @@ func TestDDLPlan(t *testing.T) {
 t.Fatalf("Error marshalling %v", plan)
 }
 matchString(t, tcase.lineno, expected["Action"], plan.Action)
- matchString(t, tcase.lineno, expected["TableName"], plan.TableName.String())
- matchString(t, tcase.lineno, expected["NewName"], plan.NewName.String())
+ matchString(t, tcase.lineno, expected["TableName"], renderTableName(plan.TableName))


Use sqlparser.String instead.

sougou · 2017-03-03T20:09:33Z

go/vt/tabletserver/query_executor.go

@@ -295,25 +295,33 @@ func (qre *QueryExecutor) execDDL() (*sqltypes.Result, error) {
 return nil, vterrors.Errorf(vtrpcpb.Code_INVALID_ARGUMENT, "DDL is not understood")
 }

- conn, err := qre.tsv.te.txPool.LocalBegin(qre.ctx)
+ if qre.transactionID != 0 {


We should probably fail here instead. DDLs have an implicit commit in them. So, this will mess up our internal transaction state.

The other option is that we end the transaction here with our own commit. This will at least cause future DMLs against the transaction to fail, which is the right behavior.

Can you explain a little more about why you think we should do that? I'm not sure if my original use-case was lost in the commit stream, but here's what I see from my side:

Liquibase (a major java schema migration manager) converts a user-specified "changeset" into a bundle of queries. In many cases that results in just 1 query, like an ALTER/CREATE/etc. But for some changesets, that can be multiple queries. Here's an example:

<changeSet id="Populate initial values and make not null" author="bbeaudreault" context="job">
  <addNotNullConstraint tableName="todo" columnName="list_id" defaultNullValue="1" columnDataType="int"/>
</changeSet>

This is a reasonably common example: Adding a not-null constraint on a column, and first setting the default value on all existing rows. The way to handle this is to do 2 queries, wrapped in a transaction:

2017-03-02T20:34:26.971972Z 40 Query begin
2017-03-02T20:34:26.972148Z 40 Query update vttest.todo set list_id = '1' where list_id is null/* vtgate:: filtered_replication_unfriendly */
2017-03-02T20:34:27.004152Z 40 Query ALTER TABLE vttest.todo MODIFY list_id INT NOT NULL
2017-03-02T20:34:27.061314Z 40 Query commit

The above works great with the changes in this PR. However, previously the queries would come in like this:

begin
update vttest.todo set list_id = '1' where list_id is null/* vtgate:: filtered_replication_unfriendly */
begin
ALTER TABLE vttest.todo MODIFY list_id INT NOT NULL
--- unreachable:
commit
commit

You may be right that all DDLs have an implicit commit -- is that a MySQL thing, or a vitess thing? If Vitess, is there a reason that's the case? Some other way we can do it?

The way I see it is this is a reasonable use-case, and one used by a major tool that is very popular in the java community. So there should be a way to support it, not throw an exception.

One option, which would solve this case but not sure about all other changesets: We can still do the implicit commit, but only call LocalBegin if we're not already in a transaction.

Thoughts?

I see at https://dev.mysql.com/doc/refman/5.7/en/implicit-commit.html that it is a MySQL restriction. Since other places in vitess can create transactions, i.e. BeginExecute, should I instead just commit any active transaction and let the rest of the function work as it used to?

Sorry @sougou I think I misread your comment/suggestion based on my lack of experience with this. I am going to try to implement your second suggestion, which is to commit any running transaction before starting the new one.

sougou · 2017-03-03T20:13:06Z

go/vt/tabletserver/query_executor.go

+ return result, nil
+ }
+
+ result, err := qre.execAsTransaction(func(conn *TxConnection) (*sqltypes.Result, error) {


Don't know why the other code was hanging, but this is the right thing to do anyway.

I think it was because I was not doing a commit or rollback on error. Prior to this PR we were deferring the LocalCommit, so it always happened, even on failure. I had changed it to an explicit LocalCommit on success. Something about the extra load on travis was causing execSQL to fail, so the commit never happened and the connection was never returned to the pool. Using execAsTransaction properly handles cleanup now. That's my guess at least.

sougou · 2017-03-03T20:14:25Z

go/vt/tabletserver/tx_pool.go

@@ -171,12 +172,24 @@ func (axp *TxPool) Begin(ctx context.Context) (int64, error) {
 }

 // Commit commits the specified transaction.
-func (axp *TxPool) Commit(ctx context.Context, transactionID int64, messager *MessagerEngine) error {
+func (axp *TxPool) Commit(ctx context.Context, transactionID int64, messager *MessagerEngine, schemaEngine *schema.Engine) error {


This change shouldn't be needed. We should commit & reload schema in QueryExecutor instead (if we still want to allow this).


bbeaudreault · 2017-03-03T21:19:17Z

I've pushed a version where we close the open transaction. I'm going to do some internal testing to see how that goes. I am mildly concerned that the transactionIDs will somehow be impacted in vtgate and the JDBC driver, but we'll see.

bbeaudreault · 2017-03-03T22:32:23Z

The new approach does not work, sort of predictably on reflection. Liquibase tries to call commit at the end, and the transaction does not exist anymore. So vttablet returns:

vttablet: rpc error: code = 10 desc = transaction 1488578017635636179: not found

I've noticed that commit in mysql is a no-op if there is no transaction. I wonder if the same should happen in vitess.

sougou · 2017-03-03T22:44:42Z

The connections used by vitess all have autocommit=ON. This is required for connection pools. We have three options:

1. We could recognize a DDL on the client side and short-circuit the commit that follows.
2. We can start a new transaction on that same connection by issuing a begin and Recycle it. Maybe implement a BeginAgain call on the TxConn.
3. We can mark the TxConn as committed and just return it to the pool on the next commit, but disallow other statements.


bbeaudreault · 2017-03-03T23:20:33Z

I've implemented an initial solution using option 2. I'm testing it out internally while travis runs, to see if this works for the Liquibase use-case.

bbeaudreault · 2017-03-03T23:38:08Z

@sougou this worked for me. If you're ok with the approach I will add tests to ensure we are not leaking transactions or anything for the defined use-case.

I've run 30 or so migrations through this and see no leaked transactions (based on no transactionKiller logs). Going to add some explicit tests though, if approved.

sougou · 2017-03-04T01:01:30Z

go/vt/tabletserver/query_executor.go

- return nil, err
- }
- err = qre.tsv.te.txPool.LocalCommit(qre.ctx, conn, qre.tsv.messager)
+ err := qre.tsv.te.txPool.Commit(qre.ctx, qre.transactionID, qre.tsv.messager)


I don't think you can rely on execAsTransaction for this code path because it starts its own transaction. You have to perform your own qre.execSQL here. So, the flow should be:

- txPool.Get
- defer conn.Recycle
- qre.execSQL
- conn.BeginAgain (this is a method of TxConn)
- return (skip execAsTransaction)

This way, the current TxConn retains its transaction id and it remains in the active pool. So, the subsequent commit from the app will be honored.
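The flow above can be sketched as follows. Everything here is a hypothetical model: `txConn` records statements instead of talking to MySQL, and `txPool.Get`, `Recycle`, and `BeginAgain` only imitate the shape of the methods named in the review, with BeginAgain issuing a commit followed by a begin as suggested elsewhere in this thread:

```go
package main

import "fmt"

// txConn is a hypothetical stand-in for TxConn that records statements.
type txConn struct {
	id       int64
	stmts    []string
	recycled bool
}

func (c *txConn) exec(sql string) { c.stmts = append(c.stmts, sql) }

// BeginAgain commits anything left uncommitted and opens a fresh
// transaction on the same connection.
func (c *txConn) BeginAgain() {
	c.exec("commit")
	c.exec("begin")
}

// Recycle marks the connection as handed back for reuse by its transaction.
func (c *txConn) Recycle() { c.recycled = true }

// txPool is a hypothetical active pool keyed by transaction id.
type txPool struct{ active map[int64]*txConn }

func (p *txPool) Get(id int64) (*txConn, error) {
	c, ok := p.active[id]
	if !ok {
		return nil, fmt.Errorf("transaction %d: not found", id)
	}
	return c, nil
}

// execDDLInTx follows the suggested flow: Get, defer Recycle, execute the
// DDL, BeginAgain, return. The transaction id never changes.
func execDDLInTx(p *txPool, id int64, ddl string) error {
	conn, err := p.Get(id)
	if err != nil {
		return err
	}
	defer conn.Recycle()
	conn.exec(ddl) // MySQL implicitly commits the open transaction here
	conn.BeginAgain()
	return nil
}

func main() {
	conn := &txConn{id: 1}
	pool := &txPool{active: map[int64]*txConn{1: conn}}
	conn.exec("begin")
	conn.exec("update todo set list_id = 1 where list_id is null")
	if err := execDDLInTx(pool, 1, "alter table todo modify list_id int not null"); err != nil {
		fmt.Println(err)
		return
	}
	conn.exec("commit") // the application's own commit still finds transaction 1
	for _, s := range conn.stmts {
		fmt.Println(s)
	}
}
```

Because the connection stays in the active pool under the same id, the trailing commit from Liquibase no longer hits the "not found" error.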

sougou · 2017-03-04T01:12:53Z

go/vt/tabletserver/tx_pool.go

@@ -141,6 +141,22 @@ func (axp *TxPool) WaitForEmpty() {
 // Begin begins a transaction, and returns the associated transaction id.
 // Subsequent statements can access the connection through the transaction id.
 func (axp *TxPool) Begin(ctx context.Context) (int64, error) {
+ return axp.internalBegin(ctx, func() int64 {


TxPool need not change. Instead, add BeginAgain to TxConn. Make it execute a commit and begin. The commit ensures that anything accidentally uncommitted will not get lost.

bbeaudreault · 2017-03-04T01:46:21Z

Thanks for the feedback. Made changes based on suggestion. Testing in my environment now

sougou · 2017-03-04T01:52:40Z

go/vt/sqlparser/ast.go

+func (node *TableName) ToViewName() *TableName {
+ return &TableName{
+ Name: NewTableIdent(strings.ToLower(node.Name.v)),
+ Qualifier: NewTableIdent(strings.ToLower(node.Qualifier.v)),


One last nit. I think this should not be lowered because qualifiers are case-sensitive. That's what I was trying to say in my earlier comment.
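A sketch of what the requested change could look like, using simplified hypothetical types (`tableIdent`, `tableName`) in place of the sqlparser ones. Only Name is lower-cased and Qualifier is copied through unchanged:

```go
package main

import (
	"fmt"
	"strings"
)

// tableIdent and tableName are simplified, hypothetical versions of the
// sqlparser types shown in the diff.
type tableIdent struct{ v string }

type tableName struct {
	Name, Qualifier tableIdent
}

// ToViewName returns a copy whose Name is lower-cased. Qualifier is left
// as-is because qualifiers are case-sensitive.
func (node *tableName) ToViewName() *tableName {
	return &tableName{
		Name:      tableIdent{strings.ToLower(node.Name.v)},
		Qualifier: node.Qualifier,
	}
}

func main() {
	v := (&tableName{Name: tableIdent{"MyView"}, Qualifier: tableIdent{"VtTest"}}).ToViewName()
	fmt.Printf("%s.%s\n", v.Qualifier.v, v.Name.v)
}
```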

sougou · 2017-03-04T01:53:06Z

go/vt/tabletserver/engines/schema/schema_engine.go

@@ -329,19 +329,6 @@ func (se *Engine) TableWasCreatedOrAltered(ctx context.Context, tableName string
 return nil
 }

-// TableWasDropped must be called if a table was dropped.
-func (se *Engine) TableWasDropped(tableName sqlparser.TableIdent) {


Forgot about that. This is ok then.

bbeaudreault · 2017-03-04T02:47:05Z

Thanks!

bbeaudreault · 2017-03-04T03:24:25Z

I saw travis failed after we merged somehow. I just kicked off another build, and if it continues to fail I'll try to fix it in the morning.

bbeaudreault · 2017-03-04T03:32:57Z

Everything looks good on re-run

googlebot added the cla: yes label Feb 28, 2017


bbeaudreault force-pushed the migrations_from_vtgate branch from 53963f5 to c35a4e6 Compare March 2, 2017 20:41

bbeaudreault added 8 commits March 2, 2017 18:04

Reload schemas on all DDLs. Support table qualifiers in DDL queries.

1f5abee

For now only support table_name in non-create because the lookahead c…

ff025ee

…onflicts with force_eof

create tables cant have a row count, so no point checking. This avoid…

76fcf66

…s an NPE on Table.Name, since creates are the only one to only have a NewName

Make table_name work for CREATE and VIEWs. Fix alter view test, the s…

177da36

…yntax in the test failed to parse and was incorrect according to docs

This test does not apply since the removal of TableWasDropped

fdfec15

If we're in a transaction, don't start a new one

e7f1749

Reload schema on commit, if necessary

53141e9

recycle connection when using transaction

ea437b5

bbeaudreault force-pushed the migrations_from_vtgate branch from c35a4e6 to ea437b5 Compare March 2, 2017 23:05

utilize existing autocommit function

b2e1456

sougou reviewed Mar 3, 2017


bbeaudreault added 3 commits March 3, 2017 15:35

simplify

f6a580f

unneeded function

11332a5

If an transaction is open, we should commit it before running the DDL…

64f1537

… -- DDL's have an implicit commit.

implement and use BeginAgain when we implicit commit a running transa…

5552cd0

…ction due to DDL

sougou reviewed Mar 4, 2017


Move BeginAgain to TxConnection

e407f90

sougou reviewed Mar 4, 2017


dont lowercase qualifier

f995e79

sougou approved these changes Mar 4, 2017


sougou merged commit e2f97f8 into vitessio:master Mar 4, 2017

bbeaudreault deleted the migrations_from_vtgate branch March 4, 2017 02:45

frouioui pushed a commit to planetscale/vitess that referenced this pull request Mar 26, 2024

cherry pick of 13352 (vitessio#2606)

0bf9f8d
