Vtctl Materialize optimizations #6207

Merged: 6 commits, May 24, 2020

Contributor

@teejae teejae commented May 20, 2020

  • Fetch the source and target schemas only once, cutting GetSchema network round trips from O(n) in the number of tables to O(1)
  • Treat an empty source expression as equivalent to select * from table, as used by VReplication
  • Change variable names for better readability
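
A minimal sketch of the first bullet, not the code from this PR: it contrasts one schema fetch per table with a single fetch that is indexed by table name. The types TableDefinition and schemaFetcher and both lookup functions are hypothetical stand-ins, and the fetcher only counts calls instead of making an RPC.

```go
package main

import "fmt"

// TableDefinition is a hypothetical stand-in for one table entry in a schema.
type TableDefinition struct {
	Name   string
	Schema string
}

// schemaFetcher is a hypothetical stand-in for a GetSchema RPC; it counts calls.
type schemaFetcher struct {
	tables []TableDefinition
	calls  int
}

// GetSchema returns every table definition in one (simulated) round trip.
func (f *schemaFetcher) GetSchema() []TableDefinition {
	f.calls++
	return f.tables
}

// lookupPerTable fetches the schema once per requested table: O(n) round trips.
func lookupPerTable(f *schemaFetcher, names []string) map[string]string {
	found := make(map[string]string)
	for _, name := range names {
		for _, td := range f.GetSchema() {
			if td.Name == name {
				found[name] = td.Schema
			}
		}
	}
	return found
}

// lookupOnce fetches the schema a single time and indexes it by name: O(1) round trips.
func lookupOnce(f *schemaFetcher, names []string) map[string]string {
	byName := make(map[string]string)
	for _, td := range f.GetSchema() {
		byName[td.Name] = td.Schema
	}
	found := make(map[string]string)
	for _, name := range names {
		if schema, ok := byName[name]; ok {
			found[name] = schema
		}
	}
	return found
}

func main() {
	tables := []TableDefinition{
		{Name: "t1", Schema: "create table t1(id int)"},
		{Name: "t2", Schema: "create table t2(id int)"},
	}
	names := []string{"t1", "t2"}

	perTable := &schemaFetcher{tables: tables}
	lookupPerTable(perTable, names)

	once := &schemaFetcher{tables: tables}
	lookupOnce(once, names)

	fmt.Println("per-table calls:", perTable.calls, "single-fetch calls:", once.calls)
}
```

Both functions return the same lookups; only the number of simulated round trips differs, which is the saving the bullet describes.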

@teejae teejae requested a review from sougou as a code owner May 20, 2020 21:21
Contributor Author

teejae commented May 20, 2020

Read the PR with "hide whitespace changes" enabled, as a few places have only been re-indented.

@teejae teejae force-pushed the tj-materialize-optimization branch 2 times, most recently from 12f6bcb to b217f5b Compare May 21, 2020 11:17
Signed-off-by: Toliver Jue <toliver@planetscale.com>
@teejae teejae force-pushed the tj-materialize-optimization branch from b217f5b to 368ccd3 Compare May 21, 2020 11:40
Contributor

@rohit-nayak-ps rohit-nayak-ps left a comment

Other than the issue I commented on, LGTM.

Nice, this should help greatly once we add --all to MoveTables as well!

@@ -1970,7 +1970,7 @@ func TestMaterializerTableMismatch(t *testing.T) {
 	delete(env.tmc.schema, "targetks.t1")

 	err := env.wr.Materialize(context.Background(), ms)
-	assert.EqualError(t, err, "source and target table names must match for copying schema: t2 vs t1")
+	assert.EqualError(t, err, "copy: source tables do not exist: [t1].")
Contributor

It is the target table that is missing, right? We are copying over "select * from t2" from source into t1 on target ... Validations have to take into account that we could have tables missing in the source or tables missing in the target where we are materializing to a different table ...

Contributor Author

I've now split the test into two cases: "copy" and non-copy. The "copy" case still requires the same table name, so this is the right error.

The non-copy case is a separate test, where CreateDdl="", and it produces the same original error.
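
A rough sketch of the "copy" case described above, under the assumption that copy mode looks for a source table with the same name as each target table. The functions missingCopyTables and validateCopy are hypothetical and are not taken from materializer.go; the error text mirrors the message asserted in the test diff.

```go
package main

import (
	"fmt"
	"sort"
)

// missingCopyTables returns, in sorted order, the copy-mode tables that have
// no table of the same name in the source schema. Hypothetical helper.
func missingCopyTables(sourceTables, copyTables map[string]bool) []string {
	var missing []string
	for name := range copyTables {
		if !sourceTables[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

// validateCopy reports an error when any copy-mode table is absent from the source.
func validateCopy(sourceTables, copyTables map[string]bool) error {
	if missing := missingCopyTables(sourceTables, copyTables); len(missing) > 0 {
		return fmt.Errorf("copy: source tables do not exist: %v.", missing)
	}
	return nil
}

func main() {
	// The source only has t2, but the target table to be copied is t1.
	source := map[string]bool{"t2": true}
	copyTables := map[string]bool{"t1": true}
	fmt.Println(validateCopy(source, copyTables))
}
```

In the non-copy case no same-name source table is needed, so a check like this would simply not run for those tables.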


if ts.CreateDdl == createDDLAsCopy {
	needsCopy = true
	copyTables[ts.TargetTable] = true
Contributor

Target table and source table can be different, so the table to be copied will have to use sqlparser.TableFromStatement(ts.SourceExpression) if the source expression is present ...

Contributor Author

see other comment

Signed-off-by: Toliver Jue <toliver@planetscale.com>
Contributor Author

teejae commented May 22, 2020

@rohit-nayak-ps PTAL. I adjusted only the tests. The logic was already right, but the tests were not sufficiently granular before.

Contributor

@sougou sougou left a comment

This feels a little over-complicated. Maybe you were not aware that you could GetSchema for all tables using []string{"/.*/"}.
What I would do is a full GetSchema for target outside the table loop.
For the source, I'd lazy-load a full GetSchema of source on first use, or even just load it upfront to keep it simple.
Then the existing code would be practically unchanged except for the additional logic for SourceExpression.
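
One way the lazy-load suggestion could look, as a sketch only. The getSchemaFunc type is a hypothetical stand-in and is not the real GetSchema signature; the only detail taken from the comment above is the "/.*/" filter for requesting all tables.

```go
package main

import "fmt"

// getSchemaFunc is a hypothetical stand-in for a GetSchema call: it takes
// table filters and returns table name -> DDL.
type getSchemaFunc func(tables []string) (map[string]string, error)

// lazySchema loads the full schema on first use and caches the result.
// It is not safe for concurrent use.
type lazySchema struct {
	fetch  getSchemaFunc
	loaded bool
	tables map[string]string
	err    error
}

// get requests all tables (filter "/.*/") the first time it is called and
// returns the cached result afterwards.
func (l *lazySchema) get() (map[string]string, error) {
	if !l.loaded {
		l.tables, l.err = l.fetch([]string{"/.*/"})
		l.loaded = true
	}
	return l.tables, l.err
}

func main() {
	calls := 0
	source := &lazySchema{fetch: func(tables []string) (map[string]string, error) {
		calls++
		return map[string]string{"t1": "create table t1(id int)"}, nil
	}}

	// The table loop asks for the source schema on every iteration, but only
	// the first iteration triggers a fetch.
	for _, table := range []string{"t1", "t1", "t2"} {
		schema, err := source.get()
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		_, ok := schema[table]
		fmt.Printf("%s present=%v\n", table, ok)
	}
	fmt.Println("fetches:", calls)
}
```

Loading the source schema upfront, the simpler alternative mentioned above, would amount to calling get once before the loop.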

Contributor Author

teejae commented May 24, 2020

@sougou, I had wanted GetSchema to fetch only the tables we needed, so I didn't retrieve all of them (in case there are 10k+ table schemas).

So are you suggesting that the change for this PR is simply to grab the whole schemas without building the sub table lists?

Parsing the table lists into maps would still be necessary, since we still need to index on them.

Contributor Author

teejae commented May 24, 2020

@sougou I've made the changes to grab and index the full schemas before going through the table loop. PTAL.

@teejae teejae force-pushed the tj-materialize-optimization branch 4 times, most recently from c7564b7 to b33f629 Compare May 24, 2020 07:51
@teejae teejae marked this pull request as draft May 24, 2020 10:11
@teejae teejae marked this pull request as ready for review May 24, 2020 10:11
Signed-off-by: Toliver Jue <toliver@planetscale.com>
@teejae teejae force-pushed the tj-materialize-optimization branch from b33f629 to 9171b55 Compare May 24, 2020 10:13
Signed-off-by: Toliver Jue <toliver@planetscale.com>
Contributor

@sougou sougou left a comment

I was hoping it would be simpler, but the fact that the schema returned is not a map complicates this code a bit more.

We need a test to demonstrate the new behavior. After this, we're good to go.

if mz.targetVSchema.Keyspace.Sharded && mz.targetVSchema.Tables[ts.TargetTable].Type != vindexes.TypeReference {
cv, err := vindexes.FindBestColVindex(mz.targetVSchema.Tables[ts.TargetTable])

if ts.SourceExpression != "" {
Contributor

Change this to:

if ts.SourceExpression == "" {
  continue
}

Contributor Author

but don't i still need to add to the Filter.Rules after this block?

Contributor

Oh. Right. Missed that. You can duplicate that append line here before continuing.
Main concern is that there are already too many indents. I'm not happy about the existing if-else block already.

Contributor Author

@teejae teejae May 24, 2020

Duplicated. I also pulled the filter out into a default assignment so that we can get rid of an else as well.
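
The shape of this refactor can be sketched roughly as follows. The tableSettings and rule types are simplified, hypothetical stand-ins for the real structures, and the sketch assumes that a rule with an empty filter is later treated as select * from the matched table.

```go
package main

import "fmt"

// tableSettings and rule are simplified, hypothetical stand-ins for the
// Materialize table settings and the replication filter rule.
type tableSettings struct {
	TargetTable      string
	SourceExpression string
}

type rule struct {
	Match  string
	Filter string
}

// buildRules emits one rule per table. An empty SourceExpression takes the
// early-continue path and leaves Filter empty.
func buildRules(settings []tableSettings) []rule {
	var rules []rule
	for _, ts := range settings {
		r := rule{Match: ts.TargetTable}

		if ts.SourceExpression == "" {
			// Duplicated append, so the rest of the loop needs no extra indent.
			rules = append(rules, r)
			continue
		}

		// Default assignment up front; later rewriting of the expression
		// could overwrite filter without needing an else branch.
		filter := ts.SourceExpression
		r.Filter = filter
		rules = append(rules, r)
	}
	return rules
}

func main() {
	rules := buildRules([]tableSettings{
		{TargetTable: "t1"},
		{TargetTable: "t2", SourceExpression: "select id from t2"},
	})
	for _, r := range rules {
		fmt.Printf("match=%s filter=%q\n", r.Match, r.Filter)
	}
}
```

The guard clause costs one duplicated append line but keeps the main path at a single indentation level, which was the concern raised in the review.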

@teejae teejae requested a review from sougou May 24, 2020 17:12
Signed-off-by: Toliver Jue <toliver@planetscale.com>
Contributor Author

teejae commented May 24, 2020

@sougou also what test are you proposing for this new behavior? that we don't get multiple requests for GetSchema in particular?

Contributor

sougou commented May 24, 2020

> @sougou also what test are you proposing for this new behavior? that we don't get multiple requests for GetSchema in particular?

Mainly one where the source expression is empty.

Signed-off-by: Toliver Jue <toliver@planetscale.com>
@teejae teejae force-pushed the tj-materialize-optimization branch from 8a9c6d2 to ccdcb63 Compare May 24, 2020 19:04
}

rule.Filter = filter
Contributor

nice.
