Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: implement ConstructScan in the new factory #49682

Merged
merged 4 commits into from
Jun 8, 2020

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented May 29, 2020

sql: extract out planning of table readers into a separate method

This commit extract most of the logic of createTableReaders into
a separate method that performs the actual physical planning of table
readers given already created TableReaderSpecs. This will allow for
reuse in the follow-up work. This commit additionally slightly refactors
a few other things for reuse.

Release note: None

sql: implement ConstructPlan by new factory

This commit extracts the logic of execFactory.ConstructPlan into the
function to be used by both factories. It also extracts out another
small helper.

Release note: None

sql: implement ConstructScan in distsql spec factory

This commit adds an implementation of
distSQLSpecExecFactory.ConstructScan which combines the logic that
performs scanNode creation of execFactory.ConstructScan and physical
planning of table readers of DistSQLPlanner.createTableReaders.
I tried to refactor the code so that there is not much duplication going
on.

Notably, simple projections, renders, and filters are not yet
implemented.

Closes: #47474.

Release note: None

sql: enforce the assumption that wantedColumns is not nil for scans

After the examining the code closer, I see that now in all code paths we
populate scanColumnsConfig.wantedColumns to ask for the specific
columns to be scanned. Previously, this map could be left as nil which
would mean that all columns of the table should be scanned. This was
used only by the heuristic planner which has been removed, so we can now
update the assumption of the scan columns config. This allows us to
clean up the comments a bit.

This commit also moves makeColumnsConfig function into the shared util
file because it is used by both factories.

Release note: None

@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label May 29, 2020
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the table-reader branch 3 times, most recently from 6a90265 to 27924b8 Compare May 29, 2020 05:04
@yuzefovich yuzefovich requested review from a team and pbardea and removed request for a team and pbardea May 29, 2020 05:04
@yuzefovich yuzefovich force-pushed the table-reader branch 3 times, most recently from 85fbc92 to 5112560 Compare June 2, 2020 22:11
@yuzefovich yuzefovich changed the title WIP on implementing ConstructScan sql: implement ConstructScan in the new factory Jun 2, 2020
@yuzefovich yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Jun 2, 2020
@yuzefovich yuzefovich requested a review from asubiotto June 2, 2020 22:14
@yuzefovich
Copy link
Member Author

cc @rytaft

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, 4 of 4 files at r2, 6 of 6 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


pkg/sql/distsql_physical_planner.go, line 911 at r1 (raw file):

}

func getColsForScanToTableOrdinalMap(

would be good to have a small comment at least


pkg/sql/distsql_physical_planner.go, line 1037 at r1 (raw file):

	// scanNodes can have columns set up in a few different ways, depending on the
	// colCfg. The heuristic planner always creates scanNodes with all public
	// columns (even if some of them aren't even in the index we are scanning).

We don't have the heuristic planner anymore so you can take this sentence out :)

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @rytaft)


pkg/sql/distsql_physical_planner.go, line 911 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

would be good to have a small comment at least

Done.


pkg/sql/distsql_physical_planner.go, line 1037 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

We don't have the heuristic planner anymore so you can take this sentence out :)

Good point. I examined the code a little bit more and added another commit to this PR to clean this up.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 6 files at r6, 6 of 6 files at r7.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


pkg/sql/exec_factory_util.go, line 90 at r7 (raw file):

	// Set visibility=execinfra.ScanVisibilityPublicAndNotPublic, since all
	// columns in the "cols" set should be projected, regardless of whether
	// they're public or non- public. The caller decides which columns to

[nit] extra space in "non- public"

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! I see that Alfonso was also typing comments, so I'll wait for his feedback as well.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @rytaft)


pkg/sql/exec_factory_util.go, line 90 at r7 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] extra space in "non- public"

Done.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! However, seeing this makes me think of how we should test it. I think something we should definitely do is differential testing of execFactory + distsql physical planner vs distsqlSpecFactory.

Reviewed 1 of 1 files at r1, 9 of 9 files at r4, 4 of 4 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft and @yuzefovich)


pkg/sql/distsql_physical_planner.go, line 911 at r1 (raw file):

}

func getColsForScanToTableOrdinalMap(

nit: can this function name be clearer? maybe just getColOrdinals?


pkg/sql/distsql_physical_planner.go, line 1063 at r6 (raw file):

func (dsp *DistSQLPlanner) planTableReaders(
	planCtx *PlanningCtx,

These are a lot of arguments. What do you think of creating some argument struct and maybe documenting the arguments better?


pkg/sql/distsql_spec_exec_factory.go, line 60 at r6 (raw file):

	rowCount float64,
	locking *tree.LockingItem,
) (exec.Node, error) {

Composing the planning like this might open new doors for testing. I don't think much testing was ever added for distsql physical planning. Can you think of some tests that might be worth adding?


pkg/sql/distsql_spec_exec_factory.go, line 74 at r6 (raw file):

	recommendation := canDistribute

	// ------- begin execFactory.ConstructScan -------

Are these comments convention in execFactory? s/execFactory/distSQLSpecExecFactory


pkg/sql/distsql_spec_exec_factory.go, line 100 at r6 (raw file):

	// TODO(yuzefovich): scanNode adds "parallel" attribute in walk.go when
	// scanNode.canParallelize() returns true. We should plumb that info from
	// here somehow as well.

What information is missing in this scope to do that? I think you have maxResults, limit information and you would just need to check the cluster setting


pkg/sql/exec_factory_util.go, line 34 at r5 (raw file):

		instrumentation: planner.curPlan.instrumentation,
	}
	assignPlan := func(plan *planMaybePhysical, node exec.Node) {

Instead of passing a pointer to planMaybePhysical, it might be nicer to just take a *planTop here, pass in res, and have this function set the main field


pkg/sql/exec_factory_util.go, line 36 at r5 (raw file):

	assignPlan := func(plan *planMaybePhysical, node exec.Node) {
		if usesPlanNodeRepresentation {
			plan.planNode = node.(planNode)

If we're doing a type assertion, I think we can probably remove the boolean parameter and test for whatever type we want.


pkg/sql/plan_opt.go, line 182 at r6 (raw file):

		bld  *execbuilder.Builder
	)
	dsp := p.DistSQLPlanner()

When would dsp be nil?


pkg/sql/plan_opt.go, line 184 at r6 (raw file):

	dsp := p.DistSQLPlanner()
	if mode := p.SessionData().ExperimentalDistSQLPlanningMode; dsp != nil && mode != sessiondata.ExperimentalDistSQLPlanningOff {
		bld = execbuilder.New(newDistSQLSpecExecFactory(p, dsp), execMemo, &opc.catalog, root, p.EvalContext())

One thing that I'm wondering is why we create a new exec factory for every query. Is that expensive? Maybe we'd want a sync.pool, but not in this PR.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, differential testing requires that both approaches produce exactly the same physical plans, but I think this will not be the case because we're changing the "distribution" strategy - in a follow-up PR I want to introduce the notion of "partially distributed" plans. And the old path would produce "local" plans while the new path will be producing "partially distributed" plans, so the differential testing is not applicable in general.

I think the best testing strategy will be to invest into a comprehensive suite of logic tests that use EXPLAIN (distsql). A nice addition to this approach would be introducing a textual representation of the diagram so that it would be easier to examine (similar to what we have with EXPLAIN (vec)). I think we might have an issue for that.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @rytaft)


pkg/sql/distsql_physical_planner.go, line 911 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

nit: can this function name be clearer? maybe just getColOrdinals?

Renamed.


pkg/sql/distsql_physical_planner.go, line 1063 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

These are a lot of arguments. What do you think of creating some argument struct and maybe documenting the arguments better?

I extracted a utility struct, but I don't think adding some documentation is that useful because most of the fields are simply extracted from scanNodes, so I referred the reader to the documentation in there. Or is your comment about improving that documentation in general as well?


pkg/sql/distsql_spec_exec_factory.go, line 60 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Composing the planning like this might open new doors for testing. I don't think much testing was ever added for distsql physical planning. Can you think of some tests that might be worth adding?

Hm, having some kind of unit tests doesn't seem that much useful, but using EXPLAIN (distsql) more to test the physical planning is probably the way to go. I think we could expand the "execbuilder" logic tests (i.e. testoptlogic) to include the physical planning once the new factory is significantly implemented.


pkg/sql/distsql_spec_exec_factory.go, line 74 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Are these comments convention in execFactory? s/execFactory/distSQLSpecExecFactory

I left them mostly for the ease of reviewing (and for myself when writing the code). I don't think they provide much benefit when the PR is ready to be merged, so I removed them.


pkg/sql/distsql_spec_exec_factory.go, line 100 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

What information is missing in this scope to do that? I think you have maxResults, limit information and you would just need to check the cluster setting

We do have all the necessary information in this scope, the complication is plumbing that information into EXPLAIN code which is currently not supported on the new representation (because it assumes walking over planNode tree).


pkg/sql/exec_factory_util.go, line 34 at r5 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Instead of passing a pointer to planMaybePhysical, it might be nicer to just take a *planTop here, pass in res, and have this function set the main field

This function is also used for subqueries and cascades which are planMaybePhysical and not planTop, so I don't think we can do that.


pkg/sql/exec_factory_util.go, line 36 at r5 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

If we're doing a type assertion, I think we can probably remove the boolean parameter and test for whatever type we want.

Done.


pkg/sql/plan_opt.go, line 182 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

When would dsp be nil?

Indeed, it seems like it is always set, so it was an unnecessary "defense" mechanism, removed.


pkg/sql/plan_opt.go, line 184 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

One thing that I'm wondering is why we create a new exec factory for every query. Is that expensive? Maybe we'd want a sync.pool, but not in this PR.

Hm, interesting point, I don't see a reason why we couldn't use a pool for this, but I'm also not sure whether it'll provide noticeable performance change. Left a TODO to look into this.

This commit extract most of the logic of `createTableReaders` into
a separate method that performs the actual physical planning of table
readers given already created `TableReaderSpec`s. This will allow for
reuse in the follow-up work. This commit additionally slightly refactors
a few other things for reuse.

Release note: None
This commit extracts the logic of `execFactory.ConstructPlan` into the
function to be used by both factories. It also extracts out another
small helper.

Release note: None
Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, could we run differential testing without the distribution variable (i.e. always plan locally)? I think we need to come up with some way to make sure that we're not introducing bugs with this new planning approach.

I'm not sure I understand the EXPLAIN (distsql) approach, wouldn't the distribution still be an issue? Or do you mean that we wouldn't compare it with anything and write new tests for this new representation? That could work, and might be good to have, but it seems like a pretty heavy investment and doesn't mean we can't do some type of differential testing.

:lgtm:

Reviewed 10 of 10 files at r8, 5 of 5 files at r9, 6 of 6 files at r10, 5 of 5 files at r11.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft and @yuzefovich)


pkg/sql/distsql_physical_planner.go, line 1063 at r6 (raw file):

Previously, yuzefovich wrote…

I extracted a utility struct, but I don't think adding some documentation is that useful because most of the fields are simply extracted from scanNodes, so I referred the reader to the documentation in there. Or is your comment about improving that documentation in general as well?

My comment was also about improving documentation in general as well but this LGTM.


pkg/sql/distsql_spec_exec_factory.go, line 74 at r6 (raw file):

Previously, yuzefovich wrote…

I left them mostly for the ease of reviewing (and for myself when writing the code). I don't think they provide much benefit when the PR is ready to be merged, so I removed them.

I actually kind of liked the general comments describing the stages minus the format


pkg/sql/distsql_spec_exec_factory.go, line 100 at r6 (raw file):

Previously, yuzefovich wrote…

We do have all the necessary information in this scope, the complication is plumbing that information into EXPLAIN code which is currently not supported on the new representation (because it assumes walking over planNode tree).

Ok. Just to confirm, the tableReader still parallelizes scans when it can with the current code?

This commit adds an implementation of
`distSQLSpecExecFactory.ConstructScan` which combines the logic that
performs `scanNode` creation of `execFactory.ConstructScan` and physical
planning of table readers of `DistSQLPlanner.createTableReaders`.
I tried to refactor the code so that there is not much duplication going
on.

Notably, simple projections, renders, and filters are not yet
implemented.

Release note: None
After the examining the code closer, I see that now in all code paths we
populate `scanColumnsConfig.wantedColumns` to ask for the specific
columns to be scanned. Previously, this map could be left as `nil` which
would mean that all columns of the table should be scanned. This was
used only by the heuristic planner which has been removed, so we can now
update the assumption of the scan columns config. This allows us to
clean up the comments a bit.

This commit also moves `makeColumnsConfig` function into the shared util
file because it is used by both factories.

Release note: None
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, could we run differential testing without the distribution variable (i.e. always plan locally)? I think we need to come up with some way to make sure that we're not introducing bugs with this new planning approach.

Yes, we could. To be clear, my comment about inability to differential testing was "in general" (I was thinking about creating some randomized queries and comparing the plans on them), but I do intend to use the differential testing approach when applicable. Namely, I'm thinking that we might want to introduce a separate cluster setting for testing purposes that would prohibit partially distributed plans - whenever the new factory creates a partially distributed plan, if that setting is set, then we would fallback to the old path, but local and fully-distributed plans should be exactly the same in the old and the new paths. I also think that we will introduce non-default logic test mode to be used for different already existing "explain" logic tests.

Or do you mean that we wouldn't compare it with anything and write new tests for this new representation?

Yes, I think that we will need to write separate tests for partially distributed plans because we haven't had such precedent before, so there is nothing to reuse. However, we will be able to reuse already existing tests if they don't end up with a partially distributed plan.

TL;DR is that I am actively thinking about the testing, and the follow-up PR implements some EXPLAIN variants in the new factory which makes testing possible.

TFTRs!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @asubiotto and @rytaft)


pkg/sql/distsql_physical_planner.go, line 1063 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

My comment was also about improving documentation in general as well but this LGTM.

Done.


pkg/sql/distsql_spec_exec_factory.go, line 74 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I actually kind of liked the general comments describing the stages minus the format

Ok, added the modified comments back.


pkg/sql/distsql_spec_exec_factory.go, line 100 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Ok. Just to confirm, the tableReader still parallelizes scans when it can with the current code?

Yes, the "parallelization" logic is done by DistSQLPlanner in planTableReaders which is shared with the old path, and this comment refers to only not being able to show the "parallel" attribute in the output of EXPLAIN.

@craig
Copy link
Contributor

craig bot commented Jun 8, 2020

Merge conflict (retrying...)

@craig
Copy link
Contributor

craig bot commented Jun 8, 2020

Build succeeded

@craig craig bot merged commit 5e0cecf into cockroachdb:master Jun 8, 2020
@yuzefovich yuzefovich deleted the table-reader branch January 16, 2021 04:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

execbuilder: plan TableReaderSpec directly
4 participants