sql: implement ConstructScan in the new factory #49682

yuzefovich · 2020-05-29T03:17:40Z

sql: extract out planning of table readers into a separate method

This commit extract most of the logic of createTableReaders into
a separate method that performs the actual physical planning of table
readers given already created TableReaderSpecs. This will allow for
reuse in the follow-up work. This commit additionally slightly refactors
a few other things for reuse.

Release note: None

sql: implement ConstructPlan by new factory

This commit extracts the logic of execFactory.ConstructPlan into the
function to be used by both factories. It also extracts out another
small helper.

Release note: None

sql: implement ConstructScan in distsql spec factory

This commit adds an implementation of
distSQLSpecExecFactory.ConstructScan which combines the logic that
performs scanNode creation of execFactory.ConstructScan and physical
planning of table readers of DistSQLPlanner.createTableReaders.
I tried to refactor the code so that there is not much duplication going
on.

Notably, simple projections, renders, and filters are not yet
implemented.

Closes: #47474.

Release note: None

sql: enforce the assumption that wantedColumns is not nil for scans

After the examining the code closer, I see that now in all code paths we
populate scanColumnsConfig.wantedColumns to ask for the specific
columns to be scanned. Previously, this map could be left as nil which
would mean that all columns of the table should be scanned. This was
used only by the heuristic planner which has been removed, so we can now
update the assumption of the scan columns config. This allows us to
clean up the comments a bit.

This commit also moves makeColumnsConfig function into the shared util
file because it is used by both factories.

Release note: None

cockroach-teamcity · 2020-05-29T03:17:48Z

This change is

yuzefovich · 2020-06-02T22:14:55Z

cc @rytaft

rytaft

Reviewed 1 of 1 files at r1, 4 of 4 files at r2, 6 of 6 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)

pkg/sql/distsql_physical_planner.go, line 911 at r1 (raw file):

}

func getColsForScanToTableOrdinalMap(

would be good to have a small comment at least

pkg/sql/distsql_physical_planner.go, line 1037 at r1 (raw file):

	// scanNodes can have columns set up in a few different ways, depending on the
	// colCfg. The heuristic planner always creates scanNodes with all public
	// columns (even if some of them aren't even in the index we are scanning).

We don't have the heuristic planner anymore so you can take this sentence out :)

yuzefovich

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @rytaft)

pkg/sql/distsql_physical_planner.go, line 911 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

would be good to have a small comment at least

Done.

pkg/sql/distsql_physical_planner.go, line 1037 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

We don't have the heuristic planner anymore so you can take this sentence out :)

Good point. I examined the code a little bit more and added another commit to this PR to clean this up.

rytaft

Reviewed 4 of 6 files at r6, 6 of 6 files at r7.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)

pkg/sql/exec_factory_util.go, line 90 at r7 (raw file):

	// Set visibility=execinfra.ScanVisibilityPublicAndNotPublic, since all
	// columns in the "cols" set should be projected, regardless of whether
	// they're public or non- public. The caller decides which columns to

[nit] extra space in "non- public"

yuzefovich

TFTR! I see that Alfonso was also typing comments, so I'll wait for his feedback as well.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @rytaft)

pkg/sql/exec_factory_util.go, line 90 at r7 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] extra space in "non- public"

Done.

asubiotto

Nice work! However, seeing this makes me think of how we should test it. I think something we should definitely do is differential testing of execFactory + distsql physical planner vs distsqlSpecFactory.

Reviewed 1 of 1 files at r1, 9 of 9 files at r4, 4 of 4 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft and @yuzefovich)

pkg/sql/distsql_physical_planner.go, line 911 at r1 (raw file):

}

func getColsForScanToTableOrdinalMap(

nit: can this function name be clearer? maybe just getColOrdinals?

pkg/sql/distsql_physical_planner.go, line 1063 at r6 (raw file):

func (dsp *DistSQLPlanner) planTableReaders(
	planCtx *PlanningCtx,

These are a lot of arguments. What do you think of creating some argument struct and maybe documenting the arguments better?

pkg/sql/distsql_spec_exec_factory.go, line 60 at r6 (raw file):

	rowCount float64,
	locking *tree.LockingItem,
) (exec.Node, error) {

Composing the planning like this might open new doors for testing. I don't think much testing was ever added for distsql physical planning. Can you think of some tests that might be worth adding?

pkg/sql/distsql_spec_exec_factory.go, line 74 at r6 (raw file):

	recommendation := canDistribute

	// ------- begin execFactory.ConstructScan -------

Are these comments convention in execFactory? s/execFactory/distSQLSpecExecFactory

pkg/sql/distsql_spec_exec_factory.go, line 100 at r6 (raw file):

	// TODO(yuzefovich): scanNode adds "parallel" attribute in walk.go when
	// scanNode.canParallelize() returns true. We should plumb that info from
	// here somehow as well.

What information is missing in this scope to do that? I think you have maxResults, limit information and you would just need to check the cluster setting

pkg/sql/exec_factory_util.go, line 34 at r5 (raw file):

		instrumentation: planner.curPlan.instrumentation,
	}
	assignPlan := func(plan *planMaybePhysical, node exec.Node) {

Instead of passing a pointer to planMaybePhysical, it might be nicer to just take a *planTop here, pass in res, and have this function set the main field

pkg/sql/exec_factory_util.go, line 36 at r5 (raw file):

	assignPlan := func(plan *planMaybePhysical, node exec.Node) {
		if usesPlanNodeRepresentation {
			plan.planNode = node.(planNode)

If we're doing a type assertion, I think we can probably remove the boolean parameter and test for whatever type we want.

pkg/sql/plan_opt.go, line 182 at r6 (raw file):

		bld  *execbuilder.Builder
	)
	dsp := p.DistSQLPlanner()

When would dsp be nil?

pkg/sql/plan_opt.go, line 184 at r6 (raw file):

	dsp := p.DistSQLPlanner()
	if mode := p.SessionData().ExperimentalDistSQLPlanningMode; dsp != nil && mode != sessiondata.ExperimentalDistSQLPlanningOff {
		bld = execbuilder.New(newDistSQLSpecExecFactory(p, dsp), execMemo, &opc.catalog, root, p.EvalContext())

One thing that I'm wondering is why we create a new exec factory for every query. Is that expensive? Maybe we'd want a sync.pool, but not in this PR.

yuzefovich

Hm, differential testing requires that both approaches produce exactly the same physical plans, but I think this will not be the case because we're changing the "distribution" strategy - in a follow-up PR I want to introduce the notion of "partially distributed" plans. And the old path would produce "local" plans while the new path will be producing "partially distributed" plans, so the differential testing is not applicable in general.

I think the best testing strategy will be to invest into a comprehensive suite of logic tests that use EXPLAIN (distsql). A nice addition to this approach would be introducing a textual representation of the diagram so that it would be easier to examine (similar to what we have with EXPLAIN (vec)). I think we might have an issue for that.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @rytaft)

pkg/sql/distsql_physical_planner.go, line 911 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

nit: can this function name be clearer? maybe just getColOrdinals?

Renamed.

pkg/sql/distsql_physical_planner.go, line 1063 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

These are a lot of arguments. What do you think of creating some argument struct and maybe documenting the arguments better?

I extracted a utility struct, but I don't think adding some documentation is that useful because most of the fields are simply extracted from scanNodes, so I referred the reader to the documentation in there. Or is your comment about improving that documentation in general as well?

pkg/sql/distsql_spec_exec_factory.go, line 60 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Composing the planning like this might open new doors for testing. I don't think much testing was ever added for distsql physical planning. Can you think of some tests that might be worth adding?

Hm, having some kind of unit tests doesn't seem that much useful, but using EXPLAIN (distsql) more to test the physical planning is probably the way to go. I think we could expand the "execbuilder" logic tests (i.e. testoptlogic) to include the physical planning once the new factory is significantly implemented.

pkg/sql/distsql_spec_exec_factory.go, line 74 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Are these comments convention in execFactory? s/execFactory/distSQLSpecExecFactory

I left them mostly for the ease of reviewing (and for myself when writing the code). I don't think they provide much benefit when the PR is ready to be merged, so I removed them.

pkg/sql/distsql_spec_exec_factory.go, line 100 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

What information is missing in this scope to do that? I think you have maxResults, limit information and you would just need to check the cluster setting

We do have all the necessary information in this scope, the complication is plumbing that information into EXPLAIN code which is currently not supported on the new representation (because it assumes walking over planNode tree).

pkg/sql/exec_factory_util.go, line 34 at r5 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Instead of passing a pointer to planMaybePhysical, it might be nicer to just take a *planTop here, pass in res, and have this function set the main field

This function is also used for subqueries and cascades which are planMaybePhysical and not planTop, so I don't think we can do that.

pkg/sql/exec_factory_util.go, line 36 at r5 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

If we're doing a type assertion, I think we can probably remove the boolean parameter and test for whatever type we want.

Done.

pkg/sql/plan_opt.go, line 182 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

When would dsp be nil?

Indeed, it seems like it is always set, so it was an unnecessary "defense" mechanism, removed.

pkg/sql/plan_opt.go, line 184 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

One thing that I'm wondering is why we create a new exec factory for every query. Is that expensive? Maybe we'd want a sync.pool, but not in this PR.

Hm, interesting point, I don't see a reason why we couldn't use a pool for this, but I'm also not sure whether it'll provide noticeable performance change. Left a TODO to look into this.

This commit extract most of the logic of `createTableReaders` into a separate method that performs the actual physical planning of table readers given already created `TableReaderSpec`s. This will allow for reuse in the follow-up work. This commit additionally slightly refactors a few other things for reuse. Release note: None

This commit extracts the logic of `execFactory.ConstructPlan` into the function to be used by both factories. It also extracts out another small helper. Release note: None

asubiotto

In that case, could we run differential testing without the distribution variable (i.e. always plan locally)? I think we need to come up with some way to make sure that we're not introducing bugs with this new planning approach.

I'm not sure I understand the EXPLAIN (distsql) approach, wouldn't the distribution still be an issue? Or do you mean that we wouldn't compare it with anything and write new tests for this new representation? That could work, and might be good to have, but it seems like a pretty heavy investment and doesn't mean we can't do some type of differential testing.

Reviewed 10 of 10 files at r8, 5 of 5 files at r9, 6 of 6 files at r10, 5 of 5 files at r11.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft and @yuzefovich)

pkg/sql/distsql_physical_planner.go, line 1063 at r6 (raw file):

Previously, yuzefovich wrote…

I extracted a utility struct, but I don't think adding some documentation is that useful because most of the fields are simply extracted from scanNodes, so I referred the reader to the documentation in there. Or is your comment about improving that documentation in general as well?

My comment was also about improving documentation in general as well but this LGTM.

pkg/sql/distsql_spec_exec_factory.go, line 74 at r6 (raw file):

Previously, yuzefovich wrote…

I left them mostly for the ease of reviewing (and for myself when writing the code). I don't think they provide much benefit when the PR is ready to be merged, so I removed them.

I actually kind of liked the general comments describing the stages minus the format

pkg/sql/distsql_spec_exec_factory.go, line 100 at r6 (raw file):

Previously, yuzefovich wrote…

We do have all the necessary information in this scope, the complication is plumbing that information into EXPLAIN code which is currently not supported on the new representation (because it assumes walking over planNode tree).

Ok. Just to confirm, the tableReader still parallelizes scans when it can with the current code?

This commit adds an implementation of `distSQLSpecExecFactory.ConstructScan` which combines the logic that performs `scanNode` creation of `execFactory.ConstructScan` and physical planning of table readers of `DistSQLPlanner.createTableReaders`. I tried to refactor the code so that there is not much duplication going on. Notably, simple projections, renders, and filters are not yet implemented. Release note: None

After the examining the code closer, I see that now in all code paths we populate `scanColumnsConfig.wantedColumns` to ask for the specific columns to be scanned. Previously, this map could be left as `nil` which would mean that all columns of the table should be scanned. This was used only by the heuristic planner which has been removed, so we can now update the assumption of the scan columns config. This allows us to clean up the comments a bit. This commit also moves `makeColumnsConfig` function into the shared util file because it is used by both factories. Release note: None

yuzefovich

In that case, could we run differential testing without the distribution variable (i.e. always plan locally)? I think we need to come up with some way to make sure that we're not introducing bugs with this new planning approach.

Yes, we could. To be clear, my comment about inability to differential testing was "in general" (I was thinking about creating some randomized queries and comparing the plans on them), but I do intend to use the differential testing approach when applicable. Namely, I'm thinking that we might want to introduce a separate cluster setting for testing purposes that would prohibit partially distributed plans - whenever the new factory creates a partially distributed plan, if that setting is set, then we would fallback to the old path, but local and fully-distributed plans should be exactly the same in the old and the new paths. I also think that we will introduce non-default logic test mode to be used for different already existing "explain" logic tests.

Or do you mean that we wouldn't compare it with anything and write new tests for this new representation?

Yes, I think that we will need to write separate tests for partially distributed plans because we haven't had such precedent before, so there is nothing to reuse. However, we will be able to reuse already existing tests if they don't end up with a partially distributed plan.

TL;DR is that I am actively thinking about the testing, and the follow-up PR implements some EXPLAIN variants in the new factory which makes testing possible.

TFTRs!

bors r+

Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @asubiotto and @rytaft)

pkg/sql/distsql_physical_planner.go, line 1063 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

My comment was also about improving documentation in general as well but this LGTM.

Done.

pkg/sql/distsql_spec_exec_factory.go, line 74 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I actually kind of liked the general comments describing the stages minus the format

Ok, added the modified comments back.

pkg/sql/distsql_spec_exec_factory.go, line 100 at r6 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Ok. Just to confirm, the tableReader still parallelizes scans when it can with the current code?

Yes, the "parallelization" logic is done by DistSQLPlanner in planTableReaders which is shared with the old path, and this comment refers to only not being able to show the "parallel" attribute in the output of EXPLAIN.

craig · 2020-06-08T15:48:08Z

Merge conflict (retrying...)

craig · 2020-06-08T19:16:58Z

Build succeeded

GitHub CI (Cockroach)

yuzefovich added the do-not-merge bors won't merge a PR with this label. label May 29, 2020

yuzefovich force-pushed the table-reader branch 3 times, most recently from 6a90265 to 27924b8 Compare May 29, 2020 05:04

yuzefovich requested review from a team and pbardea and removed request for a team and pbardea May 29, 2020 05:04

yuzefovich force-pushed the table-reader branch 3 times, most recently from 85fbc92 to 5112560 Compare June 2, 2020 22:11

yuzefovich changed the title ~~WIP on implementing ConstructScan~~ sql: implement ConstructScan in the new factory Jun 2, 2020

yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Jun 2, 2020

yuzefovich requested a review from asubiotto June 2, 2020 22:14

rytaft approved these changes Jun 3, 2020

View reviewed changes

yuzefovich force-pushed the table-reader branch from 5112560 to 59ab576 Compare June 3, 2020 21:04

yuzefovich commented Jun 3, 2020

View reviewed changes

rytaft approved these changes Jun 3, 2020

View reviewed changes

yuzefovich force-pushed the table-reader branch from 59ab576 to 8c69c99 Compare June 3, 2020 21:54

yuzefovich commented Jun 3, 2020

View reviewed changes

asubiotto suggested changes Jun 4, 2020

View reviewed changes

yuzefovich force-pushed the table-reader branch from 8c69c99 to bfe9c86 Compare June 4, 2020 18:34

yuzefovich commented Jun 4, 2020

View reviewed changes

yuzefovich added 2 commits June 4, 2020 16:52

sql: implement ConstructPlan by new factory

3ea982a

This commit extracts the logic of `execFactory.ConstructPlan` into the function to be used by both factories. It also extracts out another small helper. Release note: None

asubiotto approved these changes Jun 8, 2020

View reviewed changes

yuzefovich added 2 commits June 8, 2020 08:18

yuzefovich force-pushed the table-reader branch from bfe9c86 to 15bb4ae Compare June 8, 2020 15:19

yuzefovich commented Jun 8, 2020

View reviewed changes

craig bot merged commit 5e0cecf into cockroachdb:master Jun 8, 2020

yuzefovich deleted the table-reader branch January 16, 2021 04:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: implement ConstructScan in the new factory #49682

sql: implement ConstructScan in the new factory #49682

yuzefovich commented May 29, 2020 •

edited

Loading

cockroach-teamcity commented May 29, 2020

yuzefovich commented Jun 2, 2020

rytaft left a comment

yuzefovich left a comment

rytaft left a comment

yuzefovich left a comment

asubiotto left a comment

yuzefovich left a comment

asubiotto left a comment

yuzefovich left a comment

craig bot commented Jun 8, 2020

craig bot commented Jun 8, 2020

sql: implement ConstructScan in the new factory #49682

sql: implement ConstructScan in the new factory #49682

Conversation

yuzefovich commented May 29, 2020 • edited Loading

cockroach-teamcity commented May 29, 2020

yuzefovich commented Jun 2, 2020

rytaft left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

rytaft left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

asubiotto left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

asubiotto left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

craig bot commented Jun 8, 2020

Merge conflict (retrying...)

craig bot commented Jun 8, 2020

Build succeeded

yuzefovich commented May 29, 2020 •

edited

Loading