Rules review #294

ajnavarro · 2018-07-18T07:53:03Z

To reduce errors in the interactions between rules, I wrote some guidelines to improve and simplify the rules:

Rules guidelines

- We have two kinds of rules: resolution rules and optimization rules.
  - The goal of resolution rules is to transform the parsed tree, resolving abstract references (e.g. to column names and expressions) into concrete references (e.g. resolved columns referencing an exact position in a row). Since, in our model, every resolved node is also executable, physical planning is also implicit in this phase.
  - The goal of optimization rules is to make the query faster.
- If a query tree is still not resolved after the resolution rules have run, the query must fail with an error.
- Each rule must do only one thing and be as small as possible.
- Each rule must return a valid query tree.
  - For resolution rules, the output query tree might not be fully resolved, and that's ok.
  - For optimization rules, both the input and output trees must be resolved (and, therefore, executable).
- Rules are expected to work well together, but a rule cannot depend on other rules to generate a valid query tree.

We should apply these guidelines to the rules that do not follow them yet.
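A minimal Go sketch of how an analyzer could enforce these guidelines, assuming a drastically simplified plan made only of column references. `Plan`, `Column`, `Rule`, `Analyze`, `resolveColumns` and `ErrNotResolved` are hypothetical names for illustration, not the project's actual API. Resolution rules may leave the plan partially resolved; once they have all run, an unresolved plan is an error, and every optimization rule must keep the plan resolved.

```go
package main

import (
	"errors"
	"fmt"
)

// Column is a hypothetical column reference; Index < 0 means "unresolved".
type Column struct {
	Name  string
	Index int // exact position in the row once resolved
}

// Plan is a hypothetical, flattened stand-in for a query tree.
type Plan struct {
	Schema  []string
	Columns []Column
}

// Resolved reports whether every column references a concrete row position.
func (p Plan) Resolved() bool {
	for _, c := range p.Columns {
		if c.Index < 0 {
			return false
		}
	}
	return true
}

// Rule does one small thing and always returns a valid plan.
type Rule struct {
	Name  string
	Apply func(Plan) (Plan, error)
}

var ErrNotResolved = errors.New("plan is not resolved after resolution rules")

// resolveColumns is a hypothetical resolution rule. It resolves what it can
// and may leave some columns unresolved, which the guidelines allow.
func resolveColumns(p Plan) (Plan, error) {
	cols := make([]Column, len(p.Columns))
	for i, c := range p.Columns {
		cols[i] = c
		if c.Index >= 0 {
			continue
		}
		for j, name := range p.Schema {
			if name == c.Name {
				cols[i].Index = j
				break
			}
		}
	}
	return Plan{Schema: p.Schema, Columns: cols}, nil
}

// Analyze runs the resolution rules, fails if the plan is still unresolved,
// then runs the optimization rules, checking that each keeps it resolved.
func Analyze(p Plan, resolution, optimization []Rule) (Plan, error) {
	var err error
	for _, r := range resolution {
		if p, err = r.Apply(p); err != nil {
			return Plan{}, fmt.Errorf("%s: %w", r.Name, err)
		}
	}
	if !p.Resolved() {
		return Plan{}, ErrNotResolved
	}
	for _, r := range optimization {
		if p, err = r.Apply(p); err != nil {
			return Plan{}, fmt.Errorf("%s: %w", r.Name, err)
		}
		if !p.Resolved() {
			return Plan{}, fmt.Errorf("optimization rule %s returned an unresolved plan", r.Name)
		}
	}
	return p, nil
}

func main() {
	p := Plan{Schema: []string{"a", "b"}, Columns: []Column{{"b", -1}}}
	out, err := Analyze(p, []Rule{{"resolve_columns", resolveColumns}}, nil)
	fmt.Println(out.Columns, err)
}
```

Keeping the "resolved" check in the analyzer, rather than inside each rule, lets every rule stay small while still guaranteeing that optimization rules never see or produce an unresolved plan.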

erizocosmico · 2018-07-24T09:17:45Z

Each rule must return a valid query tree.

Rules are expected to work well between them, but a rule cannot depend on other rule/rules to generate a valid query tree.

This clashes with rules that defer resolution. For example, the resolve_columns rule uses deferredColumn to defer evaluation until some other rules have been executed. Similarly, the assign_indexes rule uses a node that is not valid per se, which is later replaced with the final node by the pushdown rule.

I think having such a hard rule is not very flexible and can make resolving the query really complicated.

ajnavarro · 2018-07-30T09:24:37Z

In that specific case, we should iterate and resolve the columns that can be resolved in that specific iteration. If some columns cannot be resolved yet, we must wait for the next analyzer iteration to check again whether more columns can be resolved. The plan will not be the same, because other rules will have applied their modifications, and in that iteration more columns may be resolvable.

It's not mandatory to resolve the query plan in only one iteration.
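A minimal sketch of this iterative approach, assuming a hypothetical `analyze` loop that reapplies a batch of rules until the plan stops changing (a fixed point) or a maximum number of iterations is reached. `Plan`, `resolveTables`, `resolveColumns` and the `catalog` map are hypothetical. `resolveColumns` only resolves what it can in the current pass and otherwise leaves the plan untouched for a later iteration.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Column is a hypothetical column reference; Index < 0 means "unresolved".
type Column struct {
	Name  string
	Index int
}

// Plan is a hypothetical, simplified query tree. Schema stays nil until
// the table has been resolved against the catalog.
type Plan struct {
	Table   string
	Schema  []string
	Columns []Column
}

func (p Plan) Resolved() bool {
	if p.Schema == nil {
		return false
	}
	for _, c := range p.Columns {
		if c.Index < 0 {
			return false
		}
	}
	return true
}

// Rule is a hypothetical rule signature; the rules in this sketch return a
// modified copy instead of mutating their input.
type Rule func(Plan) Plan

// catalog is a hypothetical table catalog: table name -> column names.
var catalog = map[string][]string{"t": {"a", "b"}}

func resolveTables(p Plan) Plan {
	if s, ok := catalog[p.Table]; ok && p.Schema == nil {
		p.Schema = s
	}
	return p
}

// resolveColumns resolves only what it can in this iteration; without a
// schema it changes nothing and waits for a later iteration.
func resolveColumns(p Plan) Plan {
	cols := make([]Column, len(p.Columns))
	for i, c := range p.Columns {
		cols[i] = c
		if c.Index >= 0 {
			continue
		}
		for j, name := range p.Schema {
			if name == c.Name {
				cols[i].Index = j
				break
			}
		}
	}
	p.Columns = cols
	return p
}

var ErrNotResolved = errors.New("plan is not resolved")

// analyze reapplies the whole batch of rules until the plan stops changing
// (a fixed point) or maxIterations is reached, then checks resolution.
func analyze(p Plan, rules []Rule, maxIterations int) (Plan, int, error) {
	iterations := 0
	for iterations < maxIterations {
		iterations++
		before := p
		for _, r := range rules {
			p = r(p)
		}
		if reflect.DeepEqual(before, p) {
			break // nothing changed: more iterations would not help
		}
	}
	if !p.Resolved() {
		return p, iterations, ErrNotResolved
	}
	return p, iterations, nil
}

func main() {
	// resolveColumns runs first on purpose: in the first iteration the
	// table schema is not known yet, so no column can be resolved.
	p := Plan{Table: "t", Columns: []Column{{"b", -1}}}
	out, n, err := analyze(p, []Rule{resolveColumns, resolveTables}, 10)
	fmt.Println(out.Columns, n, err)
}
```

With this rule order, the first pass only resolves the table, the second resolves the column, and the third confirms that nothing changes. The iteration cap guards against rules that keep rewriting the plan forever.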

erizocosmico · 2018-07-30T09:37:29Z

You need to differentiate between "I needed more things to be resolved before resolving this" and "I can't resolve this". Without deferredColumn there's no distinction.
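A minimal sketch of that distinction, assuming a hypothetical expression model: when the child node's schema is not known yet, the column is wrapped in a `DeferredColumn` ("need more things resolved first"), while a known schema that lacks the column produces an error ("can't resolve this"). The type and function names mirror the thread's terminology but are hypothetical, not the actual `deferredColumn` implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Expr is a hypothetical expression interface for this sketch.
type Expr interface{ Resolved() bool }

// UnresolvedColumn is a column name exactly as parsed.
type UnresolvedColumn struct{ Name string }

// DeferredColumn means "cannot be resolved yet; try again after other rules".
type DeferredColumn struct{ Name string }

// ResolvedColumn references an exact position in the row.
type ResolvedColumn struct {
	Name  string
	Index int
}

func (UnresolvedColumn) Resolved() bool { return false }
func (DeferredColumn) Resolved() bool   { return false }
func (ResolvedColumn) Resolved() bool   { return true }

var ErrColumnNotFound = errors.New("column not found")

// resolveColumn is a hypothetical rule step. A nil schema means the child
// node is not resolved yet, so the column is deferred rather than rejected.
func resolveColumn(e Expr, schema []string) (Expr, error) {
	var name string
	switch c := e.(type) {
	case UnresolvedColumn:
		name = c.Name
	case DeferredColumn:
		name = c.Name
	default:
		return e, nil // already resolved, nothing to do
	}
	if schema == nil {
		// "I need more things to be resolved before resolving this."
		return DeferredColumn{Name: name}, nil
	}
	for i, col := range schema {
		if col == name {
			return ResolvedColumn{Name: name, Index: i}, nil
		}
	}
	// "I can't resolve this": the schema is known and lacks the column.
	return nil, fmt.Errorf("%w: %s", ErrColumnNotFound, name)
}

func main() {
	e, _ := resolveColumn(UnresolvedColumn{"b"}, nil)
	fmt.Printf("%T resolved=%v\n", e, e.Resolved())
	e, _ = resolveColumn(e, []string{"a", "b"})
	fmt.Printf("%T resolved=%v\n", e, e.Resolved())
	_, err := resolveColumn(UnresolvedColumn{"z"}, []string{"a", "b"})
	fmt.Println(err)
}
```

Without the `DeferredColumn` marker, both cases would look like an unresolved column, and the rule could only report the missing column after the analyzer gives up.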

Also, for assign_indexes, you need to gather the indexes in one rule and resolve them in another. If you don't reach the pushdown rule with a placeholder node holding all the indexes, you cannot resolve them (or you'd have to gather all the indexes again).
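A minimal sketch of the two-rule handoff described above, assuming a hypothetical plan that is just a list of tables. `assignIndexes` gathers usable indexes into an `IndexedPlan` placeholder, which is not executable by itself, and `pushdown` consumes it to produce a plain plan. All names, including the `usable` map standing in for matching filters against real indexes, are hypothetical.

```go
package main

import "fmt"

// Table is a hypothetical leaf node; Indexes is only filled by pushdown.
type Table struct {
	Name    string
	Indexes []string
}

// Plan is a hypothetical, flattened query tree: just a list of tables.
type Plan struct{ Tables []Table }

// IndexedPlan is the placeholder node. It is not executable by itself; it
// only carries the indexes gathered by assignIndexes until pushdown runs.
type IndexedPlan struct {
	Child   Plan
	Indexes map[string][]string // table name -> usable index names
}

// assignIndexes gathers usable indexes once. A real rule would match filter
// expressions against the available indexes; the usable map stands in for that.
func assignIndexes(p Plan, usable map[string][]string) IndexedPlan {
	found := make(map[string][]string)
	for _, t := range p.Tables {
		if idx, ok := usable[t.Name]; ok {
			found[t.Name] = idx
		}
	}
	return IndexedPlan{Child: p, Indexes: found}
}

// pushdown consumes the placeholder: it attaches the gathered indexes to the
// tables and returns a plain plan, so the placeholder never reaches execution.
func pushdown(n IndexedPlan) Plan {
	tables := make([]Table, len(n.Child.Tables))
	for i, t := range n.Child.Tables {
		t.Indexes = n.Indexes[t.Name] // nil when no index was found
		tables[i] = t
	}
	return Plan{Tables: tables}
}

func main() {
	p := Plan{Tables: []Table{{Name: "t1"}, {Name: "t2"}}}
	usable := map[string][]string{"t1": {"t1_idx"}}
	fmt.Println(pushdown(assignIndexes(p, usable)).Tables)
}
```

Because the placeholder carries the gathered indexes, `pushdown` does not need to look them up again. The trade-off is that the tree between the two rules is not valid on its own, which is exactly the tension with the guideline that every rule returns a valid tree.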

Maybe we just need to come up with better ways to resolve certain things, then, because applying these guidelines is not feasible with how we do things now.

ajnavarro · 2018-09-10T09:40:48Z

Closing this because it is too broad. We should open separate issues if we find a way to simplify or improve the current rules.

ajnavarro added the enhancement (New feature or request) label Jul 18, 2018

ajnavarro closed this as completed Sep 10, 2018
