Skip to content
This repository has been archived by the owner on Jan 28, 2021. It is now read-only.

Rules review #294

Closed
ajnavarro opened this issue Jul 18, 2018 · 4 comments
Closed

Rules review #294

ajnavarro opened this issue Jul 18, 2018 · 4 comments
Labels
enhancement New feature or request

Comments

@ajnavarro
Copy link
Contributor

To try to reduce errors in rules interactions, I wrote some guidelines to improve and simplify that rules:

Rules guideline

  • We have two kind of rules: resolution rules, and optimization rules.
    • The goal of resolution rules is to transform the parsed tree resolving abstract references (e.g. to column names and expressions) into concrete references (e.g. resolved columns referencing an exact position in a row). Since, in our model, every resolved node is also executable, physical planning is also implicit in this phase.
    • The goal of optimization rules is to make the query faster.
  • If a query tree is not resolved after resolution rules, the query must throw an error.
  • Each rule must do only one thing and be as small as possible.
  • Each rule must return a valid query tree.
    • On resolution rules, the output query tree might not be fully resolved, and that's ok.
    • On optimization rules, both the input and output trees must be resolved (and, therefore, executable).
  • Rules are expected to work well between them, but a rule cannot depend on other rule/rules to generate a valid query tree.

Apply that on rules that do not accomplish it.

@ajnavarro ajnavarro added the enhancement New feature or request label Jul 18, 2018
@erizocosmico
Copy link
Contributor

erizocosmico commented Jul 24, 2018

Each rule must return a valid query tree.

Rules are expected to work well between them, but a rule cannot depend on other rule/rules to generate a valid query tree.

This clashes with rules that defer resolution. For example, the resolve_columns rule uses deferredColumn to defer the evaluation after some other rules have been executed. Also, the assign_indexes, which uses a node that is not valid per se, to be replaced with the final node in pushdown rule.

I think having such a hard rule is not very flexible and can make resolving the query really complicated.

@ajnavarro
Copy link
Contributor Author

In that specific case, we should iterate and resolve the columns that can be resolved on that specific iteration. If some columns are not possible to be resolved, we must wait for the next Analyzer iteration to check again if we can resolve more columns. The plan will not be the same, because other rules applied his modifications and maybe on that iteration more columns can be resolved.

It's not mandatory to resolve the query plan in only one iteration.

@erizocosmico
Copy link
Contributor

You need to differentiate between "I needed more things to be resolved before resolving this" and "I can't resolve this". Without deferredColumn there's no distinction.

Also, for assign indexes, you need to gather the indexes in one rule and finally resolve in another. If you don't get to the pushdown with a placeholder node holding all the indexes, you cannot resolve (or you'd have to get all the indexes again).

Maybe we just need to come up with better ways to resolve certain things, then, because applying that is not feasible with how we do it now.

@ajnavarro
Copy link
Contributor Author

Closing this because is too broad. We should open separated issues if we found a way to simplify or improve actual rules.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants