Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: Reuse existing selectNode renders transparently #9173

Closed
nvanbenschoten opened this issue Sep 7, 2016 · 11 comments
Closed

sql: Reuse existing selectNode renders transparently #9173

nvanbenschoten opened this issue Sep 7, 2016 · 11 comments
Assignees
Milestone

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Sep 7, 2016

SQL's sortNode does a lot of work to reuse existing selectNode renders when possible when it needs to add renders. Due to the introduction of windowNode, this functionality is now needed in multiple locations. We should pull out this logic so that it can be reused.

@nvanbenschoten
Copy link
Member Author

After looking into how the sortNode works here, it seems to me like we can improve it's render reuse as well. Right now it will only reuse renders for aliased or unaliased column names, but not identical expressions. For instance, below the first render in the selectNode could have been reused instead of adding the third render.

root@:26257> explain (types) select a+1, b from three order by a+1;
+-------+---------------+----------+------------------------------------------+
| Level |     Type      | Element  |               Description                |
+-------+---------------+----------+------------------------------------------+
|     0 | select        | result   | ("a + 1" int, b int)                     |
|     1 | sort          | result   | ("a + 1" int, b int)                     |
|     2 | render/filter | result   | ("a + 1" int, b int, "a + 1" int)        |
|     2 | render/filter | render 0 | ((a)[int] + (1)[int])[int]               |
|     2 | render/filter | render 1 | (b)[int]                                 |
|     2 | render/filter | render 2 | ((a)[int] + (1)[int])[int]               |
|     3 | scan          | result   | (a int, b int, c int, rowid[hidden] int) |
+-------+---------------+----------+------------------------------------------+

@knz you touched a lot of this ColumnItem handling recently. Do you have any input/guidance on this? Is it safe to reuse renders based only on a string comparison?

@knz
Copy link
Contributor

knz commented Sep 7, 2016

Using the string representation of the expression would be a horrible, horrible hack but yes I believe it would work.

@knz
Copy link
Contributor

knz commented Sep 7, 2016

The expressions should be simplified+normalized first though. lest "(a)" and "((a))" not compare equal.

@RaduBerinde
Copy link
Member

What if there are functions in the expression? (e.g. uuid_v4())

@knz
Copy link
Contributor

knz commented Sep 7, 2016

AFAIK SQL does not mandate that non-deterministic functions return different values if they occur in different places in the same expression for the same output row. Consequently we can't expect that they return different values in different expressions.

@knz
Copy link
Contributor

knz commented Sep 7, 2016

Although if we really want to be conservative we could have a function attribute to mark non determinism and avoid merging renders if they are not deterministic. (But I don't think that's necessary)

@bdarnell
Copy link
Contributor

bdarnell commented Sep 8, 2016

Don't we already have that attribute: impure?

@knz
Copy link
Contributor

knz commented Sep 8, 2016

I think we're confused in the code about the meaning impure. The only place where it currently really matters is during normalization; we reduce impure functions to their result during normalization. This means that if they are applied to constant arguments the calls disappear when a statement is prepared.

For example all the aggregate functions are marked "impure" for this reason.
Meanwhile it would be correct to merge the renders for SELECT MAX(x) FROM x ORDER BY MAX(x)
even though MAX is marked impure.

I am still of the opinion that it doesn't matter whether there are special functions or not for the purpose of deciding to merge renders. But if we want to tag them the impure attribute is not what we want.

@knz
Copy link
Contributor

knz commented Nov 22, 2016

This issue has been partly fixed by #10799: the sortNode now properly reuses renders based on string comparison after normalization of names (to IndexedVars). The limitations are:

  • this is done before expression simplification (but after parenthesis elimination), in effect so that ORDER BY x<y+2 and ORDER BY x-2<y do not use the same renders (but ORDER BY ((a+2)) and ORDER BY a+2 do).

  • if a wildcard is used (e.g. ORDER BY foo.*) then the reuse is not enabled.

@petermattis petermattis added this to the Later milestone Feb 22, 2017
@nvanbenschoten
Copy link
Member Author

@knz I'm not up-to-date on the current state of this issue. Do you mind updating it or closing it if everything has been addressed?

@knz
Copy link
Contributor

knz commented Nov 30, 2017

The main situation that motivated this issue is addressed. Corner cases can be dealt with as optimizations later.

@knz knz closed this as completed Nov 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants