Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: address discrepancy with pg and SRFs #20511

Closed
justinj opened this issue Dec 5, 2017 · 12 comments
Closed

sql: address discrepancy with pg and SRFs #20511

justinj opened this issue Dec 5, 2017 · 12 comments
Labels
A-sql-pgcompat Semantic compatibility with PostgreSQL C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Milestone

Comments

@justinj
Copy link
Contributor

justinj commented Dec 5, 2017

Postgres:

justin=# select generate_series(1,2), generate_series(1,2);
 generate_series | generate_series
-----------------+-----------------
               1 |               1
               2 |               2
(2 rows)

Us:

root@:26257/> select generate_series(1,2), generate_series(1,2);
+-----------------+-----------------+
| generate_series | generate_series |
+-----------------+-----------------+
|               1 |               1 |
|               1 |               2 |
|               2 |               1 |
|               2 |               2 |
+-----------------+-----------------+
(4 rows)

See https://stackoverflow.com/questions/39863505/what-is-the-expected-behaviour-for-multiple-set-returning-functions-in-select-cl/39864815#39864815

cc @knz @benesch

@knz knz added this to the 1.3 milestone Dec 6, 2017
@knz knz added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Dec 6, 2017
@knz
Copy link
Contributor

knz commented Apr 17, 2018

(Paraphrasing the SO post)

Multiple SRFs in the same render list (either within a single scalar expr or across scalar exprs) should be migrated to a single join operand in the FROM clause, that zip the values togethers into a single table (filling with NULLs where some SRFs produce fewer rows than others).

pg supports this with a special lateral join called LATERAL ROWS FROM (...) (it's not really a join).

@knz
Copy link
Contributor

knz commented Apr 17, 2018

Note that there is yet another case, not covered by this issue:

>  SELECT generate_series(generate_series(1, 3), 3);

Still unclear to me what this does. But it does not appear to use the general lateral join with the set projection that the side-by-side SRFs use.

@benesch
Copy link
Contributor

benesch commented Apr 17, 2018

In that case you just unnest the SRFs into multiple joins:

SELECT b FROM generate_series(1, 3) a, generate_series(a, 3) b;

@knz
Copy link
Contributor

knz commented Apr 17, 2018

"just". Humpf.

Thanks Nikhil for checking though. Consider this funny beast:

SELECT generate_series(generate_series(1, 3)+generate_series(10,20), 3);

According to your rule and the one we found out earlier, does this gets translated to:

SELECT b 
  FROM (SELECT @1+@2 AS start
          FROM (ROWS FROM (generate_series(1,3), generate_series(10,20)))) AS a, 
       LATERAL generate_series(a.start, 3) AS b

?

@knz
Copy link
Contributor

knz commented Apr 17, 2018

Note: ROWS FROM (a,b,c) will create a single table with the "zip" of the generators a,b,c

@benesch
Copy link
Contributor

benesch commented Apr 17, 2018

I think so! I had to rewrite your example like this to get it to work:

SELECT f2
  FROM (SELECT a + b AS start
          FROM ROWS FROM (generate_series(1,3), generate_series(10,20)) _ (a, b)) AS f1, 
       LATERAL generate_series(f1.start, 3) AS f2

Your example also doesn't produce any data. It's more interesting with a bigger upper bound:

SELECT generate_series(generate_series(1, 3)+generate_series(10,20), 20);
SELECT f2
  FROM (SELECT a + b AS start
          FROM ROWS FROM (generate_series(1,3), generate_series(10,20)) _ (a, b)) AS f1, 
       LATERAL generate_series(f1.start, 20) AS f2;

@knz
Copy link
Contributor

knz commented Apr 17, 2018

yeah ok. This is a hairy transform. I don't like it, although I fear we'll have to implement it eventually.

@knz
Copy link
Contributor

knz commented Apr 17, 2018

thanks @benesch, props for figuring it out.

@knz knz added the A-sql-pgcompat Semantic compatibility with PostgreSQL label Apr 17, 2018
@knz
Copy link
Contributor

knz commented Apr 17, 2018

@awoods187 we need to table that on the roadmap sometime. Perhaps it will slip to 2.2 (unless a free friday takes care of it)

@knz
Copy link
Contributor

knz commented Apr 17, 2018

I'd say size M

@knz
Copy link
Contributor

knz commented May 30, 2018

Update: the relationship between SRFs and ROWS FROM identified above is incorrect. See #26234 for details.

craig bot pushed a commit that referenced this issue Jun 5, 2018
26370: sql: hoist FunctionProperties to the built-in function name, instead of individual overloads r=knz a=knz

tldr: this patchset hoists the properties of built-in functions next to the name, instead of the level of individual overload implementation as previously.

This is a precursor to the fix to  #20511.

Fixes #26354.

cc @andy-kimball 

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Jun 6, 2018
26223: sql: cleanups + new "project set" relop + ROWS FROM + local exec logic r=knz a=knz

This PR provides support for pg's "Project Set" logical plan operator, which underlies both the handling of set-generating functions (SRFs) and the `ROWS FROM` input SQL syntax.

Formally, this operator takes a relational operator R and a list of SRF applications L. Its semantics is a relation which performs a lateral cross join between R and (conceptually) `ROWS FROM (L)`, the functional zip of the value generators in L.

Reminder: the functional zip over iterators a,b,c returns tuples of values from a,b,c picked "simultaneously". NULLs are used when an iterator is "shorter" than another. For example `zip([1,2,3], ['a','b'])` is `[(1,'a'), (2,'b'), (3, null)]`.

The PR also greatly simplifies / clean up the typing logic around SRFs by getting rid of `TTable` and `DTable` entirely.

It also cleans up the docs of SRF built-in functions so that their return type is displayed in simple form (e.g. the documented return type of generate_series is now `int` and not `setof tuple{int as ...}`)

This PR serves several purposes simultaneously.

1. to solve #20511.
2. to provide more compatibility with pg's wrt the `ROWS FROM` syntax.
3. to serve as baseline for testing the new "ApplySRF" (really, "project set") functionality in distsql (SQL Execution team)
4. to serve as common intermediate language for the optimizer code, and as baseline for future
    optimizer code that will look into SRF transformations (SQL Optimizer team). /cc @andy-kimball 

Fixes #20511.

26474: sql: add pg_catalog.pg_shdescription r=BramGruneir a=BramGruneir

This virtual table is required for compatibility with PGAdmin.

Closes #26376.

Release note: Added the pg_catalog.pg_shdescription table for compatibility with
Postgres tools. Note that we do not support adding descriptions to shared
database objects.

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Co-authored-by: Bram Gruneir <bram@cockroachlabs.com>
@craig craig bot closed this as completed in #26223 Jun 6, 2018
@benesch
Copy link
Contributor

benesch commented Jun 6, 2018

omg ❤️ ❤️ ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-pgcompat Semantic compatibility with PostgreSQL C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

No branches or pull requests

3 participants