Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Literal in ORDER BY window definition should not be an ordinal referring to relation column #8419

Merged
merged 12 commits into from
Dec 6, 2023

Conversation

viirya
Copy link
Member

@viirya viirya commented Dec 4, 2023

Which issue does this PR close?

Closes #.

Rationale for this change

This continues to fix one bug found in #8410. In particular, currently a literal value in ORDER BY clause of window definition will be treated as an ordinal reference to a column in the relation which I think is the behavior of sort expressions in ORDER BY keyword. However for window definition, Postgres simply treats a literal value as literal:

postgres=# select a, rank() over (order by 1 RANGE between UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) rnk from (select 1 a union select 2 a) q;
 a | rnk 
---+-----
 1 |   1
 2 |   1
(2 rows)

postgres=# select a, rank() over (order by 1 RANGE between 1 PRECEDING AND 1 FOLLOWING) rnk from (select 1 a union select 2 a) q;
 a | rnk 
---+-----
 1 |   1
 2 |   1
(2 rows)

What changes are included in this PR?

Making a literal value in ORDER BY window definition as literal, instead of a column ordinal.

Are these changes tested?

Modified test in sqllogictest.

Are there any user-facing changes?

Literal in ORDER BY window definition is changed to literal instead of a column ordinal.

@github-actions github-actions bot added sql SQL Planner physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Dec 4, 2023
@viirya viirya changed the title fix: Literal in window definition should not an ordinal refering to relation column fix: Literal in window definition should not be an ordinal referring to relation column Dec 4, 2023
@viirya viirya changed the title fix: Literal in window definition should not be an ordinal referring to relation column fix: Literal in ORDER BY window definition should not be an ordinal referring to relation column Dec 4, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @viirya -- this looks great to me. 🙏

I had some suggestions on comments to add, but I don't think they are necessary to merge this PR.

datafusion/sql/src/expr/order_by.rs Show resolved Hide resolved
datafusion/sql/src/expr/function.rs Show resolved Hide resolved
@ozankabak
Copy link
Contributor

Thank you @viirya for this. Do we know what ISO standard says about literals in window definitions? Is PG standard-conforming in this case?

@viirya
Copy link
Member Author

viirya commented Dec 6, 2023

Hmm, I can only find SQL-92 (https://datacadamia.com/_media/data/type/relation/sql/sql1992.txt) which defines ORDER BY for a query where sort key could be an unsigned integer which is what we familiar. But I don't find window definition in the doc.

In SQL-99 (http://web.cecs.pdx.edu/~len/sql1999.pdf), the same ORDER BY spec is changed to disallow a literal as sort key. I still don't find window definition there too.

I can only confirm that Spark also treats literals as constants in ORDER BY window definition:

scala> sql("select a, RANK() over (order by 1 rows between UNBOUNDED PRECEDING AND current row)  from (select 1 a union all select 2 a) q").show
+---+-----------------------------------------------------------------------------------------+
|  a|RANK() OVER (ORDER BY 1 ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+---+-----------------------------------------------------------------------------------------+
|  1|                                                                                        1|
|  2|                                                                                        1|
+---+-----------------------------------------------------------------------------------------+

@comphead
Copy link
Contributor

comphead commented Dec 6, 2023

This PR can potentially close #8403

UPD:

/workspaces/arrow-datafusion/datafusion-cli (fix_order_by_literal) $ ./target/debug/datafusion-cli 
DataFusion CLI v33.0.0
❯ select rank() over (order by 1)  from (select 1 a union all select 2 a) q;
Arrow error: Invalid argument error: number of columns(2) must match number of fields(1) in schema

Now it fails, before the fix it worked but gave a wrong answer

@viirya
Copy link
Member Author

viirya commented Dec 6, 2023

❯ select rank() over (order by 1) from (select 1 a union all select 2 a) q;
Arrow error: Invalid argument error: number of columns(2) must match number of fields(1) in schema


Now it fails, before the fix it worked but gave a wrong answer

Yea, currently you cannot have only window function without the column that has been "window", as I already added a few tests there showing the issue. I plan to address them one by one.

@comphead
Copy link
Contributor

comphead commented Dec 6, 2023

We might also handle nulls the same way we handle

SQLExpr::Value(Value::Number(v, _))

in order_by_to_sort_expr to close #8402 as well

@viirya
Copy link
Member Author

viirya commented Dec 6, 2023

We might also handle nulls the same way we handle

ORDER BY on a null as sort key is not working now as I just tried. I will propose a small follow up after this.

@viirya
Copy link
Member Author

viirya commented Dec 6, 2023

#8402 has another issue, I think. This doesn't touch PARTITION BY on scalars/literals.

@viirya viirya merged commit 439339a into apache:main Dec 6, 2023
22 checks passed
@viirya
Copy link
Member Author

viirya commented Dec 6, 2023

Thanks @alamb @ozankabak @comphead

appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Dec 15, 2023
… referring to relation column (apache#8419)

* fix: RANGE frame can be regularized to ROWS frame only if empty ORDER BY clause

* Fix flaky test

* Update test comment

* Add code comment

* Update

* fix: Literal in window definition should not refer to relation column

* Remove unused import

* Update datafusion/sql/src/expr/function.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Add code comment

* Fix format

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants