-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix 621, where unnamed window functions shall be differentiated by partition and order by clause #622
Conversation
9966dcb
to
e2d57d2
Compare
should we update physical column name as well for better readability in the query output? https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/planner.rs#L137-L139 |
…rtition and order by clause
e2d57d2
to
b224b93
Compare
i think this is more of the strategy on how we name unnamed columns in the schema. for window functions without an alias, it'll be created with a unique name, given the function type, arg name, partition by clause, order by clause, and window frames. any of these are equivalent then the columns can be thought of the same and thus deduplicated in the projection phase, otherwise they will be planned separately. if any such columns are having a name conflict in the projection phase the schema validation logic will fail. |
Postgres is totally okay with dup names in the projection phase:
|
if we are okay with dup names then i guess just keeping the function name would be enough |
We do allow query outputs (i.e. record batch schema and physical plan schema) to contain columns with same names. This is quite common in join queries where more than one relations contain the same column name. My previous comment was more around the inconsistency on how window function names are generated in logical plane v.s. physical plane. For example, in physical plane, we don't take https://github.com/apache/arrow-datafusion/blob/master/docs/specification/output-field-name-semantic.md contains some rules on how we want to generate field names for physical schema. We don't have any rules for window functions at the moment, so maybe worth formalize it in that document as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
Thanks for the advice let me address in another pull request |
Which issue does this PR close?
Closes #621
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?