[SPARK-43124][SQL] Add ConvertCommandResultToLocalRelation rule #45397
 * Converts local operations (i.e. ones that don't require data exchange) on `CommandResult`
 * to `LocalRelation`.
 */
object ConvertCommandResultToLocalRelation extends Rule[LogicalPlan] {
can't we just update ConvertToLocalRelation?
Since the CommandResult class is in the spark-sql module, we cannot import it in ConvertToLocalRelation (which is in spark-catalyst).
And we cannot move CommandResult into the spark-catalyst module because it uses SparkPlan.
oh I see!
We could probably add a new trait, LocalRelationConverable, for both LocalRelation and CommandResult to inherit.
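A minimal sketch of the trait idea from the comment above, using toy stand-in classes rather than Spark's real `LocalRelation` and `CommandResult` (whose constructors and row types differ). The trait name follows the comment's spelling; the `output` and `rows` members and the `takeLocally` helper are assumptions made for illustration, not part of the PR.

```scala
// Toy stand-ins; NOT Spark's real classes.
// Hypothetical trait: anything that already holds its rows on the driver.
trait LocalRelationConverable {
  def output: Seq[String]   // column names (Spark would use Seq[Attribute])
  def rows: Seq[Seq[Any]]   // materialized rows (Spark would use Seq[InternalRow])
}

final case class ToyLocalRelation(output: Seq[String], rows: Seq[Seq[Any]])
  extends LocalRelationConverable

final case class ToyCommandResult(
    output: Seq[String],
    commandName: String,
    rows: Seq[Seq[Any]]) extends LocalRelationConverable

object TraitSketch {
  // A rule living in the lower-level module could match on the trait alone,
  // without importing the class defined in the higher-level module.
  def takeLocally(rel: LocalRelationConverable, n: Int): ToyLocalRelation =
    ToyLocalRelation(rel.output, rel.rows.take(n))
}
```

In this sketch the trait would live alongside the rule, so the module that defines the command-result class only needs to extend it.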
    case Limit(IntegerLiteral(limit), CommandResult(output, _, _, rows)) =>
      LocalRelation(output, rows.take(limit))

    case Filter(condition, CommandResult(output, _, _, rows))
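The two cases visible in the diff above can be modelled, roughly, with a self-contained toy plan tree. This is only a sketch under stated assumptions: `Plan`, `LocalRel`, `CmdResult`, `LimitOp` and `FilterOp` are hypothetical stand-ins for Spark's logical plan nodes, column names are plain strings, and the filter is an ordinary Scala function rather than a Catalyst expression.

```scala
// Toy plan nodes; hypothetical stand-ins for Spark's LogicalPlan classes.
sealed trait Plan
final case class LocalRel(output: Seq[String], rows: Seq[Seq[Any]]) extends Plan
final case class CmdResult(output: Seq[String], rows: Seq[Seq[Any]]) extends Plan
final case class LimitOp(n: Int, child: Plan) extends Plan
final case class FilterOp(pred: Seq[Any] => Boolean, child: Plan) extends Plan

object ToyConvertCommandResult {
  // Rewrite bottom-up: an operator sitting directly on a CmdResult is
  // evaluated eagerly on the already-collected rows and replaced by a LocalRel.
  // In this toy model only CmdResult children are folded.
  def apply(plan: Plan): Plan = plan match {
    case LimitOp(n, child) =>
      apply(child) match {
        case CmdResult(out, rows) => LocalRel(out, rows.take(n))
        case other                => LimitOp(n, other)
      }
    case FilterOp(pred, child) =>
      apply(child) match {
        case CmdResult(out, rows) => LocalRel(out, rows.filter(pred))
        case other                => FilterOp(pred, other)
      }
    case leaf => leaf
  }
}
```

Note that after the rewrite the command-result node no longer appears in the plan, which is the trade-off the review comment that follows is concerned with.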
Looking at this rule, I'm on the fence now. The original purpose of CommandResult was better UI support: 8013f98
E.g., if you do sql("show tables").filter(...), we do want to see a command result node under a filter node in the UI, even if it means extra jobs.
I think certain DataFrame operations such as df.show() and df.isEmpty should just be exceptions. Each looks like a single operation to users, so it should not trigger extra jobs. But this should not be generalized to all operations on CommandResult.
cc @HyukjinKwon
I see, thanks for the explanation.
Thanks for the explanation, it makes sense to me. Can we continue reviewing #45373?
Yea, let's go back to #45373.
Close with comment: #45397 (comment)
What changes were proposed in this pull request?
Add the `ConvertCommandResultToLocalRelation` optimizer rule to convert `CommandResult` to `LocalRelation`.
Why are the changes needed?
address comment: #45373 (comment)
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added new UT.
Was this patch authored or co-authored using generative AI tooling?
No