Skip to content

Conversation

@wForget
Copy link
Member

@wForget wForget commented Mar 6, 2024

What changes were proposed in this pull request?

Add ConvertCommandResultToLocalRelation optimizer to convert CommandResult to LocalRelation.

Why are the changes needed?

address comment: #45373 (comment)

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added new UT.

Was this patch authored or co-authored using generative AI tooling?

No

* Converts local operations (i.e. ones that don't require data exchange) on `CommandResult`
* to `LocalRelation`.
*/
object ConvertCommandResultToLocalRelation extends Rule[LogicalPlan] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just update ConvertToLocalRelation?

Copy link
Member Author

@wForget wForget Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since CommandResult class is in spark-sql module, we cannot import it in ConvertToLocalRelation (which is in spark-catalyst)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we cannot move CommandResult into spark-catalyst module because it uses SparkPlan.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I see!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, we can add a new trait LocalRelationConverable to let LocalRelation and CommandResult inherit it.

@wForget wForget changed the title [WIP] Add ConvertCommandResultToLocalRelation rule [SPARK-43124][SQL] Add ConvertCommandResultToLocalRelation rule Mar 7, 2024
@wForget wForget marked this pull request as ready for review March 7, 2024 05:14
case Limit(IntegerLiteral(limit), CommandResult(output, _, _, rows)) =>
LocalRelation(output, rows.take(limit))

case Filter(condition, CommandResult(output, _, _, rows))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By looking at this rule, I'm on the fence now. The original target of CommandResult is for better UI support: 8013f98

e.g. , if you do sql("show tables").filter(...), we do want to see a command result node under a filter node in the UI, even if it means extra jobs.

I think certain DataFrame operations such as df.show(), df.isEmpty should just be exceptions. It looks like a single operation to users and we should not have extra jobs. But this should not be general to all operations on CommandResult

cc @HyukjinKwon

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ic thanks for explanation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, it makes sense to me, can we continue to review #45373?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea let's back to #45373

@wForget
Copy link
Member Author

wForget commented Mar 7, 2024

Close with comment: #45397 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants