chore: Add spilling metrics of SortMergeJoin #878

Merged: 1 commit, Aug 28, 2024

Conversation

viirya
Member

@viirya viirya commented Aug 28, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

The spilling metrics of SortMergeJoin are not yet propagated from DataFusion to Comet. This patch adds the spilling-related metrics to the Comet SortMergeJoin operator.

What changes are included in this PR?

How are these changes tested?

Comment on lines +33 to +37
def getNumRows: Int = if (rowAddresses == null) {
  0
} else {
  rowAddresses.size
}
Member Author

This is unrelated to the metrics change. While I was testing several memory configs to try to trigger spilling, this method sometimes threw an NPE.
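
For reference, the same null guard can be written with `Option`. This is only a minimal sketch: `RowBuffer`, its `write` method, and the `Array[Long]` field type are hypothetical stand-ins for illustration, not the actual Comet class touched in this PR.

```scala
// Hypothetical stand-in for the class that owns `rowAddresses`; not Comet's actual code.
final class RowBuffer {
  // Assumption for this sketch: the field stays null until the first write.
  private var rowAddresses: Array[Long] = null

  def write(addresses: Array[Long]): Unit = rowAddresses = addresses

  // Option(null) evaluates to None, so a null field yields 0 instead of an NPE.
  def getNumRows: Int = Option(rowAddresses).map(_.length).getOrElse(0)
}

object RowBufferDemo {
  def main(args: Array[String]): Unit = {
    val buf = new RowBuffer
    println(buf.getNumRows) // field is still null here
    buf.write(Array(1L, 2L, 3L))
    println(buf.getNumRows)
  }
}
```

Both forms behave the same; the explicit `if`/`else` in the diff avoids allocating an `Option` on each call.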

@viirya viirya requested review from andygrove and huaxingao August 28, 2024 13:46
@@ -519,6 +519,8 @@ class CometExecSuite extends CometTestBase {
    assert(metrics("peak_mem_used").value > 1L)
    assert(metrics.contains("join_time"))
    assert(metrics("join_time").value > 1L)
    assert(metrics.contains("spill_count"))
    assert(metrics("spill_count").value == 0)
Member Author

Spilling is not triggered in this test, but the related metrics should still be propagated.
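
To illustrate what the assertion checks (the metric is present even though its value is zero), here is a small self-contained sketch. `Metric` and `missingMetrics` are hypothetical helpers invented for this example, and the sample values are made up; only the metric names come from the test above.

```scala
// Hypothetical minimal metric holder, standing in for a real SQL metric object.
final case class Metric(value: Long)

object SpillMetricCheck {
  // Returns every expected metric name that is absent from the map.
  def missingMetrics(metrics: Map[String, Metric], expected: Seq[String]): Seq[String] =
    expected.filterNot(metrics.contains)

  def main(args: Array[String]): Unit = {
    // Made-up values: the join ran, but nothing was spilled.
    val metrics = Map(
      "peak_mem_used" -> Metric(4096L),
      "join_time" -> Metric(12L),
      "spill_count" -> Metric(0L))

    // Presence and value are separate checks: a zero count still has to be reported.
    assert(missingMetrics(metrics, Seq("peak_mem_used", "join_time", "spill_count")).isEmpty)
    assert(metrics("spill_count").value == 0)
  }
}
```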

Member

Can we also have a test that forces spilling?

Member Author

I've tried to do it locally, but haven't been able to trigger it yet.

Member Author

I will keep trying but wanted to merge these metrics first.

Contributor

@comphead comphead left a comment

lgtm, thanks @viirya. Just wondering: are those numbers taken from the Spark context or from DF spilled metrics?

Member Author

viirya commented Aug 28, 2024

> lgtm, thanks @viirya. Just wondering: are those numbers taken from the Spark context or from DF spilled metrics?

They come from the metrics of DataFusion's SortMergeJoin operator. The metrics of Comet operators are propagated from DataFusion.
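
As a rough illustration of what "propagated" means here, the sketch below copies values reported by a native operator into metrics registered on the JVM side. It is not Comet's actual implementation: `MetricAccumulator`, `MetricPropagation.propagate`, and the rule that unregistered names are skipped are all assumptions made for this example.

```scala
// Hypothetical JVM-side metric holder; not a Spark or Comet class.
final class MetricAccumulator(var value: Long = 0L) {
  def set(v: Long): Unit = value = v
}

object MetricPropagation {
  // Copies each native value into the JVM-side metric registered under the same name.
  // Assumption for this sketch: native metrics without a registered counterpart are skipped.
  def propagate(native: Map[String, Long], registered: Map[String, MetricAccumulator]): Unit =
    native.foreach { case (name, v) => registered.get(name).foreach(_.set(v)) }

  def main(args: Array[String]): Unit = {
    val registered = Map(
      "join_time" -> new MetricAccumulator(),
      "spill_count" -> new MetricAccumulator())
    // Made-up native snapshot; "unregistered_metric" has no JVM-side counterpart.
    val native = Map("join_time" -> 12L, "spill_count" -> 0L, "unregistered_metric" -> 7L)

    propagate(native, registered)
    registered.foreach { case (name, m) => println(s"$name = ${m.value}") }
  }
}
```

Under that assumption, a metric only shows up on the Spark side once it is registered on the operator, which is the kind of gap this PR closes for the spill metrics.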

@viirya viirya merged commit f4400f5 into apache:main Aug 28, 2024
75 checks passed
Member Author

viirya commented Aug 28, 2024

Merged. Thanks @andygrove @comphead

@viirya viirya deleted the add_smj_spill_metrics branch August 28, 2024 20:53
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
3 participants