-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-35215][SQL] Update custom metric per certain rows and at the end of the task #32330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #137908 has finished for PR 32330 at commit
|
|
cc @cloud-fan |
| s"${metric.name()}") | ||
| customMetrics(metric.name()).set(metric.value()) | ||
| if (numRow % CustomMetrics.numRowsPerUpdate == 0) { | ||
| reader.currentMetricsValues.foreach { metric => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move it into a method to reuse code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a reused method.
| object CustomMetrics { | ||
| private[spark] val V2_CUSTOM = "v2Custom" | ||
|
|
||
| private[spark] val numRowsPerUpdate = 100L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it need to be a long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numRow is a long, I guess this can be just int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it as int.
| customMetrics(metric.name()).set(metric.value()) | ||
| if (numRow % CustomMetrics.numRowsPerUpdate == 0) { | ||
| partitionReader.currentMetricsValues.foreach { metric => | ||
| assert(customMetrics.contains(metric.name()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how useful is the assert here. It's for internal error only and customMetrics(metric.name()) will fail too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove it. I also thought it is not necessary but just added for a comment before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
|
@dongjoon-hyun Yes, I will rebase this PR. Thanks. |
|
Rebased and updated for the comments. @cloud-fan @dongjoon-hyun |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #138161 has finished for PR 32330 at commit
|
| object CustomMetrics { | ||
| private[spark] val V2_CUSTOM = "v2Custom" | ||
|
|
||
| private[spark] val numRowsPerUpdate = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: NUM_ROWS_PER_UPDATE since it's a constant?
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #138198 has finished for PR 32330 at commit
|
|
thanks, merging to master! |
What changes were proposed in this pull request?
This patch changes custom metric updating to update per certain rows (currently 100), instead of per row.
Why are the changes needed?
Based on previous discussion #31451 (comment), we should only update custom metrics per certain (e.g. 100) rows and also at the end of the task. Updating per row doesn't make too much benefit.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing unit test.