Add Spark UI metrics from Iceberg scan metrics #8717
Let me take a look.
This looks almost good. I had some minor comments, and it seems we are missing the implementation and tests for a few total counters. @karuppayya, could you check whether I got everything right?
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/metrics/ResultDeleteFiles.java (outdated)
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TotalDataManifests.java
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TotalDeleteFileSize.java
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TotalDeleteFileSize.java (outdated)

  @Override
  public String description() {
    return "total delete file size";
Include (bytes) at the end like we do in TotalFileSize?
068b9ae to 0a4c2c1
aokolnychyi left a comment
This looks almost ready to go. I had a few minor points that would be nice to fix.
      return new CustomTaskMetric[0];
    }

    List<CustomTaskMetric> driverMetrics = Lists.newArrayList();
Minor: Can we add an empty line after this one and before // common?
  @Override
  public CustomMetric[] supportedCustomMetrics() {
    return new CustomMetric[] {
      new NumSplits(),
Minor: Can we add // task metrics before this line?
Both NumSplits and NumDeletes are populated at the task level.
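For illustration, a minimal sketch of the grouping being asked for. It uses a simplified local `CustomMetric` interface so it compiles without Spark on the classpath; this is not Spark's real interface. The metric name strings, the `ScanMetricsSketch` holder class, and the `// driver metrics` label are assumptions, not taken from the PR.

```java
// Simplified stand-in for Spark's CustomMetric; only the name is modeled here.
interface CustomMetric {
  String name();
}

// Populated at the task level (per the review comment).
class NumSplits implements CustomMetric {
  @Override
  public String name() {
    return "numSplits"; // assumed name, for illustration only
  }
}

// Populated at the task level (per the review comment).
class NumDeletes implements CustomMetric {
  @Override
  public String name() {
    return "numDeletes"; // assumed name, for illustration only
  }
}

// Stand-in for one of the driver-side metrics derived from the scan report.
class TotalFileSize implements CustomMetric {
  @Override
  public String name() {
    return "totalFileSize"; // assumed name, for illustration only
  }
}

// Hypothetical holder class; in the PR this method lives in SparkScan.
class ScanMetricsSketch {
  static CustomMetric[] supportedCustomMetrics() {
    return new CustomMetric[] {
      // task metrics
      new NumSplits(),
      new NumDeletes(),

      // driver metrics (hypothetical label)
      new TotalFileSize()
    };
  }
}
```

Labeling the groups makes it visible at a glance which values arrive from executors and which are computed once on the driver.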

    List<CustomTaskMetric> driverMetrics = Lists.newArrayList();
    // common
    driverMetrics.add(TaskTotalFileSize.from(scanReport));
Could you confirm that TaskTotalFileSize represents the total size of the data read?
If so, shall we call these metrics TaskTotalDataFileSize and TotalDataFileSize? I know we follow the API from core, but the current names are a bit confusing: I had to look up the code to understand what this metric means. If we decide to rename, let's move it to the data files block below.
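A rough sketch of what the renamed task metric could look like, assuming the value really is the total data file size. `ScanReportStub` and its `totalFileSizeInBytes()` accessor are hypothetical stand-ins for the scan report from core, `CustomTaskMetric` is a simplified local copy of the Spark interface shape, and the metric name string is an assumption.

```java
// Hypothetical stand-in for the scan report; the real type comes from Iceberg core.
class ScanReportStub {
  private final long totalFileSizeInBytes;

  ScanReportStub(long totalFileSizeInBytes) {
    this.totalFileSizeInBytes = totalFileSizeInBytes;
  }

  long totalFileSizeInBytes() {
    return totalFileSizeInBytes;
  }
}

// Simplified stand-in for Spark's CustomTaskMetric (a name plus a long value).
interface CustomTaskMetric {
  String name();

  long value();
}

// Proposed name from the review: makes it explicit that only data files are counted.
class TaskTotalDataFileSize implements CustomTaskMetric {
  private final long value;

  private TaskTotalDataFileSize(long value) {
    this.value = value;
  }

  // Mirrors the from(scanReport) factory style seen in the diff.
  static TaskTotalDataFileSize from(ScanReportStub scanReport) {
    return new TaskTotalDataFileSize(scanReport.totalFileSizeInBytes());
  }

  @Override
  public String name() {
    return "totalDataFileSize"; // assumed name, for illustration only
  }

  @Override
  public long value() {
    return value;
  }
}
```

With the rename, the call site would read `driverMetrics.add(TaskTotalDataFileSize.from(scanReport));` and could sit in the data files block rather than under `// common`.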

  @Override
  public String description() {
    return "total delete file size in bytes";
Let's follow what we have in TotalFileSize, where we use ... (bytes) instead of ... in bytes.
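As a sketch of the suggested wording, assuming the metric simply sums per-task values: the `CustomMetric` interface below is a simplified local copy of the Spark interface shape (name, description, aggregation), and the metric name string and the summing aggregation are assumptions rather than the PR's actual code.

```java
// Simplified stand-in for Spark's CustomMetric so the sketch compiles on its own.
interface CustomMetric {
  String name();

  String description();

  String aggregateTaskMetrics(long[] taskMetrics);
}

class TotalDeleteFileSize implements CustomMetric {
  @Override
  public String name() {
    return "totalDeleteFileSize"; // assumed name, for illustration only
  }

  @Override
  public String description() {
    // Unit in parentheses at the end, matching the TotalFileSize convention.
    return "total delete file size (bytes)";
  }

  @Override
  public String aggregateTaskMetrics(long[] taskMetrics) {
    // Assumption: the values are summed across tasks and rendered as a plain number.
    long sum = 0L;
    for (long taskMetric : taskMetrics) {
      sum += taskMetric;
    }
    return String.valueOf(sum);
  }
}
```

Keeping the unit suffix identical across size metrics means the Spark UI shows them with consistent labels.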
Thanks @aokolnychyi for the review. I have addressed the latest comments; this is ready for another round.
Thanks, @karuppayya!
This change cherry-picks PR #8717 to Spark 3.4.
This is a followup to #7447 (comment)
cc: @aokolnychyi @RussellSpitzer @szehon-ho