@manuzhang
Member

No description provided.

@manuzhang manuzhang force-pushed the spark-write-metrics branch 3 times, most recently from 8a561bd to 316dc40 Compare October 17, 2024 15:03
@manuzhang
Member Author

Add the total number of data files to the AppendData write command.

[Image: CleanShot 2024-10-18 at 11 31 55@2x]

@manuzhang manuzhang force-pushed the spark-write-metrics branch from 316dc40 to 99d4b65 Compare October 18, 2024 04:44

object MetricsUtils {

def postDriverMetrics(sparkContext: SparkContext, metricValues: java.util.Map[CustomMetric, Long]): Unit = {
Member Author

This is not needed if we can support it on the Spark side.

Contributor

I see that one got in :)

Member Author

Yes, but it will only be available in Spark 4+.

@github-actions

github-actions bot commented Dec 9, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 9, 2024
@github-actions github-actions bot removed the stale label Dec 13, 2024
@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@wypoon
Contributor

wypoon commented Jan 15, 2025

@manuzhang I am happy to see that someone is working on adding write-side Iceberg metrics to the Spark SQL UI!
I realize that this is still in a draft state, but I have some questions/suggestions.
Do you plan to add metrics only to append operations? It would be good to see them for other operations, such as delete and overwrite.
Would added data files be more useful than total data files? Or could we show both?
For delete and overwrite operations, I think it would be useful to see removed data files, added delete files and removed delete files.


@Override
public String description() {
return "number of total data files";
Contributor

This doesn't sound right.
"total" implies a number, so I think the description can just be "total data files".
If you really want to include "number", then "total number of data files".
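
For illustration only, here is a minimal sketch of a metric using the shorter wording. `SketchCustomMetric` is a local stand-in that mirrors the three methods of Spark's `CustomMetric` interface so the snippet compiles without Spark on the classpath. The class name `TotalDataFiles` and the metric name `totalDataFiles` are assumptions and are not taken from this PR.

```java
import java.util.Arrays;

// Local stand-in for org.apache.spark.sql.connector.metric.CustomMetric,
// declared here only so the sketch compiles without Spark (assumption).
interface SketchCustomMetric {
  String name();

  String description();

  String aggregateTaskMetrics(long[] taskMetrics);
}

// Hypothetical metric class; the names are illustrative.
class TotalDataFiles implements SketchCustomMetric {
  @Override
  public String name() {
    return "totalDataFiles";
  }

  @Override
  public String description() {
    // "total" already implies a count, so "number of" is dropped.
    return "total data files";
  }

  @Override
  public String aggregateTaskMetrics(long[] taskMetrics) {
    // Sum the per-task values into the single value shown in the UI.
    return String.valueOf(Arrays.stream(taskMetrics).sum());
  }
}

class TotalDataFilesSketch {
  public static void main(String[] args) {
    SketchCustomMetric metric = new TotalDataFiles();
    String value = metric.aggregateTaskMetrics(new long[] {3L, 4L});
    System.out.println(metric.description() + ": " + value);
  }
}
```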

@manuzhang
Member Author

@wypoon I plan to add metrics for all write operations, but I'd like to get the interfaces right first. I'm not sure whether this is the best way to propagate a metricsReporter. Any thoughts?

    if (this.table instanceof BaseTable) {
      this.metricsReporter = new InMemoryMetricsReporter();
      ((BaseTable) this.table).combineMetricsReporter(metricsReporter);
    }
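
To make the question concrete, here is a self-contained sketch of the pattern in the snippet above: the writer creates an in-memory reporter and chains it onto whatever reporter the table already has, so both receive the commit report. All types here (`SketchReporter`, `SketchInMemoryReporter`, `SketchTable`) are hypothetical stand-ins, not the Iceberg classes, and reports are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a metrics reporter; reports are plain strings here (assumption).
interface SketchReporter {
  void report(String report);
}

// Keeps only the most recent report so the writer can read it after a commit.
class SketchInMemoryReporter implements SketchReporter {
  private String lastReport;

  @Override
  public void report(String report) {
    this.lastReport = report;
  }

  String lastReport() {
    return lastReport;
  }
}

// Hypothetical table that chains an extra reporter onto its existing one.
class SketchTable {
  private SketchReporter reporter;

  SketchTable(SketchReporter reporter) {
    this.reporter = reporter;
  }

  void combineReporter(SketchReporter extra) {
    SketchReporter existing = this.reporter;
    // Forward every report to the original reporter first, then to the new one.
    this.reporter = r -> {
      existing.report(r);
      extra.report(r);
    };
  }

  void commit(String report) {
    reporter.report(report);
  }
}

class ReporterPropagationSketch {
  public static void main(String[] args) {
    List<String> catalogReports = new ArrayList<>();
    SketchTable table = new SketchTable(catalogReports::add);

    SketchInMemoryReporter writeReporter = new SketchInMemoryReporter();
    table.combineReporter(writeReporter);

    table.commit("added-data-files=2");
    System.out.println(writeReporter.lastReport());
  }
}
```

One consequence of this shape is that the table is mutated by the writer, which is part of what makes the propagation question worth discussing.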

@manuzhang manuzhang force-pushed the spark-write-metrics branch from f91d3f5 to 2e8e0c0 Compare May 2, 2025 15:54
@manuzhang manuzhang marked this pull request as ready for review May 2, 2025 15:55
@manuzhang manuzhang force-pushed the spark-write-metrics branch from 2e8e0c0 to fa49efc Compare May 3, 2025 15:24
@manuzhang manuzhang requested review from Fokko, aokolnychyi and wypoon and removed request for wypoon May 4, 2025 15:32
@manuzhang manuzhang force-pushed the spark-write-metrics branch from fa49efc to 2643854 Compare May 12, 2025 10:29
@manuzhang
Member Author

Screenshot of the Spark SQL UI after the latest update:

[Image: CleanShot 2025-05-14 at 17 54 48@2x]

3 participants