Skip to content

Conversation

@karuppayya
Copy link
Contributor

@karuppayya karuppayya commented Jul 16, 2022

What changes were proposed in this pull request?

Expose custom metrics available on driver from DS v2 data sources on SQL UI.

Why are the changes needed?

115ed89 introduces a mechanism to add custom metrics for DS v2 data sources. But it only supports executor metrics and there is currently no mechanism to expose driver metrics from the API.

Does this PR introduce any user-facing change?

Yes
Screen Shot 2022-07-15 at 5 37 24 PM

How was this patch tested?

Added unit tests

@karuppayya
Copy link
Contributor Author

karuppayya commented Jul 16, 2022

@viirya viirya changed the title [SPARK-39635] Ability to expose driver metrics [SPARK-39635][SQL] Ability to expose driver metrics Jul 16, 2022
@viirya
Copy link
Member

viirya commented Jul 16, 2022

Seems some style issues:

[error] /home/runner/work/spark/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala:21:0: There should at least one a single empty line separating groups 3rdParty and spark.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep this blank line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, this also fixes the checkstyle issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the JIRA number? I.e. "SPARK-39635: Report driver metrics ..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@karuppayya karuppayya requested a review from viirya July 17, 2022 23:18
@viirya viirya changed the title [SPARK-39635][SQL] Ability to expose driver metrics [SPARK-39635][SQL] Support driver metrics in DS v2 custom metric API Jul 20, 2022
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for BatchScanExec, there seems other places we need doing this too. E.g., MicroBatchScanExec, ContinuousScanExec which are based on DataSourceV2ScanExecBase.

Copy link
Member

@viirya viirya Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For writer, there is V2TableWriteExec, V2ExistingTableWriteExec, WriteToContinuousDataSourceExec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya @cloud-fan I will create a follow up PR for the write paths.
As for read paths, I have updated ContinuousScanExec, MicroBatchScanExec similar to BatchScanExec.
Can you please point me to test case that I can run to verify if the behavior is as expected.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the API, it looks good to me. Just a few places in read/write paths we need to update the driver metrics too.

@viirya
Copy link
Member

viirya commented Jul 20, 2022

cc @cloud-fan

@zinking
Copy link

zinking commented Jul 21, 2022

apart from that, what about SupportsMetadata interface in DSV2 base implementation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems this is redundant?

return new CustomTaskMetric[]{};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as per suggestion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Returns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm what if the driver metric is not in metrics? this will throw NoSuchElementException?

Copy link
Member

@viirya viirya Jul 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think so. Considering this is implemented by the developers for DS v2 data sources and not from end-users, I don't think this would happen. Otherwise it is caught during development.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe reportDriverMetrics? there is also no custom in PartitionReader.currentMetricsValues

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reverted this file as I plan to cover write paths in a seperate PR

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I'm okay to have a follow-up for write paths.

@viirya
Copy link
Member

viirya commented Aug 8, 2022

@karuppayya Can you rebase with master to retrigger CI? Thanks.

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too

@viirya
Copy link
Member

viirya commented Aug 9, 2022

I will merge this in next few days if no more comments.

cc @cloud-fan

@viirya
Copy link
Member

viirya commented Aug 12, 2022

Thanks. Merging to master.

@viirya viirya closed this in d4c5815 Aug 12, 2022
Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

late LGTM

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, late LGTM. Thank you all.

cloud-fan added a commit that referenced this pull request Nov 15, 2024
### What changes were proposed in this pull request?
Add `reportDriverMetrics` method to `Write` API and post custom metrics from driver after v2 write commits.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
#37205 supported reporting custom driver metrics when reading from v2 table. This is to support that when writing to v2 table.

### How was this patch tested?
UT.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48573 from manuzhang/v2write-metrics.

Lead-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Zhang, Manu <tianlzhang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants