[SPARK-39404][SS][3.3] Minor fix for querying _metadata in streaming #38337

Yaohua628 · 2022-10-22T00:34:34Z

What changes were proposed in this pull request?

(This cherry-picks #36801)

We added support for querying the `_metadata` column with file-based streaming sources in #35676.

In this PR, we propose using `transformUp` instead of `match` when pattern matching the `dataPlan` in the `runBatch` method of `MicroBatchExecution`. The existing `match` works for `FileStreamSource`, because `FileStreamSource` always returns a single `LogicalRelation` node (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala#L247).

However, the proposed change makes the logic more robust, since we should not rely on the upstream source to return a plan of a particular shape. In addition, it could make `_metadata` work when someone customizes `getBatch` in `FileStreamSource`.
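To illustrate the difference, here is a minimal Scala sketch on a toy plan tree. `Plan`, `Relation`, `Filter`, `addMetadata`, and `MetadataRewriteSketch` are hypothetical stand-ins, not Spark's Catalyst classes or the actual `MicroBatchExecution` code; the `transformUp` here is only modeled loosely on Catalyst's bottom-up rewrite. A top-level `match` inspects only the root node, so a relation wrapped in another node is skipped, while `transformUp` visits every node.

```scala
// Hypothetical, simplified stand-ins for Catalyst plan nodes (not Spark's real classes).
sealed trait Plan {
  def children: Seq[Plan]
  def withNewChildren(newChildren: Seq[Plan]): Plan

  // Bottom-up rewrite: rewrite the children first, then try the rule on this node.
  // Nodes the rule does not match are kept unchanged.
  def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
    val afterChildren = withNewChildren(children.map(_.transformUp(rule)))
    rule.applyOrElse(afterChildren, identity[Plan])
  }
}

// Leaf node standing in for a file-based relation.
final case class Relation(columns: Seq[String]) extends Plan {
  def children: Seq[Plan] = Nil
  def withNewChildren(newChildren: Seq[Plan]): Plan = this
}

// Unary node standing in for whatever a customized source might wrap around the relation.
final case class Filter(condition: String, child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def withNewChildren(newChildren: Seq[Plan]): Plan = copy(child = newChildren.head)
}

object MetadataRewriteSketch {
  // Hypothetical rule: expose a `_metadata` column on a relation.
  private val addMetadata: PartialFunction[Plan, Plan] = {
    case r: Relation => r.copy(columns = r.columns :+ "_metadata")
  }

  // `match` style: only rewrites when the root itself is the relation.
  def withMatch(plan: Plan): Plan = plan match {
    case r: Relation => addMetadata(r)
    case other       => other // a wrapped relation is silently left untouched
  }

  // `transformUp` style: rewrites the relation wherever it appears in the tree.
  def withTransformUp(plan: Plan): Plan = plan.transformUp(addMetadata)

  def main(args: Array[String]): Unit = {
    val wrapped: Plan = Filter("year = 2022", Relation(Seq("value")))
    println(withMatch(wrapped))       // relation inside the Filter keeps its original columns
    println(withTransformUp(wrapped)) // relation inside the Filter gains `_metadata`
  }
}
```

With a bare relation at the root, both approaches produce the same plan; the difference only shows up once the source returns a plan in which the relation is not the root node.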

Why are the changes needed?

To make the logic more robust.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests

Closes apache#36801 from Yaohua628/spark-39404.

Authored-by: yaohua <yaohua.zhao@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>

Yaohua628 · 2022-10-22T00:36:10Z

cc: @HeartSaVioR @felipepessoto

HeartSaVioR · 2022-10-22T10:24:57Z

https://github.com/Yaohua628/spark/runs/9040960126

The build passed; it just doesn't seem to be reflected here.

HeartSaVioR

+1

HeartSaVioR · 2022-10-22T10:26:28Z

Thanks! Merging to 3.3.

Closes #38337 from Yaohua628/spark-39404-3-3.

Authored-by: yaohua <yaohua.zhao@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>

AmplabJenkins · 2022-10-22T11:47:00Z

Can one of the admins verify this patch?

github-actions bot added SQL STRUCTURED STREAMING labels Oct 22, 2022

felipepessoto approved these changes Oct 22, 2022

HyukjinKwon approved these changes Oct 22, 2022

HeartSaVioR approved these changes Oct 22, 2022

HeartSaVioR closed this Oct 22, 2022

5 participants