[SPARK-39404][SS] Minor fix for querying `_metadata` in streaming #36801

Yaohua628 · 2022-06-08T04:43:19Z

What changes were proposed in this pull request?

We added support for querying the `_metadata` column with a file-based streaming source in #35676.

In this PR, we propose using `transformUp` instead of `match` when pattern matching the `dataPlan` in the `runBatch` method of `MicroBatchExecution`. The existing `match` works for `FileStreamSource` because `FileStreamSource` always returns a single `LogicalRelation` node (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala#L247).

However, the proposed change makes the logic more robust: we should not rely on the upstream source to return a plan of a particular shape. The change could also make `_metadata` work if someone customizes `getBatch` in `FileStreamSource`.
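To illustrate why a top-level `match` is more fragile than `transformUp`, here is a minimal toy model in Python. It is only a sketch, not Spark's Catalyst API: the `Plan` class, its `transform_up` method, the `with_metadata` flag, and the `add_metadata` rule are all hypothetical names made up for this example. The point is that a root-only match misses a relation that is wrapped in another node, while a bottom-up transform still finds it.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Plan:
    """Toy stand-in for a logical plan node (not Spark's LogicalPlan)."""
    name: str
    children: Tuple["Plan", ...] = ()
    with_metadata: bool = False

    def transform_up(self, rule: Callable[["Plan"], Optional["Plan"]]) -> "Plan":
        """Rewrite the children first, then apply `rule` to this node."""
        new_children = tuple(child.transform_up(rule) for child in self.children)
        node = replace(self, children=new_children)
        result = rule(node)
        # A rule returns None when it does not apply; keep the node unchanged.
        return node if result is None else result


def add_metadata(node: Plan) -> Optional[Plan]:
    """Hypothetical rule: flag relation nodes, ignore everything else."""
    if node.name == "LogicalRelation":
        return replace(node, with_metadata=True)
    return None


def rewrite_root_only(plan: Plan) -> Plan:
    """Mimics a top-level `match`: the rule is tried on the root node only."""
    result = add_metadata(plan)
    return plan if result is None else result


def rewrite_everywhere(plan: Plan) -> Plan:
    """Mimics `transformUp`: the rule is tried on every node, leaves first."""
    return plan.transform_up(add_metadata)


# A source that returns a bare relation, and one that wraps it in another node.
bare = Plan("LogicalRelation")
wrapped = Plan("Project", (Plan("LogicalRelation"),))
```

With these definitions, `rewrite_root_only(bare)` and `rewrite_everywhere(bare)` both flag the relation. For `wrapped`, only `rewrite_everywhere` reaches the nested relation, which is the situation a customized `getBatch` could produce.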

Why are the changes needed?

To make the `_metadata` handling in `runBatch` more robust.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests

HeartSaVioR

+1 Thanks for fixing the missed spot! The change is obvious, and we can go with existing tests.

HeartSaVioR · 2022-06-08T08:40:21Z

Thanks! Merging to master.

felipepessoto · 2022-10-21T17:14:27Z

@HeartSaVioR @Yaohua628, I don't see this commit in the 3.3.1-rc4 branch, although RC4 contains more recent commits, e.g. 946a960.

I'm wondering if there is any particular reason not to include this fix.

Thanks

Yaohua628 · 2022-10-21T17:22:01Z

> @HeartSaVioR @Yaohua628, I don't see this commit in the 3.3.1-rc4 branch, although RC4 contains more recent commits, e.g. 946a960.
>
> I'm wondering if there is any particular reason not to include this fix.
>
> Thanks

Ah, good catch! I guess we missed merging it to 3.3. I will open a backport PR shortly. cc @HeartSaVioR

Thanks!

### What changes were proposed in this pull request?

We added the support to query the `_metadata` column with a file-based streaming source: apache#35676.

We propose to use `transformUp` instead of `match` when pattern matching the `dataPlan` in `MicroBatchExecution` `runBatch` method in this PR. It is fine for `FileStreamSource` because `FileStreamSource` always returns one `LogicalRelation` node (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala#L247).

But the proposed change will make the logic robust and we really should not rely on the upstream source to return a desired plan. In addition, the proposed change could also make `_metadata` work if someone wants to customize `FileStreamSource` `getBatch`.

### Why are the changes needed?

Robust

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

Closes apache#36801 from Yaohua628/spark-39404.

Authored-by: yaohua <yaohua.zhao@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>

### What changes were proposed in this pull request?

(This cherry-picks #36801)

We added the support to query the `_metadata` column with a file-based streaming source: #35676.

We propose to use `transformUp` instead of `match` when pattern matching the `dataPlan` in `MicroBatchExecution` `runBatch` method in this PR. It is fine for `FileStreamSource` because `FileStreamSource` always returns one `LogicalRelation` node (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala#L247).

But the proposed change will make the logic robust and we really should not rely on the upstream source to return a desired plan. In addition, the proposed change could also make `_metadata` work if someone wants to customize `FileStreamSource` `getBatch`.

### Why are the changes needed?

Robust

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

Closes #38337 from Yaohua628/spark-39404-3-3.

Authored-by: yaohua <yaohua.zhao@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>

fix

613a621

github-actions bot added SQL STRUCTURED STREAMING labels Jun 8, 2022

HeartSaVioR approved these changes Jun 8, 2022

HeartSaVioR changed the title ~~[SPARK-39404][SQL][Streaming] Minor fix for querying _metadata in streaming~~ [SPARK-39404][SQL][SS] Minor fix for querying _metadata in streaming Jun 8, 2022

HeartSaVioR changed the title ~~[SPARK-39404][SQL][SS] Minor fix for querying _metadata in streaming~~ [SPARK-39404][SS] Minor fix for querying _metadata in streaming Jun 8, 2022

HeartSaVioR closed this in 12b7e61 Jun 8, 2022

Yaohua628 mentioned this pull request Oct 22, 2022

[SPARK-39404][SS][3.3] Minor fix for querying _metadata in streaming #38337

Closed
