[SPARK-39404][SS] Minor fix for querying _metadata in streaming
#36801
HeartSaVioR
left a comment
+1 Thanks for fixing the missed spot! The change is obvious and we can go with the existing tests.
Thanks! Merging to master.
@HeartSaVioR @Yaohua628, I don't see this commit in the 3.3.1-rc4 branch, even though RC4 contains more recent commits, e.g. 946a960. Is there a particular reason this fix was not included? Thanks
Ah, good catch! I guess we missed merging it to 3.3. I will open a backport PR shortly. cc @HeartSaVioR Thanks!
### What changes were proposed in this pull request?

We added the support to query the `_metadata` column with a file-based streaming source: apache#35676.

We propose to use `transformUp` instead of `match` when pattern matching the `dataPlan` in `MicroBatchExecution` `runBatch` method in this PR. It is fine for `FileStreamSource` because `FileStreamSource` always returns one `LogicalRelation` node (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala#L247).

But the proposed change will make the logic robust and we really should not rely on the upstream source to return a desired plan. In addition, the proposed change could also make `_metadata` work if someone wants to customize `FileStreamSource` `getBatch`.

### Why are the changes needed?

Robust

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

Closes apache#36801 from Yaohua628/spark-39404.

Authored-by: yaohua <yaohua.zhao@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
### What changes were proposed in this pull request?

(This cherry-picks #36801)

We added the support to query the `_metadata` column with a file-based streaming source: #35676.

We propose to use `transformUp` instead of `match` when pattern matching the `dataPlan` in `MicroBatchExecution` `runBatch` method in this PR. It is fine for `FileStreamSource` because `FileStreamSource` always returns one `LogicalRelation` node (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala#L247).

But the proposed change will make the logic robust and we really should not rely on the upstream source to return a desired plan. In addition, the proposed change could also make `_metadata` work if someone wants to customize `FileStreamSource` `getBatch`.

### Why are the changes needed?

Robust

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

Closes #38337 from Yaohua628/spark-39404-3-3.

Authored-by: yaohua <yaohua.zhao@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
### What changes were proposed in this pull request?

We added the support to query the `_metadata` column with a file-based streaming source: #35676.

We propose to use `transformUp` instead of `match` when pattern matching the `dataPlan` in `MicroBatchExecution` `runBatch` method in this PR. It is fine for `FileStreamSource` because `FileStreamSource` always returns one `LogicalRelation` node (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala#L247).

But the proposed change will make the logic robust and we really should not rely on the upstream source to return a desired plan. In addition, the proposed change could also make `_metadata` work if someone wants to customize `FileStreamSource` `getBatch`.

### Why are the changes needed?

Robust

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests
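
To illustrate why `transformUp` is more robust than a top-level `match`, here is a minimal sketch on a toy plan tree. It does not use Spark's classes: `Plan`, `Relation`, `Project`, the `withMetadata` flag, and the `MetadataRewrite` helpers are all hypothetical names made up for this example, and the simplified `transformUp` only mimics the general bottom-up behaviour of a tree rewrite.

```scala
// Toy stand-ins for a logical plan tree. These are NOT Spark's classes;
// every name below is hypothetical and exists only for illustration.
sealed trait Plan {
  def children: Seq[Plan]
  def withChildren(newChildren: Seq[Plan]): Plan

  // Simplified bottom-up rewrite: rewrite the children first, then try
  // the rule on the resulting node, keeping the node if the rule does not apply.
  def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
    val rewritten = withChildren(children.map(_.transformUp(rule)))
    rule.applyOrElse(rewritten, identity[Plan])
  }
}

// Leaf node playing the role of a relation that may expose metadata.
case class Relation(name: String, withMetadata: Boolean = false) extends Plan {
  def children: Seq[Plan] = Nil
  def withChildren(newChildren: Seq[Plan]): Plan = this
}

// Unary node wrapping a child, e.g. what a customized source might return.
case class Project(columns: Seq[String], child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def withChildren(newChildren: Seq[Plan]): Plan = copy(child = newChildren.head)
}

object MetadataRewrite {
  // Top-level match: only fires when the root itself is a Relation.
  def viaMatch(plan: Plan): Plan = plan match {
    case r: Relation => r.copy(withMetadata = true)
    case other       => other
  }

  // transformUp: fires on every Relation, wherever it sits in the tree.
  def viaTransformUp(plan: Plan): Plan = plan.transformUp {
    case r: Relation => r.copy(withMetadata = true)
  }
}
```

In this sketch, when the plan is a bare `Relation`, both helpers behave the same, which mirrors the case where the source returns a single relation node. Once the relation is wrapped, for example `Project(Seq("value", "_metadata"), Relation("files"))`, `viaMatch` returns the plan unchanged while `viaTransformUp` still rewrites the nested `Relation`.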