@cloud-fan
Contributor

What changes were proposed in this pull request?

This is a followup of #51959. Although internal APIs are allowed to change, it's still better to keep compatibility where possible, to avoid breaking existing Spark plugins.

This PR brings HDFSMetadataLog and SerializedOffset back to their original package, to avoid breaking the Pulsar data source: https://github.com/streamnative/pulsar-spark/blob/master/src/main/scala/org/apache/spark/sql/pulsar/PulsarSources.scala#L27
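
As an illustration of the general idea (not the code in this PR), a relocated class can stay reachable under its old name through a thin subclass left in the original package. The sketch below is self-contained and does not depend on Spark; the package names `example.streaming` and `example.streaming.checkpointing`, the `MetadataLog` class, and `CompatDemo` are all hypothetical stand-ins.

```scala
// Hypothetical "new" location of the relocated class (an assumption for illustration).
package example.streaming.checkpointing {
  import scala.collection.mutable

  // Toy in-memory log; it only mimics the shape of a metadata log.
  class MetadataLog[T](val path: String) {
    private val batches = mutable.Map.empty[Long, T]

    // Records metadata for a batch; returns false if the batch already exists.
    def add(batchId: Long, metadata: T): Boolean =
      if (batches.contains(batchId)) false
      else {
        batches.update(batchId, metadata)
        true
      }

    def get(batchId: Long): Option[T] = batches.get(batchId)
  }
}

// Hypothetical "original" package that plugins still import from.
package example.streaming {
  // Compatibility shim: same simple name, same constructor parameters,
  // all behavior inherited from the relocated class.
  class MetadataLog[T](path: String)
    extends example.streaming.checkpointing.MetadataLog[T](path)
}

object CompatDemo {
  def main(args: Array[String]): Unit = {
    // Plugin code written against the original package keeps compiling.
    val log = new example.streaming.MetadataLog[String]("/tmp/checkpoint")
    println(log.add(0L, "offset-0"))
    println(log.get(0L))

    // The shim is a subtype, so it can be passed where the relocated type is expected.
    val relocated: example.streaming.checkpointing.MetadataLog[String] = log
    println(relocated.path)
  }
}
```

A shim like this has to forward the same constructor argument types as the class it wraps; otherwise the `extends` clause fails to compile.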

Why are the changes needed?

Avoid breaking Spark plugins

Does this PR introduce any user-facing change?

No

How was this patch tested?

manual test

Was this patch authored or co-authored using generative AI tooling?

no

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you, @cloud-fan. Could you fix the compilation failure?

[error] /home/runner/work/spark/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/legacy.scala:30:36: type mismatch;
[error]  found   : org.apache.hadoop.conf.Configuration
[error]  required: org.apache.spark.sql.SparkSession
[error]   extends ActualHDFSMetadataLog[T](conf, path) {

cc @anishshri-db, @HeartSaVioR, @LuciferYang

Contributor

@anishshri-db anishshri-db left a comment

Thanks for working on this!

Contributor

@HeartSaVioR HeartSaVioR left a comment

+1 assuming CI passes. Thanks for the work!

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

@cloud-fan
Contributor Author

thanks for the review, merging to master!

@cloud-fan cloud-fan closed this in 589141e Sep 19, 2025
@LuciferYang
Contributor

late LGTM, thank you @cloud-fan

huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…pache.spark.sql.execution.streaming

Closes apache#52387 from cloud-fan/compat.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>