build: Drop Spark 3.2 support #581
Conversation
cc @andygrove @viirya @kazuyukitanimura @parthchandra
It looks like installation.md and overview.md still mention 3.2. We can also remove the spark-3.2 shims. Additionally, we can remove a few more things, e.g. ShimCometParquetUtils, the GitHub Actions, etc.
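To illustrate why dropping 3.2 lets shims like ShimCometParquetUtils go away, here is a hedged sketch of the shim pattern; the names and the method are simplified stand-ins, not the real Comet code:

```scala
// Hypothetical sketch of the shim pattern being removed; the real
// ShimCometParquetUtils members differ, this only illustrates the idea.
trait ShimParquetUtils {
  // Spark 3.2 answered this differently from 3.3+, so shared code had to
  // go through the shim instead of calling the Spark API directly.
  def supportsFieldIds: Boolean
}

// With 3.2 gone, every supported Spark version behaves the same way, so the
// shim trait (and the spark-3.2 source directory providing it) can be deleted.
object ParquetUtilsAfterDrop {
  val supportsFieldIds: Boolean = true
}
```

Once every remaining version agrees, the indirection carries no information and the caller can use the Spark API directly.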
spark/src/main/scala/org/apache/spark/sql/comet/CometBatchScanExec.scala
Looking good, but a few more things. The GitHub Actions CI for 3.2 should be dropped. ShimCometBatchScanExec can also be cleaned up, i.e. moving keyGroupedPartitioning and inputPartitions to CometBatchScanExec.
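The suggested cleanup can be sketched as follows; the types here are simplified stand-ins (Strings instead of Spark's expression and partition types), not the real Comet classes:

```scala
// Before: the members live on a per-version shim trait, because on Spark 3.2
// the underlying APIs differed (or were missing).
trait ShimBatchScanExec {
  def keyGroupedPartitioning: Option[Seq[String]] = None
  def inputPartitions: Seq[String] = Seq.empty
}
case class BatchScanExecBefore(name: String) extends ShimBatchScanExec

// After dropping 3.2, the members can move directly onto the exec node and
// the shim trait disappears.
case class BatchScanExecAfter(
    name: String,
    keyGroupedPartitioning: Option[Seq[String]] = None,
    inputPartitions: Seq[String] = Seq.empty)
```

The refactor changes no behavior; it only removes an indirection that existed solely to paper over the 3.2 API gap.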
spark/src/main/spark-3.x/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala
LGTM pending CI
The TODO comment was changed from:
// TODO: remove after dropping Spark 3.2 support and directly call new FileScanRDD
to:
// TODO: remove after dropping Spark 3.4 support and directly call new FileScanRDD
3.4 or 3.3? I don't see 3.4 explicitly mentioned in other places.
It should be 3.4, because FileScanRDD has a different signature in 4.0. Here is the 4.0 signature:
class FileScanRDD(
@transient private val sparkSession: SparkSession,
readFunction: (PartitionedFile) => Iterator[InternalRow],
@transient val filePartitions: Seq[FilePartition],
val readSchema: StructType,
val metadataColumns: Seq[AttributeReference] = Seq.empty,
metadataExtractors: Map[String, PartitionedFile => Any] = Map.empty,
options: FileSourceOptions = new FileSourceOptions(CaseInsensitiveMap(Map.empty)))
Here is the 3.4 signature:
class FileScanRDD(
@transient private val sparkSession: SparkSession,
readFunction: (PartitionedFile) => Iterator[InternalRow],
@transient val filePartitions: Seq[FilePartition],
val readSchema: StructType,
val metadataColumns: Seq[AttributeReference] = Seq.empty,
options: FileSourceOptions = new FileSourceOptions(CaseInsensitiveMap(Map.empty)))
I see. Thanks.
How about 3.3? Is it also different from Spark 3.4?
Yes, 3.3 is also different from 3.4. Here is the 3.3 signature:
class FileScanRDD(
@transient private val sparkSession: SparkSession,
readFunction: (PartitionedFile) => Iterator[InternalRow],
@transient val filePartitions: Seq[FilePartition],
val readSchema: StructType,
val metadataColumns: Seq[AttributeReference] = Seq.empty)
Spark 3.5 has the same signature as Spark 4.0
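Summarizing the three signatures above, the constructor parameter list grows version by version, which is why a shim helper has to branch on the Spark version until 3.3 and 3.4 are dropped. The following is an illustrative sketch only (counting parameters rather than passing real Spark types, and the object name is made up):

```scala
// Parameter count of the FileScanRDD constructor per Spark version line,
// taken from the signatures quoted above:
//   3.3 ends at metadataColumns, 3.4 adds options,
//   3.5/4.0 additionally add metadataExtractors.
object FileScanRDDArity {
  def constructorArity(sparkVersion: String): Int = sparkVersion match {
    case v if v.startsWith("3.3") => 5
    case v if v.startsWith("3.4") => 6
    case _                        => 7 // 3.5 and 4.0 share the same signature
  }
}
```

Once 3.3 and 3.4 support is dropped, only the 7-parameter form remains and the shared code can call `new FileScanRDD` directly, which is what the TODO records.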
Yea, that is why I asked about
remove after dropping Spark 3.4 support ...
Isn't it Spark 3.3/Spark 3.4?
OK, let me rewrite this to make it clearer.
Thanks, everyone!
* build: Drop Spark 3.2 support
* remove un-used import
* fix BloomFilterMightContain
* revert the changes for TimestampNTZType and PartitionIdPassthrough
* address comments and remove more 3.2 related code
* remove un-used import
* put back newDataSourceRDD
* remove un-used import and put back lazy val partitions
* address comments
* Trigger Build
* remove the missed 3.2 pipeline
* address comments
Which issue does this PR close?
Closes #565.
Rationale for this change
What changes are included in this PR?
How are these changes tested?