
WIP: Initial Support for Spark 4.0 #10622

Closed
wants to merge 17 commits

Conversation

@huaxingao (Contributor) commented Jul 2, 2024

This PR adds initial support for Spark 4.0. It uses v4.0.0-preview1 for now; I will switch to v4.0.0 when it is released.

Fixes #10497

@huaxingao huaxingao changed the title Initial Support for Spark 4.0 WIP: Initial Support for Spark 4.0 Jul 2, 2024
@amogh-jahagirdar (Contributor) commented:

Updated the description to link #10497; this is great to see, @huaxingao.

@huaxingao huaxingao force-pushed the spark_4.0 branch 2 times, most recently from f2e231f to 515b2e0 Compare July 13, 2024 04:39
@raphaelauv commented Jul 17, 2024

BTW, Spark 4 is going to use Parquet 1.14.1:
https://issues.apache.org/jira/browse/SPARK-48177

@@ -107,7 +107,7 @@ jobs:
     runs-on: ubuntu-22.04
     strategy:
       matrix:
-        jvm: [8, 11, 17]
+        jvm: [17]
Member:
This is a bit more of a serious change since we won't be supporting JVM 8 and 11 if we do this
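One way to address this concern would be a build matrix that keeps the older JVMs for Spark 3.x while building Spark 4.0 only on 17 (Spark 4 requires Java 17+). The following is a hypothetical sketch, not the actual Iceberg workflow; the `spark` matrix axis and the exclusion rules are assumptions for illustration:

```yaml
jobs:
  build:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        spark: ['3.5', '4.0']
        jvm: [8, 11, 17]
        # Spark 4.0 dropped support for Java 8 and 11, so exclude
        # those combinations while keeping them for Spark 3.5.
        exclude:
          - spark: '4.0'
            jvm: 8
          - spark: '4.0'
            jvm: 11
```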

@@ -255,7 +255,7 @@ protected <T> void createOrReplaceView(String name, List<T> data, Encoder<T> enc
private Dataset<Row> toDS(String schema, String jsonData) {
List<String> jsonRows =
Arrays.stream(jsonData.split("\n"))
.filter(str -> !str.trim().isEmpty())
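For context on the filter shown above, here is a small self-contained sketch (the class and method names are hypothetical, for illustration only): `String.split("\n")` keeps empty strings for interior blank lines, so the filter drops blank rows before they would be handed to Spark's JSON reader.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper illustrating the splitting logic in toDS():
// split("\n") yields an empty string for each interior blank line,
// so blank rows must be filtered out before being parsed as JSON.
public class JsonRowSplit {
  static List<String> toJsonRows(String jsonData) {
    return Arrays.stream(jsonData.split("\n"))
        .filter(str -> !str.trim().isEmpty())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // prints [{"id": 1}, {"id": 2}]
    System.out.println(toJsonRows("{\"id\": 1}\n\n{\"id\": 2}\n"));
  }
}
```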
Member:
Why this change?

Contributor:
Also curious why we need to make changes in v3.5; can we focus on the v4.0 changes in this PR?

Contributor (Author):
The 3.5 changes were made by others and were incorporated during the rebase. I will clean this up.

@@ -948,9 +948,9 @@ public void testAddFilesWithParallelism() {
sql("SELECT * FROM %s ORDER BY id", tableName));
}

-  private static final List<Object[]> EMPTY_QUERY_RESULT = Lists.newArrayList();
+  private static final List<Object[]> emptyQueryResult = Lists.newArrayList();
Member:
Not sure why we are changing the constant here


-  private static final StructField[] STRUCT = {
+  private static final StructField[] struct = {
Member:
or here

@@ -29,7 +29,7 @@

class RandomGeneratingUDF implements Serializable {
private final long uniqueValues;
private final Random rand = new Random();
Member:

I don't think we should have any changes for Spark 3.5 in this PR.

@raphaelauv commented:

spark-4.0.0-preview2 uses parquet 1.14.2:
https://issues.apache.org/jira/browse/SPARK-49310

@huaxingao (Contributor, Author) commented:

It's easier to open a new PR than to update this one, so I am closing this one and opening a new PR.

@huaxingao huaxingao closed this Oct 8, 2024
@huaxingao huaxingao deleted the spark_4.0 branch October 8, 2024 21:21
@huaxingao (Contributor, Author) commented:

cc @aihuaxu @RussellSpitzer
Spark 4.0 Preview 1 works OK now with the new PR, but there are still a few issues with Preview 2, which I am working on fixing.

Development

Successfully merging this pull request may close these issues.

Add support for Spark 4.0.0
5 participants