WIP: Initial Support for Spark 4.0 #10622
Updated the description to link #10497. This is great to see, @huaxingao.
Branch updated from f2e231f to 515b2e0.
Btw, Spark 4 is going to use Parquet 1.14.1.
@@ -107,7 +107,7 @@ jobs:
runs-on: ubuntu-22.04
strategy:
matrix:
- jvm: [8, 11, 17]
+ jvm: [17]
This is a more serious change, since we won't be supporting JVM 8 and 11 if we do this.
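To make the trade-off concrete: Spark 4.0 requires Java 17 or newer, while the Spark 3.x line also runs on Java 8 and 11, which is what the original `[8, 11, 17]` matrix exercised. Narrowing the shared matrix to `[17]` therefore drops Java 8 and 11 coverage for every Spark version, not just 4.0. One alternative is to filter the JVM list per Spark version. The sketch below is hypothetical: `SparkJvmMatrix` and its methods are not part of Iceberg or its CI, and the only rules it encodes are the version floors (Java 8 for Spark 3.x, Java 17 for Spark 4.x).

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper, not part of Iceberg or its CI: filters a shared JVM matrix down to the
 * Java versions a given Spark line can run on.
 */
final class SparkJvmMatrix {
  private SparkJvmMatrix() {}

  /** Minimum Java feature version per Spark major line: 8 for 3.x, 17 for 4.x. */
  static int minimumJava(String sparkVersion) {
    if (sparkVersion.startsWith("4.")) {
      return 17;
    }
    if (sparkVersion.startsWith("3.")) {
      return 8;
    }
    throw new IllegalArgumentException("Unknown Spark version: " + sparkVersion);
  }

  /** Returns the entries of matrixJvms that meet the minimum for sparkVersion, in order. */
  static List<Integer> supportedJvms(List<Integer> matrixJvms, String sparkVersion) {
    int minimum = minimumJava(sparkVersion);
    return matrixJvms.stream().filter(jvm -> jvm >= minimum).collect(Collectors.toList());
  }
}
```

For example, `supportedJvms(List.of(8, 11, 17), "4.0")` yields `[17]`, while `"3.5"` keeps all three entries, so in principle the 3.x modules could keep their Java 8 and 11 runs while the 4.0 module runs on 17 only.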
@@ -255,7 +255,7 @@ protected <T> void createOrReplaceView(String name, List<T> data, Encoder<T> enc
private Dataset<Row> toDS(String schema, String jsonData) {
List<String> jsonRows =
Arrays.stream(jsonData.split("\n"))
.filter(str -> !str.trim().isEmpty())
Why this change?
Also curious why we need to make changes in v3.5. Can we focus on the v4.0 changes in this PR?
The 3.5 changes were made by others and were incorporated during the rebase. I will clean this up.
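For reference, the `toDS` helper in that hunk splits newline-delimited JSON into rows and drops blank lines before building the Dataset. The sketch below isolates just that splitting step so it runs without Spark; `JsonRowSplitter` and `jsonRows` are hypothetical names, and the method returns the row list instead of a `Dataset<Row>`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-alone version of the row-splitting step in toDS (no Spark needed). */
final class JsonRowSplitter {
  private JsonRowSplitter() {}

  /** Splits newline-delimited JSON and drops rows that are empty after trimming. */
  static List<String> jsonRows(String jsonData) {
    return Arrays.stream(jsonData.split("\n"))
        // trim().isEmpty() compiles on Java 8; String.isBlank() would require Java 11+.
        .filter(str -> !str.trim().isEmpty())
        .collect(Collectors.toList());
  }
}
```

Keeping `trim().isEmpty()` rather than Java 11's `String.isBlank()` keeps this kind of helper compilable on Java 8, which matters for as long as the 3.x modules are still built on that JVM.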
@@ -948,9 +948,9 @@ public void testAddFilesWithParallelism() {
sql("SELECT * FROM %s ORDER BY id", tableName));
}
- private static final List<Object[]> EMPTY_QUERY_RESULT = Lists.newArrayList();
+ private static final List<Object[]> emptyQueryResult = Lists.newArrayList();
Not sure why we are changing the constant here
- private static final StructField[] STRUCT = {
+ private static final StructField[] struct = {
or here
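Both comments point at the same convention: `private static final` fields are constants and are conventionally named in UPPER_SNAKE_CASE, which is presumably why the camelCase renames of `EMPTY_QUERY_RESULT` and `STRUCT` stood out. As a sketch of how such renames could be caught mechanically, the hypothetical `ConstantNameCheck` below uses reflection to list `static final` fields whose names do not match an assumed pattern modeled on Checkstyle's ConstantName rule. It is not Iceberg's actual lint setup, and the example fields use `List<Object[]>` only so the sketch compiles with the JDK alone.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical reflection-based check: static final fields should be UPPER_SNAKE_CASE. */
final class ConstantNameCheck {
  // Assumed pattern, modeled on Checkstyle's ConstantName rule.
  private static final Pattern CONSTANT_NAME = Pattern.compile("^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$");

  private ConstantNameCheck() {}

  /** Returns names of static final fields declared in clazz that are not UPPER_SNAKE_CASE. */
  static List<String> violations(Class<?> clazz) {
    List<String> bad = new ArrayList<>();
    for (Field field : clazz.getDeclaredFields()) {
      int mods = field.getModifiers();
      boolean isConstant = Modifier.isStatic(mods) && Modifier.isFinal(mods);
      // Skip compiler-generated fields such as $assertionsDisabled.
      if (isConstant && !field.isSynthetic()
          && !CONSTANT_NAME.matcher(field.getName()).matches()) {
        bad.add(field.getName());
      }
    }
    return bad;
  }
}

/** Example fields mirroring the two spellings discussed in the review. */
final class NamingExample {
  private static final List<Object[]> EMPTY_QUERY_RESULT = new ArrayList<>();
  private static final List<Object[]> emptyQueryResult = new ArrayList<>();
}
```

Here `ConstantNameCheck.violations(NamingExample.class)` reports only `emptyQueryResult`.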
@@ -29,7 +29,7 @@
class RandomGeneratingUDF implements Serializable {
private final long uniqueValues;
private final Random rand = new Random();
I don't think we should have any changes for Spark 3.5 in this PR.
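The hunk shows only the two fields of `RandomGeneratingUDF`, so the sketch below is a hypothetical stand-in rather than the real class: `RandomGeneratingUdfSketch`, `nextValue`, and `JavaSerialization.roundTrip` are invented names. It illustrates why the field layout works: a UDF must be `Serializable` so it can be shipped to executors, and `java.util.Random` is itself `Serializable`, so the `rand` field survives a Java serialization round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Random;

/** Hypothetical stand-in mirroring the two fields shown in the diff; not the Iceberg class. */
class RandomGeneratingUdfSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  private final long uniqueValues;
  // java.util.Random implements Serializable, so this field is written along with the UDF.
  private final Random rand = new Random();

  RandomGeneratingUdfSketch(long uniqueValues) {
    this.uniqueValues = uniqueValues;
  }

  /** Hypothetical helper: a pseudo-random value in [0, uniqueValues) for positive uniqueValues. */
  long nextValue() {
    return Math.floorMod(rand.nextLong(), uniqueValues);
  }

  long uniqueValues() {
    return uniqueValues;
  }
}

/** Hypothetical utility: a plain Java serialization round trip through a byte array. */
final class JavaSerialization {
  private JavaSerialization() {}

  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("serialization round trip failed", e);
    }
  }
}
```

One consequence worth knowing: a deserialized `Random` resumes from the seed captured at serialization time, so every copy made from the same bytes produces the same sequence as the original would have from that point.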
It's easier to open a new PR than to update this one. I am closing this one and will open a new PR.
cc @aihuaxu @RussellSpitzer
This PR adds initial support for Spark 4.0. I use v4.0.0-preview1 for now and will switch to v4.0.0.
Fixes #10497