[SPARK-51049][CORE] Increase S3A Vector IO threshold for range merge #49748
Conversation
cnauroth
left a comment
Thank you, @cnauroth.
Could you review this PR when you have some time, @huaxingao?
huaxingao
left a comment
LGTM. Pending CI
Thank you, @huaxingao. All tests passed.
### What changes were proposed in this pull request?

This PR aims to increase S3A Vector IO threshold for range merge.

### Why are the changes needed?

Apache Spark 4.0.0 supported Hadoop Vectored IO via ORC and Parquet. As a part of [HADOOP-18855 VectorIO API tuning/stabilization](https://issues.apache.org/jira/browse/HADOOP-18855), Apache Hadoop 3.4.2 will have new threshold default values. We had better follow these update in advance until Apache Hadoop 3.4.2 is released.

- apache/hadoop#7281

### Does this PR introduce _any_ user-facing change?

No, Hadoop Vectored IO features are new in Apache Spark 4.0.0 .

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49748 from dongjoon-hyun/SPARK-51049.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit b62c3f4)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Merged to master/4.0.
### What changes were proposed in this pull request?

This PR aims to increase S3A Vector IO threshold for range merge.

### Why are the changes needed?

Apache Spark 4.0.0 supported Hadoop Vectored IO via ORC and Parquet. As a part of [HADOOP-18855 VectorIO API tuning/stabilization](https://issues.apache.org/jira/browse/HADOOP-18855), Apache Hadoop 3.4.2 will have new threshold default values. We had better follow these update in advance until Apache Hadoop 3.4.2 is released.

- apache/hadoop#7281

### Does this PR introduce _any_ user-facing change?

No, Hadoop Vectored IO features are new in Apache Spark 4.0.0 .

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#49748 from dongjoon-hyun/SPARK-51049.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit b9a403a)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

What changes were proposed in this pull request?
This PR aims to increase the S3A Vector IO threshold for range merging.
Why are the changes needed?
Apache Spark 4.0.0 supports Hadoop Vectored IO via ORC and Parquet.
As part of HADOOP-18855 (VectorIO API tuning/stabilization), Apache Hadoop 3.4.2 will have new default threshold values. We should adopt these updated values in advance, before Apache Hadoop 3.4.2 is released.
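For readers unfamiliar with range merging, here is a rough Python sketch of what such a threshold controls. It is an illustration only, not Hadoop's or Spark's implementation: the function name `merge_ranges`, the parameters `min_seek` and `max_merged_size`, and the byte values used are all assumptions chosen for the example and are not taken from this PR. The idea is that nearby read ranges are coalesced into one larger request when the gap between them is below a minimum seek size, as long as the combined request stays under a maximum merged size.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def merge_ranges(ranges: List[Range], min_seek: int, max_merged_size: int) -> List[Range]:
    """Coalesce read ranges whose gap is below min_seek, capping the merged size.

    Hypothetical helper for illustration; not an API of Hadoop or Spark.
    """
    merged: List[Range] = []
    for offset, length in sorted(ranges):
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            gap = offset - prev_end
            new_end = max(prev_end, offset + length)
            # Merge only if the gap is small and the combined read stays within the cap.
            if gap < min_seek and new_end - prev_offset <= max_merged_size:
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, length))
    return merged


KIB = 1024
MIB = 1024 * KIB

# Three small reads, e.g. column chunks in a columnar file (made-up offsets).
reads = [(0, 4 * KIB), (64 * KIB, 4 * KIB), (512 * KIB, 4 * KIB)]

# Illustrative threshold values only (assumed, not quoted from the PR).
small = merge_ranges(reads, min_seek=4 * KIB, max_merged_size=1 * MIB)
large = merge_ranges(reads, min_seek=128 * KIB, max_merged_size=2 * MIB)

print(len(small), "requests with the smaller thresholds")
print(len(large), "requests with the larger thresholds")
```

With the smaller thresholds the three reads stay separate; with the larger ones the first two are merged into a single request, which trades a little over-read for fewer round trips to object storage.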
Does this PR introduce any user-facing change?
No. Hadoop Vectored IO features are new in Apache Spark 4.0.0.
How was this patch tested?
Pass the CIs.
Was this patch authored or co-authored using generative AI tooling?
No.