-
Notifications
You must be signed in to change notification settings - Fork 2
Test suite prototyping for collectAsArrow #10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c12a3a6 to
cfc578f
Compare
|
Great, I think this will help. I'll try it out |
pom.xml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be scoped to test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this is only here temporarily pending ARROW-411. Feel free to cherry pick this commit and modify to suit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this is done so the test can use the same root allocator, would it fail if the tests made a different instance? I'm just wondering about the broader usage in Spark, like should Spark manage a single root allocator or just create one each operation and allow the user to override like this?
1c78926 to
1220e86
Compare
|
I don't think it would make the tests fail. Might make sense to create a child allocator for each operation in Spark |
cfc578f to
b2975a3
Compare
|
@BryanCutler I just rebased this on arrow-integration |
|
Thanks @wesm, I'll merge this so we can start using the |
Changed scope of arrow-tools dependency to test commented out lines to Integration.compareXX that are private to arrow closes #10
|
closed with 7127b32 |
Changed scope of arrow-tools dependency to test commented out lines to Integration.compareXX that are private to arrow closes #10
Changed scope of arrow-tools dependency to test commented out lines to Integration.compareXX that are private to arrow closes #10
…nput of UDF as double in the failed test in udf-aggregate_part1.sql ## What changes were proposed in this pull request? It still can be flaky on certain environments due to float limitation described at apache#25110 . See apache#25110 (comment) - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/6584/testReport/org.apache.spark.sql/SQLQueryTestSuite/udf_pgSQL_udf_aggregates_part1_sql___Regular_Python_UDF/ ``` Expected "700000000000[6] 1", but got "700000000000[5] 1" Result did not match for query apache#33 SELECT CAST(avg(udf(CAST(x AS DOUBLE))) AS long), CAST(udf(var_pop(CAST(x AS DOUBLE))) AS decimal(10,3)) FROM (VALUES (7000000000005), (7000000000007)) v(x) ``` Here;s what's going on: apache#25110 (comment) ``` scala> Seq("7000000000004.999", "7000000000006.999").toDF().selectExpr("CAST(avg(value) AS long)").show() +--------------------------+ |CAST(avg(value) AS BIGINT)| +--------------------------+ | 7000000000005| +--------------------------+ ``` Therefore, this PR just avoid to cast in the specific test. This is a temp fix. We need more robust way to avoid such cases. ## How was this patch tested? It passes with Maven in my local before/after this PR. I believe the problem seems similarly the Python or OS installed in the machine. I should test this against PR builder with `test-maven` for sure.. Closes apache#25128 from HyukjinKwon/SPARK-28270-2. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
… Arrow on JDK9+ ### What changes were proposed in this pull request? This PR aims to add `io.netty.tryReflectionSetAccessible=true` to the testing configuration for JDK11 because this is an officially documented requirement of Apache Arrow. Apache Arrow community documented this requirement at `0.15.0` ([ARROW-6206](apache/arrow#5078)). > #### For java 9 or later, should set "-Dio.netty.tryReflectionSetAccessible=true". > This fixes `java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.(long, int) not available`. thrown by netty. ### Why are the changes needed? After ARROW-3191, Arrow Java library requires the property `io.netty.tryReflectionSetAccessible` to be set to true for JDK >= 9. After apache#26133, JDK11 Jenkins job seem to fail. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/676/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/677/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/678/ ```scala Previous exception in task: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:473) io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243) io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233) io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245) org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:222) ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with JDK11. Closes apache#26552 from dongjoon-hyun/SPARK-ARROW-JDK11. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@BryanCutler @icexelloss started this patch, but is out for the holidays for a couple of weeks. If this is useful for starting a test suite for record batch conversion feel free to pull it into the integration branch.
This ideally needs ARROW-411 -- temporarily this adds arrow-tools as a dependency to get access to functions in the integration tester.
cc @leifwalsh