Test suite prototyping for collectAsArrow #10

wesm · 2016-12-12T21:58:18Z

@BryanCutler @icexelloss started this patch, but is out for the holidays for a couple of weeks. If this is useful for starting a test suite for record batch conversion, feel free to pull it into the integration branch.

This ideally needs ARROW-411 -- temporarily this adds arrow-tools as a dependency to get access to functions in the integration tester.

cc @leifwalsh

BryanCutler · 2016-12-13T23:28:37Z

Great, I think this will help. I'll try it out
cc @yinxusen

BryanCutler · 2016-12-14T18:48:54Z

pom.xml

Can this be scoped to test?

Sure, this is only here temporarily pending ARROW-411. Feel free to cherry pick this commit and modify to suit

BryanCutler · 2016-12-14T21:08:39Z

sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala

I see this is done so the test can use the same root allocator. Would it fail if the tests made a different instance? I'm just wondering about the broader usage in Spark: should Spark manage a single root allocator, or just create one for each operation and allow the user to override it like this?

wesm · 2016-12-14T23:27:18Z

I don't think it would make the tests fail. Might make sense to create a child allocator for each operation in Spark

wesm · 2016-12-15T18:23:53Z

@BryanCutler I just rebased this on arrow-integration

BryanCutler · 2016-12-15T22:40:19Z

Thanks @wesm, I'll merge this so we can start using the ArrowSuite too. I'll just change the scope on the dependency and comment out the lines that require modification to Arrow, so it doesn't break compilation.

Changed scope of arrow-tools dependency to test

commented out lines to Integration.compareXX that are private to arrow

closes #10

BryanCutler · 2016-12-16T00:58:11Z

closed with 7127b32

…nput of UDF as double in the failed test in udf-aggregate_part1.sql

## What changes were proposed in this pull request?

It can still be flaky in certain environments due to the float limitation described at apache#25110. See apache#25110 (comment)

- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/6584/testReport/org.apache.spark.sql/SQLQueryTestSuite/udf_pgSQL_udf_aggregates_part1_sql___Regular_Python_UDF/

```
Expected "700000000000[6] 1", but got "700000000000[5] 1" Result did not match for query #33
SELECT CAST(avg(udf(CAST(x AS DOUBLE))) AS long), CAST(udf(var_pop(CAST(x AS DOUBLE))) AS decimal(10,3))
FROM (VALUES (7000000000005), (7000000000007)) v(x)
```

Here's what's going on: apache#25110 (comment)

```
scala> Seq("7000000000004.999", "7000000000006.999").toDF().selectExpr("CAST(avg(value) AS long)").show()
+--------------------------+
|CAST(avg(value) AS BIGINT)|
+--------------------------+
|             7000000000005|
+--------------------------+
```

Therefore, this PR just avoids the cast in the specific test. This is a temporary fix. We need a more robust way to avoid such cases.

## How was this patch tested?

It passes with Maven locally before and after this PR. I believe the problem similarly depends on the Python or OS installed on the machine. I should test this against the PR builder with `test-maven` to be sure.

Closes apache#25128 from HyukjinKwon/SPARK-28270-2.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
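The flakiness described in the commit message above comes down to a truncating cast: if the averaged double lands even slightly below the expected whole number, casting it to a long drops the fraction and yields the previous integer. The following is a minimal, standalone Java sketch of that effect, not Spark's implementation. The class and method names (`TruncatingCastDemo`, `truncate`) are hypothetical, and it assumes the SQL cast behaves like Java's `(long)` conversion, as the REPL output above suggests.

```java
public class TruncatingCastDemo {

    /** Converts a double to a long by dropping the fractional part (no rounding). */
    public static long truncate(double value) {
        return (long) value;
    }

    public static void main(String[] args) {
        // Both inputs and their average are exactly representable as doubles.
        double exact = (7000000000005.0 + 7000000000007.0) / 2;

        // Simulate a tiny floating-point error: the largest double below the exact average.
        double slightlyLow = Math.nextDown(exact);

        System.out.println(truncate(exact));          // 7000000000006
        System.out.println(truncate(slightlyLow));    // 7000000000005
        System.out.println(Math.round(slightlyLow));  // 7000000000006
    }
}
```

Rounding before converting, as in the last line, tolerates an error of this size, which is one possible direction for the more robust approach the commit message asks for.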

… Arrow on JDK9+

### What changes were proposed in this pull request?

This PR aims to add `io.netty.tryReflectionSetAccessible=true` to the testing configuration for JDK11 because this is an officially documented requirement of Apache Arrow.

The Apache Arrow community documented this requirement at `0.15.0` ([ARROW-6206](apache/arrow#5078)).

> #### For java 9 or later, should set "-Dio.netty.tryReflectionSetAccessible=true".
> This fixes `java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available`. thrown by netty.

### Why are the changes needed?

After ARROW-3191, the Arrow Java library requires the property `io.netty.tryReflectionSetAccessible` to be set to true for JDK >= 9. After apache#26133, the JDK11 Jenkins jobs seem to fail.

- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/676/
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/677/
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/678/

```scala
Previous exception in task: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
 io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:473)
 io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243)
 io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233)
 io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245)
 org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:222)
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with JDK11.

Closes apache#26552 from dongjoon-hyun/SPARK-ARROW-JDK11.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
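As an illustration of the requirement in the commit message above, a test harness could check at startup whether the JVM it runs on needs the property and whether it was actually set. This is a rough sketch using only the Java standard library; the class and method names (`NettyReflectionFlagCheck`, `majorVersion`, `flagMissing`) are hypothetical and not part of Spark, Arrow, or Netty. Only the property name and the "JDK 9 or later" rule come from the text.

```java
public class NettyReflectionFlagCheck {

    // Property name taken from the commit message above.
    static final String FLAG = "io.netty.tryReflectionSetAccessible";

    /** Parses java.specification.version: "1.8" gives 8, "9" gives 9, "11" gives 11. */
    public static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Versions before JDK 9 are reported as "1.x".
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    /** True when running on JDK 9 or later without the property set to "true". */
    public static boolean flagMissing(int major, String flagValue) {
        // Boolean.parseBoolean(null) is false, so an unset property counts as missing.
        return major >= 9 && !Boolean.parseBoolean(flagValue);
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        if (flagMissing(major, System.getProperty(FLAG))) {
            System.err.println("JDK " + major + " detected: start the JVM with -D" + FLAG + "=true");
        } else {
            System.out.println("No additional JVM flag needed for JDK " + major);
        }
    }
}
```

The check only reports the situation; the property itself still has to be passed on the JVM command line, which is what the PR adds to the testing configuration.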

wesm force-pushed the arrow-unit-test-proto branch from c12a3a6 to cfc578f Compare December 12, 2016 21:58

wesm mentioned this pull request Dec 12, 2016

Added more types for conversion #9

Merged

BryanCutler reviewed Dec 14, 2016

BryanCutler force-pushed the arrow-integration branch from 1c78926 to 1220e86 Compare December 14, 2016 22:43

Test suite prototyping for collectAsArrow

b2975a3

wesm force-pushed the arrow-unit-test-proto branch from cfc578f to b2975a3 Compare December 15, 2016 18:23

BryanCutler pushed a commit that referenced this pull request Dec 15, 2016

Test suite prototyping for collectAsArrow

7127b32

BryanCutler closed this Dec 16, 2016

wesm deleted the arrow-unit-test-proto branch December 22, 2016 18:52

BryanCutler pushed a commit that referenced this pull request Jan 24, 2017

Test suite prototyping for collectAsArrow

66f01da

BryanCutler pushed a commit that referenced this pull request Feb 23, 2017

Test suite prototyping for collectAsArrow

afd5739
