[SPARK-21583][SQL] Create a ColumnarBatch from ArrowColumnVectors #18787
```diff
@@ -25,10 +25,13 @@ import scala.collection.JavaConverters._
 import scala.collection.mutable
 import scala.util.Random
 
+import org.apache.arrow.vector.NullableIntVector
+
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.memory.MemoryMode
 import org.apache.spark.sql.{RandomDataGenerator, Row}
 import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.arrow.ArrowUtils
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.Platform
 import org.apache.spark.unsafe.types.CalendarInterval
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite extends SparkFunSuite {
         s"vectorized reader"))
     }
   }
+
+  test("create columnar batch from Arrow column vectors") {
+    val allocator = ArrowUtils.rootAllocator.newChildAllocator("int", 0, Long.MaxValue)
+    val vector1 = ArrowUtils.toArrowField("int1", IntegerType, nullable = true)
+      .createVector(allocator).asInstanceOf[NullableIntVector]
+    vector1.allocateNew()
+    val mutator1 = vector1.getMutator()
+    val vector2 = ArrowUtils.toArrowField("int2", IntegerType, nullable = true)
+      .createVector(allocator).asInstanceOf[NullableIntVector]
+    vector2.allocateNew()
+    val mutator2 = vector2.getMutator()
+
+    (0 until 10).foreach { i =>
+      mutator1.setSafe(i, i)
+      mutator2.setSafe(i + 1, i)
+    }
+    mutator1.setNull(10)
+    mutator1.setValueCount(11)
+    mutator2.setNull(0)
+    mutator2.setValueCount(11)
+
+    val columnVectors = Seq(new ArrowColumnVector(vector1), new ArrowColumnVector(vector2))
+
+    val schema = StructType(Seq(StructField("int1", IntegerType), StructField("int2", IntegerType)))
+    val batch = new ColumnarBatch(schema, columnVectors.toArray[ColumnVector], 11)
+    batch.setNumRows(11)
+
+    assert(batch.numCols() == 2)
+    assert(batch.numRows() == 11)
+
+    val rowIter = batch.rowIterator().asScala
+    rowIter.zipWithIndex.foreach { case (row, i) =>
+      if (i == 10) {
+        assert(row.isNullAt(0))
+      } else {
+        assert(row.getInt(0) == i)
+      }
+      if (i == 0) {
+        assert(row.isNullAt(1))
+      } else {
+        assert(row.getInt(1) == i - 1)
+      }
+    }
+
+    intercept[java.lang.AssertionError] {
+      batch.getRow(100)
```
Member: Hi, @BryanCutler and @ueshin.

Member (Author): Hmm, that is strange. I'll take a look, thanks.

Member: Thanks! It seems to happen with Maven only; sbt-hadoop-2.6 passed.

Member (Author): It's probably because the assert is being compiled out. This check should probably not be in the test, then.

Member (Author): I think the problem is that if the Java assertion is compiled out, then no error is produced and the test fails.

Member (Author): I just made #19098 to remove this check. It's not really testing the functionality added here anyway, but maybe another test should be added to check for index-out-of-bounds errors.
```diff
+    }
+
+    batch.close()
+    allocator.close()
+  }
 }
```
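The review thread above hinges on the difference between Scala's `assert` and Java's `assert` statement: the Java assertion inside `ColumnarBatch.getRow` is elided unless the JVM runs with `-ea`, so `intercept[java.lang.AssertionError]` can fail under a build (here, Maven) that doesn't enable assertions. A minimal, Spark-free sketch of the half that is easy to demonstrate (the object name is made up): Scala's `assert` throws `java.lang.AssertionError` unconditionally under default compiler settings.

```scala
object AssertionElisionDemo {
  def main(args: Array[String]): Unit = {
    // Scala's Predef.assert is an ordinary method call: it is NOT controlled
    // by the JVM -ea/-da flags (only by scalac's -Xelide-below option), so
    // this throws regardless of how the JVM was launched.
    val thrown =
      try {
        assert(false, "always evaluated under default compiler settings")
        false
      } catch {
        case _: AssertionError => true
      }
    println(s"Scala assert threw: $thrown")  // Scala assert threw: true
  }
}
```

A Java `assert rowId < numRows;` statement, by contrast, is a no-op unless the JVM is started with `-ea`, which is why an explicit bounds check throwing a regular exception is the more robust choice for library code.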
We can simply put `Iterator.empty` here.

`nextBatch()` returns the row iterator, so `rowIter` needs to be initialized here to the first row in the first batch.

nvm, I thought the first call of `hasNext` would initialize it.
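The iterator-initialization pattern discussed in these comments can be sketched in isolation. The following is a hypothetical, Spark-free reduction (all names invented; this is not Spark's actual `ArrowConverters` code): `rowIter` starts as `Iterator.empty` and is refilled from the next batch inside `hasNext`, so the first call of `hasNext` does the initialization lazily.

```scala
object BatchedIteratorDemo {
  // Flatten an iterator of batches into an iterator of rows.
  def rowsFromBatches[T](batches: Iterator[Seq[T]]): Iterator[T] = new Iterator[T] {
    // Safe initial value: the first hasNext call pulls the first batch.
    private var rowIter: Iterator[T] = Iterator.empty

    override def hasNext: Boolean = rowIter.hasNext || {
      if (batches.hasNext) {
        rowIter = batches.next().iterator
        hasNext  // recurse to skip empty batches
      } else {
        false
      }
    }

    override def next(): T = rowIter.next()
  }

  def main(args: Array[String]): Unit = {
    // The empty middle batch is skipped transparently.
    val flat = rowsFromBatches(Iterator(Seq(1, 2), Seq.empty, Seq(3))).toList
    println(flat)  // List(1, 2, 3)
  }
}
```

Starting from `Iterator.empty` avoids eagerly materializing the first batch in the constructor, at the cost of requiring callers to go through `hasNext` before `next()`.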