-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-41569: [Java] ListViewVector Implementation for UnionListViewReader #43077
Conversation
@github-actions crossbow submit -g java-jars |
|
@github-actions crossbow submit -g java |
Revision: 955e9b9 Submitted crossbow builds: ursacomputing/crossbow @ actions-9c744c4dc4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we split up the C Data changes from the writer/reader changes?
java/vector/src/main/java/org/apache/arrow/vector/compare/VectorVisitor.java
Outdated
Show resolved
Hide resolved
@@ -257,70 +268,6 @@ public void write(${name}Holder holder) { | |||
public void writeNull() { | |||
} | |||
|
|||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did these have to be moved out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct I have moved some methods unnecessarily. I moved them back. Now the split take cares of ListView related things in the view class and the rest as usual in the non view class.
this.writer = writer; | ||
} | ||
|
||
public PromotableViewWriter promote() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming this promote
is very confusing because it is really just converting to a ViewWriter instead of a PromotableWriter. Also, it's missing a docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, toViewWriter()
would be better?
if (writer instanceof PromotableViewWriter) { | ||
// ensure writers are initialized | ||
((PromotableViewWriter) writer).getWriter(MinorType.LISTVIEW); | ||
} else { | ||
writer = ((PromotableWriter) writer).promote(); | ||
((PromotableViewWriter) writer).getWriter(MinorType.LISTVIEW); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would the 'else' case come up in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When it comes to this call, there is already a PromotableWriter
which get created when we first start writing using the StructWriter
(it could be an intWriter, a floatWriter, etc). So the else
is the one get called when there is any vector type other than ListViewVector
is being written first. So in order to make sure we get the correct writer (UnionListViewWriter
in our case), we need to make sure this cast happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the beginning it was confusing to me how the original logic works, it contains only a single instance of PromotableWriter
and using that we do all the writing (even though there are multiple writer types being used via the StructWriter
) and since we needed a differentiation in the first place to make sure we can accomodate the ListView
types to be included into the UnionVector
maintained underneath this writers.
java/vector/src/main/java/org/apache/arrow/vector/complex/ListViewVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/ipc/JsonFileReader.java
Outdated
Show resolved
Hide resolved
} else if (bufferType.equals(SIZE) | ||
&& vector.getValueCount() == 0 | ||
&& vector.getMinorType() == MinorType.LISTVIEW) { | ||
// Empty vectors may not have allocated a sizes buffer | ||
try (ArrowBuf vectorBufferTmp = vector.getAllocator().buffer(4)) { | ||
vectorBufferTmp.setInt(0, 0); | ||
writeValueToGenerator(bufferType, vectorBufferTmp, null, vector, i); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be OK to have an empty sizes buffer? Why are we setting a single value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to test this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gathering my thoughts, I think the reason for doing this was basically to accommodate what was done to offset. There the current logic sets the a buffer with a single value. I think I naively followed the same approach. You have a point here, let me further test this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should not apply for list view. We shouldn't blindly apply the same hacks since that may cover up bugs.
Mmm we could, although the PR title is different I started towards the C data interface. And the required components have significant overlap to tests them properly. And I thought to add the C Data interface here. That code is very small compared to the rest. Would you prefer another PR for C data component? |
It's already hard to review and unless the C Data and JSON and etc changes are strictly required I'd prefer to separate them |
I will move the C Data components from this PR today itself. Sorry about that. |
@lidavidm I moved the C Data component and the Visitor components for a follow up PR. |
@lidavidm updated the PR. |
<@pp.dropOutputFile /> | ||
<@pp.changeOutputFile name="/org/apache/arrow/vector/complex/impl/UnionViewWriter.java" /> | ||
|
||
package org.apache.arrow.vector.complex.impl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem fixed?
Also overall, can we get the imports grouped together at least?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lidavidm sorry about this confusion. I took a look at the previous templates, they all have these issues, apologies I have just adopted that without putting much thought.
Please check if it is resolved now?
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit d4d92e4. There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 606 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
This PR contains the multiple components which are mainly required to add the C Data interface for
ListViewVector
. This PR solves the following major issues associated with this exercise.What changes are included in this PR?
ListViewVector
Implementation transferPair implementation #41269ListViewVector
Implementation copy implementation #41270Apart from that, the following features have also been added
Are these changes tested?
Yes
Are there any user-facing changes?
Yes, we are introducing the usage of
listview
instead oflist
,startListView
instead ofstartList
andendListView
instead ofendList
forListView
related APIs in building theListViewVector
.UnionListViewReader
#41569