-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++ #7275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@BryanCutler would you have time to review? |
|
@rymurr looks like this needs a rebase |
Thanks for the reminder @emkornfield, done. |
|
I'm a little swamped right now, but I'll try to review sometime this week |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we need to cast the argument of getLong to long, to avoid integer overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we need to cast the first argument, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can replace the old implementation
void setBit(ArrowBuf validityBuffer, int index)
with this one, as now ArrowBuf is based on 64-bit index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please update the link in setValidityBitToOne here https://github.com/apache/arrow/pull/7275/files#diff-bc5e2a2a1b6b348c39f57c91defb855bL79
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we need to cast getLong argument to long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to cast (numRecords + 1) to long. Otherwise, integer overflow may happen before promoting the result to long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
4383a6d to
8ca2cb2
Compare
|
Is this good to merge now? @BryanCutler are you still planning to review this? Would like to get this in 1.0. |
I'm taking a look now, I'd like to get it in for 1.0 too. |
BryanCutler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @rymurr ! Apologies for taking so long to review.. It looks pretty good, but I saw what looked like inconsistencies in the LargeListVector APIs using ints vs longs to me, and otherwise only minor things to fix up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice not to add a new template and combine with the UnionListWriter if possible. That doesn't have to be done here though, it can be looked at later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, I fixed this...was just being lazy ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please update the link in setValidityBitToOne here https://github.com/apache/arrow/pull/7275/files#diff-bc5e2a2a1b6b348c39f57c91defb855bL79
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is the relative bit index, it's not possible to return long. Can you change return value to an int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the comment above, I think this should still be an int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup! Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a TODO to revisit once 64 bit vectors are supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also indicate this in the javadoc here and probably for the class. It might even be a good idea to raise an error if the user tries to add too many elements, otherwise things might just start looking wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should return long and not be casted to int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a check for Types.MinorType.LARGELIST here are remove the changes below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you also won't need typeBitWidth as an arg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, done
Thanks a lot for the thorough review. I have fixed up everything you mentioned. It appears some of the confusion was related to changes for 64-bit allocations that were recently merged and the rest was my ignorance! I have pushed a change with all your recommended fixes. |
BryanCutler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick update @rymurr , it looks pretty good! Only a couple minor things. I see quite a few instances of offsetBuffer.getLong/setLong(i * OFFSET_WIDTH) that I believe need to be cast to long to avoid overflow. Could you take a quick pass and fix those up? I think we will be good to go after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this arg need to be cast to long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cast to long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was there a reason for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDE autoformatting...reverted!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed? the vector only supports integer max_value number of elements right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are correct. Removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't value need to be passed in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value is always of type Void. None of the other visitors pass value to children.
Thanks for another thorough review @BryanCutler ! I think that I addressed everything. I may have went over the top w/ the casting but it doesn't hurt :-) |
Add large list and ensure it works with Integration tests. As noted in the JIRA ticket this is rather limited as the underlying vector doesn't support int64 addressing The important downcasts to int32 have been noted for a follow up once vectors with long addresses are supported
BryanCutler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
merged to master, thanks @rymurr ! |
…ation test with C++ Add large list and ensure it works with Integration tests. As noted in the JIRA ticket this is rather limited as the underlying vector doesn't support int64 addressing The important downcasts to int32 have been noted for a follow up once vectors with long addresses are supported Closes apache#7275 from rymurr/ARROW-6110 Authored-by: Ryan Murray <rymurr@dremio.com> Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
Add large list and ensure it works with Integration tests. As noted in the JIRA
ticket this is rather limited as the underlying vector doesn't support int64 addressing
The important downcasts to int32 have been noted for a follow up once vectors with
long addresses are supported