Contributor

@rymurr rymurr commented May 26, 2020

Add large list and ensure it works with the integration tests. As noted in the JIRA
ticket, this is rather limited because the underlying vector doesn't support int64 addressing.

The important downcasts to int32 have been noted for a follow-up once vectors with
long addresses are supported.

@emkornfield
Contributor

@BryanCutler would you have time to review?

@emkornfield
Contributor

@rymurr looks like this needs a rebase

@rymurr
Contributor Author

rymurr commented Jun 2, 2020

> @rymurr looks like this needs a rebase

Thanks for the reminder @emkornfield, done.

@BryanCutler
Member

I'm a little swamped right now, but I'll try to review sometime this week

Contributor

Here we need to cast the argument of getLong to long, to avoid integer overflow.

Contributor Author

done

Contributor

Here we need to cast the first argument, too.

Contributor Author

done

Contributor

I think we can replace the old implementation

void setBit(ArrowBuf validityBuffer, int index)

with this one, since ArrowBuf is now based on 64-bit indices.

Contributor Author

done

Member

Contributor Author

done

Contributor

Here we need to cast getLong argument to long.

Contributor Author

done

Contributor

We need to cast (numRecords + 1) to long. Otherwise, integer overflow may happen before promoting the result to long.
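To illustrate the point about promotion order, here is a minimal standalone sketch. It does not use the Arrow API: the class name, the method names, and `OFFSET_WIDTH = 8` (one 64-bit offset per entry) are assumptions made for the example. It shows that an `int` product wraps before Java widens it to `long`, whereas casting one operand first keeps the multiplication in 64-bit arithmetic.

```java
// Hypothetical demo class, not part of Arrow.
final class OffsetOverflowDemo {
  // Assumed width in bytes of one 64-bit offset entry.
  static final int OFFSET_WIDTH = 8;

  // The multiplication is done in 32-bit arithmetic and only the
  // (possibly wrapped) result is widened to long.
  static long unsafeByteOffset(int numRecords) {
    return (numRecords + 1) * OFFSET_WIDTH;
  }

  // The cast binds tighter than '*', so (numRecords + 1) is widened
  // first and the multiplication is done in 64-bit arithmetic.
  static long safeByteOffset(int numRecords) {
    return (long) (numRecords + 1) * OFFSET_WIDTH;
  }

  public static void main(String[] args) {
    int numRecords = 300_000_000;
    System.out.println("unsafe: " + unsafeByteOffset(numRecords));
    System.out.println("safe:   " + safeByteOffset(numRecords));
  }
}
```

With 300,000,000 records the true offset (2,400,000,008) no longer fits in an `int`, so the first method returns a negative number while the second returns the expected value.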

Contributor Author

done

@rymurr rymurr force-pushed the ARROW-6110 branch 3 times, most recently from 4383a6d to 8ca2cb2 June 22, 2020 15:11

@nealrichardson
Member

Is this good to merge now? @BryanCutler are you still planning to review this? Would like to get this in 1.0.

@BryanCutler
Member

> Is this good to merge now? @BryanCutler are you still planning to review this? Would like to get this in 1.0.

I'm taking a look now, I'd like to get it in for 1.0 too.

Member

@BryanCutler BryanCutler left a comment

Thanks for working on this, @rymurr! Apologies for taking so long to review. It looks pretty good, but I saw what looked to me like inconsistencies in the LargeListVector APIs' use of ints vs. longs; otherwise there are only minor things to fix up.

Member

It would be nice not to add a new template and combine with the UnionListWriter if possible. That doesn't have to be done here though, it can be looked at later.

Contributor Author

agreed, I fixed this...was just being lazy ;-)

Member

Since this is the relative bit index, it never needs to be a long. Can you change the return value to an int?
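As a rough illustration of why the two index types differ, the sketch below splits an absolute bit position into a byte index and a bit position within that byte. The class and method names are hypothetical and are not copied from the code under review; the sketch only assumes the usual layout of 8 validity bits per byte.

```java
// Hypothetical demo class, not part of Arrow.
final class BitIndexDemo {
  // Index of the byte that holds the bit. It grows with the buffer,
  // so it needs a long once the buffer is 64-bit addressable.
  static long byteIndex(long absoluteBitIndex) {
    return absoluteBitIndex >> 3;
  }

  // Position of the bit inside its byte. Masking with 7 keeps the
  // value in the range 0..7, so an int is always sufficient.
  static int bitIndex(long absoluteBitIndex) {
    return (int) (absoluteBitIndex & 7);
  }

  public static void main(String[] args) {
    long absoluteBitIndex = 5_000_000_003L;
    System.out.println("byte index: " + byteIndex(absoluteBitIndex));
    System.out.println("bit index:  " + bitIndex(absoluteBitIndex));
  }
}
```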

Contributor Author

done

Member

From the comment above, I think this should still be an int

Contributor Author

done

Member

shouldn't this be long?

Contributor Author

yup! Done

Member

return int

Contributor Author

done

Member

Could you add a TODO to revisit once 64 bit vectors are supported?

Member

Also indicate this in the javadoc here and probably for the class. It might even be a good idea to raise an error if the user tries to add too many elements, otherwise things might just start looking wrong.
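One way such a guard could look is sketched below. This is an assumption about the shape of the check, not the code that was added in this PR: the class name, method name, and exception type are all hypothetical. The idea is to fail loudly when a 64-bit count cannot be represented as an `int`, rather than letting a silent downcast wrap around.

```java
// Hypothetical demo class, not part of Arrow.
final class ElementCountGuard {
  // Narrows a 64-bit element count to int, throwing instead of
  // silently wrapping when the value is out of range.
  // TODO: revisit once vectors with 64-bit addressing are supported.
  static int checkedElementCount(long requested) {
    if (requested < 0 || requested > Integer.MAX_VALUE) {
      throw new IllegalArgumentException(
          "Element count " + requested + " exceeds the supported range of 0.."
              + Integer.MAX_VALUE);
    }
    return (int) requested;
  }

  public static void main(String[] args) {
    System.out.println(checkedElementCount(42L));
    try {
      checkedElementCount(Integer.MAX_VALUE + 1L);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The standard library's `Math.toIntExact` performs a similar range check and throws `ArithmeticException`; a dedicated helper mainly allows a more descriptive error message.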

Contributor Author

done

Member

This should return long and not be cast to int

Contributor Author

done

Member

same here

Contributor Author

done

Member

could you add a check for Types.MinorType.LARGELIST here and remove the changes below?

Member

you also won't need typeBitWidth as an arg

Contributor Author

thanks, done

@rymurr
Contributor Author

rymurr commented Jul 1, 2020

> Thanks for working on this, @rymurr! Apologies for taking so long to review. It looks pretty good, but I saw what looked to me like inconsistencies in the LargeListVector APIs' use of ints vs. longs; otherwise there are only minor things to fix up.

Thanks a lot for the thorough review. I have fixed up everything you mentioned. It appears some of the confusion was related to changes for 64-bit allocations that were recently merged and the rest was my ignorance!

I have pushed a change with all your recommended fixes.

Member

@BryanCutler BryanCutler left a comment

Thanks for the quick update, @rymurr, it looks pretty good! Only a couple of minor things. I see quite a few instances of offsetBuffer.getLong/setLong(i * OFFSET_WIDTH) where I believe the argument needs to be cast to long to avoid overflow. Could you take a quick pass and fix those up? I think we will be good to go after that.

Member

does this arg need to be cast to long?

Member

also here

Member

cast to long?

Member

was there a reason for this change?

Contributor Author

IDE autoformatting...reverted!

Member

Why is this needed? The vector only supports Integer.MAX_VALUE elements, right?

Contributor Author

you are correct. Removed

Member

doesn't value need to be passed in here?

Contributor Author

value is always of type Void. None of the other visitors pass value to children.

@rymurr
Contributor Author

rymurr commented Jul 3, 2020

> Thanks for the quick update, @rymurr, it looks pretty good! Only a couple of minor things. I see quite a few instances of offsetBuffer.getLong/setLong(i * OFFSET_WIDTH) where I believe the argument needs to be cast to long to avoid overflow. Could you take a quick pass and fix those up? I think we will be good to go after that.

Thanks for another thorough review, @BryanCutler! I think that I addressed everything. I may have gone over the top with the casting, but it doesn't hurt :-)

Member

@BryanCutler BryanCutler left a comment

LGTM

@BryanCutler
Member

merged to master, thanks @rymurr !

@rymurr rymurr deleted the ARROW-6110 branch July 6, 2020 08:32
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
…ation test with C++

Closes apache#7275 from rymurr/ARROW-6110

Authored-by: Ryan Murray <rymurr@dremio.com>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>