-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ArrowBytesViewMap and ArrowBytesViewSet #11421
Conversation
Please hold for review. Reusing the underlying buffer turns out to be a bad idea because it will prevent the buffers from being released. I think it's better to copy the unique strings to new buffers. |
Thanks @XiangpengHao I have proposed merging the current string-view branch into main here: #11402 However, if we did that, it would only have access to changes that were released in Some options:
🤔 |
I think either works, and we do need a branch that tracks the head of arrow-rs, as I expect a few more changes to arrow will be necessary for us to land string view in DataFusion |
My preference then is to make a new branch (mostly to keep the change delta to a minimum). Let me try and get that PR merged shortly and we can make a new |
Which issue does this PR close?
Please wait until apache/arrow-rs#6047 is merged
Part of #11418. We can merge to main, if the
string-view
is merged to main first.Rationale for this change
We need them to better support efficient string view aggregate.
What changes are included in this PR?
They are not much different from
ArrowBytesMap
, except that they are simpler, in terms of hash table entry layout.This PR only implements the type, but did not use it anywhere, since the PR is already quite large imo.
Are these changes tested?
Are there any user-facing changes?