Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-37538: [Go] Cheaper initial dictionary implementation #37634

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

brancz
Copy link
Contributor

@brancz brancz commented Sep 8, 2023

Rationale for this change

We've observed that starting with an initial dictionary can be unnecessarily expensive, so we explored ways to achieve the same (or similar) behavior.

What changes are included in this PR?

Prepending dictionaries instead of inserting them into the memotable.

Are these changes tested?

No this is just a sketch for discussion purposes.

Are there any user-facing changes?

n/a for the moment

@brancz brancz requested a review from zeroshade as a code owner September 8, 2023 15:39
@brancz
Copy link
Contributor Author

brancz commented Sep 8, 2023

cc @zeroshade

@brancz
Copy link
Contributor Author

brancz commented Sep 8, 2023

This doesn't work, it's just a sketch for discussion purposes.

@github-actions
Copy link

github-actions bot commented Sep 8, 2023

⚠️ GitHub issue apache/arrow-go#47 has been automatically assigned in GitHub to PR creator.

@zeroshade
Copy link
Member

@brancz can you throw together a sketched example as to when/why someone would use this (that could eventually be turned into a unit test)?

@brancz
Copy link
Contributor Author

brancz commented Sep 21, 2023

@zeroshade added one of my use cases for this. Note that in this simplistic example, I could have also built the indices entirely separately without a dictionary builder and just call new dictionary array with the indices and dictionary, however, when you have deeply nested lists and structs then you don't have this option as you can't control the list and struct offsets, so you rely on the builders that the record builder exposes to you, which in this case is the binary dictionary builder.

@zeroshade
Copy link
Member

Thanks @brancz I'll try to take a look at this over the weekend or early next week and see if it makes sense as an approach to pursue

@zeroshade
Copy link
Member

@brancz Sorry for the long delay here, but I can see the sense and idea behind your suggestion here. It also doesn't seem like it would be very problematic to allow so I'd be in favor of you fleshing this out to a viable PR that works (as opposed to just a sketch).

Copy link

⚠️ GitHub issue #37538 has no components, please add labels for components.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Go] Allow prepending dictionary
2 participants