-
Notifications
You must be signed in to change notification settings - Fork 881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clear string-tracking hash table when ByteView deduplication is enabled #6385
Clear string-tracking hash table when ByteView deduplication is enabled #6385
Conversation
ℹ️ This is my first PR to the Arrow repository as well as to any Apache repository. I've reviewed the CONTRIBUTING documents as well as I can after watching day 1 of RustConf, but if I've overlooked anything or need to make any adjustments to the implementation, commit message, etc. to align with the project standards, I'm very happy to do so. |
@@ -590,6 +593,20 @@ mod tests { | |||
assert_eq!(array.views().get(1), array.views().get(5)); | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please let me know if a regression test isn't needed for a simple change like this, I'm fine with dropping it if that's preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regression tests are always appreciated and we try to make then required.
Thank you for doing this @shanesveller
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good to me, thank you
FYI @XiangpengHao |
look good to me, nice catch! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ Thanks @shanesveller @tustvold and @XiangpengHao
@@ -590,6 +593,20 @@ mod tests { | |||
assert_eq!(array.views().get(1), array.views().get(5)); | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regression tests are always appreciated and we try to make then required.
Thank you for doing this @shanesveller
Welcome and thank you |
Which issue does this PR close?
Closes #6384.
Rationale for this change
The panic observed in #6384 was within
StringViewBuilder::get_value
, because thestring_tracker
hash table was not reset as part offinish()
:arrow-rs/arrow-array/src/builder/generic_bytes_view_builder.rs
Lines 365 to 373 in e838e62
If the builder saw continued use, and
append_value
was called with a previously-observed value, the hash table would return an index to a now-emptied buffer after slicing into it here:arrow-rs/arrow-array/src/builder/generic_bytes_view_builder.rs
Lines 257 to 259 in e838e62
What changes are included in this PR?
Builders will now clear the
string_tracker
hash-table as part of a revisedfinish
definition.Are there any user-facing changes?
Not to my knowledge.