-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Box ScalarValue:Lists, reduce size by half size #788
Conversation
// Since ScalarValues are used in a non trivial number of places, | ||
// making it larger means significant more memory consumption | ||
// per distinct value. | ||
assert_eq!(std::mem::size_of::<ScalarValue>(), 32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is the test showing the size decrease
List(Option<Vec<ScalarValue>>, DataType), | ||
/// list of nested ScalarValue (boxed to reduce size_of(ScalarValue)) | ||
#[allow(clippy::box_vec)] | ||
List(Option<Box<Vec<ScalarValue>>>, Box<DataType>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the change -- the rest of the PR is just follow on work from this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting optimization 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Box<[ScalarValue]>
via Vec::into_boxed_slice
would also be an option and would remove one pointer indirection, with the downside that data would need to be copied if the vec has excess capacity. @alamb do you think this would be worth exploring? I could prepare a PR since I already started looking to the usages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jhorstmann I think using Vec::into_boxed_slice
would be just fine. I don't think ScalarValues
are often (ever?) updated after creation so using a boxed slice seems like a good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
I think keeping ScalarValue smaller will help in various places even if we choose to go with something other than the implementation in #786 |
Which issue does this PR close?
Re #786
Changes:
ScalarValue:List
s internal partsRationale for this change
ScalarValue
means switching to use it in hash aggregations and hash joins will not be as expensive memory wise (where one is instantiated for each distinct grouping value)What changes are included in this PR?
ScalarValue:List
s internal partsAre there any user-facing changes?
No