-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Box ScalarValue:Lists, reduce size by half size #788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| // Since ScalarValues are used in a non trivial number of places, | ||
| // making it larger means significant more memory consumption | ||
| // per distinct value. | ||
| assert_eq!(std::mem::size_of::<ScalarValue>(), 32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is the test showing the size decrease
| List(Option<Vec<ScalarValue>>, DataType), | ||
| /// list of nested ScalarValue (boxed to reduce size_of(ScalarValue)) | ||
| #[allow(clippy::box_vec)] | ||
| List(Option<Box<Vec<ScalarValue>>>, Box<DataType>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the change -- the rest of the PR is just follow on work from this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting optimization 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Box<[ScalarValue]> via Vec::into_boxed_slice would also be an option and would remove one pointer indirection, with the downside that data would need to be copied if the vec has excess capacity. @alamb do you think this would be worth exploring? I could prepare a PR since I already started looking to the usages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jhorstmann I think using Vec::into_boxed_slice would be just fine. I don't think ScalarValues are often (ever?) updated after creation so using a boxed slice seems like a good idea
Dandandan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
|
I think keeping ScalarValue smaller will help in various places even if we choose to go with something other than the implementation in #786 |
…gregates" feature (apache#788) * upgrade df version and disable skip partial agg * add comment * Save * Revert debug changes * add comment
Which issue does this PR close?
Re #786
Changes:
ScalarValue:Lists internal partsRationale for this change
ScalarValuemeans switching to use it in hash aggregations and hash joins will not be as expensive memory wise (where one is instantiated for each distinct grouping value)What changes are included in this PR?
ScalarValue:Lists internal partsAre there any user-facing changes?
No