[startup performance] Stop cloning the whole list of DType factors when we don't need to #651
Hey,
Improves #525
I was profiling numbat once again, and this time I noticed a lot of allocations around the list of types, which we don’t seem to modify much after it is built.
This trace is a bit hard to read (and it’s reversed), but basically, I found that we were spending around 5% of the execution time cloning the vector stored in `DType::factors`. I suspected that most of these clones were unnecessary, so I tried to remove them by wrapping the vector in an `Arc<Vec<_>>` and cloning the `Vec` only when we actually need to own it. As expected, numbat got faster by more than 4% (I guess because we also avoid a lot of drops and other work):
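For illustration, here is a minimal sketch of the `Arc<Vec<_>>` idea. The `Factor` and `DType` types below are hypothetical stand-ins, not numbat's actual definitions: cloning a `DType` only bumps a reference count, read-only callers borrow a slice, and `Arc::make_mut` copies the vector only when a shared list is about to be mutated.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for one factor of a dimension type (e.g. `Length^2`).
/// numbat's real factor type is different.
#[derive(Debug, Clone, PartialEq)]
pub struct Factor {
    pub name: String,
    pub exponent: i64,
}

/// Hypothetical `DType` whose factor list is shared behind an `Arc`.
/// `#[derive(Clone)]` clones the `Arc`, not the `Vec` or its `String`s.
#[derive(Debug, Clone, PartialEq)]
pub struct DType {
    factors: Arc<Vec<Factor>>,
}

impl DType {
    pub fn new(factors: Vec<Factor>) -> Self {
        DType { factors: Arc::new(factors) }
    }

    /// Read-only access: just borrow, no clone.
    pub fn factors(&self) -> &[Factor] {
        &self.factors
    }

    /// When a caller really needs an owned `Vec`, clone it explicitly.
    pub fn to_owned_factors(&self) -> Vec<Factor> {
        self.factors.as_ref().clone()
    }

    /// `Arc::make_mut` clones the vector only if another `DType` still
    /// shares it; otherwise it mutates in place.
    pub fn push_factor(&mut self, factor: Factor) {
        Arc::make_mut(&mut self.factors).push(factor);
    }

    /// Number of `DType`s currently sharing this factor list.
    pub fn share_count(&self) -> usize {
        Arc::strong_count(&self.factors)
    }
}
```

Cloning `a` into `b` leaves both pointing at the same vector (`share_count() == 2`); the first `push_factor` on `b` detaches it with a private copy, so `a` is never affected.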
We could merge this PR as-is, but I’m not super satisfied with it:
- `Arc` everywhere? Maybe we should use references + `Cow` directly (it'll become a lifetime hell, though).

What do you think? Should we try to push further in this direction, or call it a day and get the ~13% straight away?
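For comparison, a rough sketch of the references + `Cow` alternative, again with hypothetical `Factor`/`DType` types rather than numbat's real ones. Cloning a borrowed `DType` is free and `Cow::to_mut` copies only on the first write, but the `'a` lifetime now has to appear on every type that stores a `DType`, which is where the lifetime trouble comes from.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for one factor of a dimension type.
#[derive(Debug, Clone, PartialEq)]
pub struct Factor {
    pub name: String,
    pub exponent: i64,
}

/// Hypothetical borrowed variant: the factor list is either borrowed from
/// storage that outlives the `DType`, or owned when it had to be computed.
#[derive(Debug, Clone, PartialEq)]
pub struct DType<'a> {
    factors: Cow<'a, [Factor]>,
}

impl<'a> DType<'a> {
    /// Wrap an existing slice without copying it.
    pub fn borrowed(factors: &'a [Factor]) -> Self {
        DType { factors: Cow::Borrowed(factors) }
    }

    /// Take ownership of a freshly computed list.
    pub fn owned(factors: Vec<Factor>) -> Self {
        DType { factors: Cow::Owned(factors) }
    }

    /// Read-only access works the same for both variants.
    pub fn factors(&self) -> &[Factor] {
        &self.factors
    }

    pub fn is_borrowed(&self) -> bool {
        matches!(self.factors, Cow::Borrowed(_))
    }

    /// `to_mut` copies the borrowed slice into a `Vec` on the first write only.
    pub fn push_factor(&mut self, factor: Factor) {
        self.factors.to_mut().push(factor);
    }
}
```

Compared with the `Arc` version, this avoids the atomic reference count entirely, at the cost of tying every `DType<'a>` to the lifetime of whatever owns the borrowed factors.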