Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop cloning the whole list of DType factor when we don't need to #651

Merged
merged 2 commits into from
Dec 27, 2024

Conversation

irevoire
Copy link
Contributor

@irevoire irevoire commented Nov 3, 2024

Hey,

Improves #525

I was profiling numbat once again, and this time, I noticed we had a lot of allocation around the list of types, which we don’t seem to update that much afterward.
image

This trace is a bit hard to read (and it’s reversed), but basically, I found out that we were spending around ~5% of the execution time cloning a vector contained in the DType::factors.
I suspected that most of these clones were not necessary, so I tried to remove them with an Arc<Vec<_>> and then clone the vec when we actually needed to own it.

As expected, numbat got faster by more than 4% (because I guess we also avoid a lot of drops and other stuff):

% hyperfine './numbat-master -e "1 + 1"' './numbat-with-type-rc -e "1 + 1"' --warmup 15 --min-runs 100
Benchmark 1: ./numbat-master -e "1 + 1"
  Time (mean ± σ):      42.8 ms ±   1.7 ms    [User: 31.0 ms, System: 10.7 ms]
  Range (min … max):    39.9 ms …  46.1 ms    100 runs
 
Benchmark 2: ./numbat-with-type-rc -e "1 + 1"
  Time (mean ± σ):      37.8 ms ±   1.2 ms    [User: 26.6 ms, System: 10.4 ms]
  Range (min … max):    36.0 ms …  41.2 ms    100 runs
 
Summary
  ./numbat-with-type-rc -e "1 + 1" ran
    1.13 ± 0.06 times faster than ./numbat-master -e "1 + 1"

We could merge this PR as-is, but I’m not super satisfied with it:

  • We may make the same mistake in the future without noticing
  • I think the same issue happens to most types, but this one was simply the most expensive one
  • Do we really need to use Arc everywhere? Maybe we should use ref + Cow directly (it'll become a lifetime hell, though)

What do you think? Should we try to push further in this direction or call it a day and get the ~13% straight away

@sharkdp
Copy link
Owner

sharkdp commented Nov 21, 2024

Thank you very much for looking into this! Another question we could ask is: where do all these clones come from? Can we maybe prevent them from happening in the first place instead of trying to make cloning cheaper?

@irevoire
Copy link
Contributor Author

It comes down to the fact that we clone the values everywhere.
Numbat is definitely not great at not cloning stuff we don't need to clone, but improving that seems very hard.
Maybe we should make the whole value an Arc?
That would be pretty logical since it's a memory zone that can be large and moves a lot.
But while this may improve the performances in a lot of places, I believe my optimization on the dtypes won't benefit that much from it because, from what I've seen there is a lot of time where we clone the complete list of dtype only to read it afterward or give it to another value without modifying it. In both cases, since we are recreating a new value, we still need to be able to clone the dtype list without cloning the whole value imo 🤔

@sharkdp sharkdp changed the title [startup performance] Stop cloning the whole list of DType factor when we don't need to Stop cloning the whole list of DType factor when we don't need to Dec 27, 2024
@sharkdp
Copy link
Owner

sharkdp commented Dec 27, 2024

Ok, let's just take the 13% win for now and revisit this when it comes up again. Thank you very much!

@sharkdp sharkdp merged commit 77decee into sharkdp:master Dec 27, 2024
15 checks passed
@irevoire irevoire deleted the stop-cloning-the-dtype-factor branch December 30, 2024 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants