Stop cloning the whole list of DType factor when we don't need to #651

irevoire · 2024-11-03T22:56:51Z

Hey,

Improves #525

I was profiling numbat once again, and this time I noticed a lot of allocations around the list of types, which we rarely seem to update afterward.

This trace is a bit hard to read (and it’s reversed), but basically, I found out that we were spending around 5% of the execution time cloning the vector stored in DType::factors.
I suspected that most of these clones were unnecessary, so I wrapped the vector in an Arc<Vec<_>> and now clone the Vec only when we actually need to own it.
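
To illustrate the idea, here is a minimal sketch of the pattern, not the code from this PR. The `DType` struct, its `(String, i32)` factor representation, and the `factors`/`factors_mut` helpers are all hypothetical stand-ins, and `Arc::make_mut` is just one way to "clone the vec when we need to own it":

```rust
use std::sync::Arc;

// Hypothetical stand-in for numbat's DType; the real type and its
// factor representation differ.
#[derive(Clone, Debug, PartialEq)]
struct DType {
    // (base dimension name, exponent) pairs, shared behind an Arc.
    factors: Arc<Vec<(String, i32)>>,
}

impl DType {
    fn new(factors: Vec<(String, i32)>) -> Self {
        DType { factors: Arc::new(factors) }
    }

    // Read-only access never touches the allocation.
    fn factors(&self) -> &[(String, i32)] {
        &self.factors
    }

    // Mutation copies the vector only if another DType still shares it.
    fn factors_mut(&mut self) -> &mut Vec<(String, i32)> {
        Arc::make_mut(&mut self.factors)
    }
}

fn main() {
    let length = DType::new(vec![("Length".to_string(), 1)]);

    // Cheap clone: bumps a reference count, no new Vec is allocated.
    let mut area = length.clone();
    assert!(Arc::ptr_eq(&length.factors, &area.factors));

    // The first write triggers the one real copy.
    area.factors_mut()[0].1 = 2;
    assert!(!Arc::ptr_eq(&length.factors, &area.factors));
    assert_eq!(length.factors()[0].1, 1);
    assert_eq!(area.factors()[0].1, 2);
}
```

With this shape, every `DType::clone` that is only followed by reads costs a reference-count increment instead of a heap allocation plus a copy of every factor.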

As expected, numbat got faster by more than that 5% (I guess because we also avoid a lot of drops and other work):

% hyperfine './numbat-master -e "1 + 1"' './numbat-with-type-rc -e "1 + 1"' --warmup 15 --min-runs 100
Benchmark 1: ./numbat-master -e "1 + 1"
  Time (mean ± σ):      42.8 ms ±   1.7 ms    [User: 31.0 ms, System: 10.7 ms]
  Range (min … max):    39.9 ms …  46.1 ms    100 runs
 
Benchmark 2: ./numbat-with-type-rc -e "1 + 1"
  Time (mean ± σ):      37.8 ms ±   1.2 ms    [User: 26.6 ms, System: 10.4 ms]
  Range (min … max):    36.0 ms …  41.2 ms    100 runs
 
Summary
  ./numbat-with-type-rc -e "1 + 1" ran
    1.13 ± 0.06 times faster than ./numbat-master -e "1 + 1"

We could merge this PR as-is, but I’m not super satisfied with it:

- We may make the same mistake in the future without noticing
- I think the same issue happens to most types, but this one was simply the most expensive one
- Do we really need to use Arc everywhere? Maybe we should use ref + Cow directly (it'll become a lifetime hell, though)
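
For comparison, a rough sketch of what the ref + Cow alternative could look like. Everything here is hypothetical (the `DType<'a>` struct, its fields, and its helper methods); the point is only to show where the lifetime parameter comes from and why it would spread to every type that stores a `DType`:

```rust
use std::borrow::Cow;

// Hypothetical borrowed variant: the lifetime parameter `'a` has to appear
// on every type that stores a DType.
#[derive(Clone, Debug)]
struct DType<'a> {
    factors: Cow<'a, [(String, i32)]>,
}

impl<'a> DType<'a> {
    // Borrows an existing factor list without copying it.
    fn borrowed(factors: &'a [(String, i32)]) -> Self {
        DType { factors: Cow::Borrowed(factors) }
    }

    // Copies the slice into an owned Vec on the first mutation only.
    fn set_exponent(&mut self, index: usize, exponent: i32) {
        self.factors.to_mut()[index].1 = exponent;
    }

    fn is_borrowed(&self) -> bool {
        matches!(self.factors, Cow::Borrowed(_))
    }
}

fn main() {
    let storage = vec![("Length".to_string(), 1)];
    let reader = DType::borrowed(&storage);
    assert!(reader.is_borrowed());

    // Cloning a borrowed Cow stays borrowed; mutating it makes it owned.
    let mut writer = reader.clone();
    writer.set_exponent(0, 2);
    assert!(!writer.is_borrowed());
    assert_eq!(storage[0].1, 1);
    assert_eq!(writer.factors[0].1, 2);
}
```

This avoids the reference counting, but the borrowed data must outlive every `DType<'a>` that points at it, which is the "lifetime hell" trade-off mentioned above.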

What do you think? Should we try to push further in this direction, or call it a day and take the ~13% straight away?

sharkdp · 2024-11-21T20:39:56Z

Thank you very much for looking into this! Another question we could ask is: where do all these clones come from? Can we maybe prevent them from happening in the first place instead of trying to make cloning cheaper?

irevoire · 2024-11-27T02:08:49Z

It comes down to the fact that we clone the values everywhere.
Numbat definitely clones a lot of things it doesn't need to, but improving that seems very hard.
Maybe we should make the whole value an Arc?
That would be pretty logical, since a value is a chunk of memory that can be large and moves around a lot.
But while this may improve performance in a lot of places, I believe my optimization on the dtypes won't benefit much from it. From what I've seen, there are many places where we clone the complete list of dtypes only to read it afterward, or to hand it to another value without modifying it. In both cases we are creating a new value, so we still need to be able to clone the dtype list without cloning the whole value imo 🤔
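
A small sketch of that last point, under loud assumptions: `Quantity`, `Factors`, and `scale` are made-up names for illustration, not numbat's actual value or type representation. It shows an operation that produces a new value while passing the factor list along untouched:

```rust
use std::sync::Arc;

// Hypothetical types for illustration only; numbat's real value and
// type representations are different.
type Factors = Arc<Vec<(String, i32)>>;

#[derive(Debug)]
struct Quantity {
    magnitude: f64,
    dtype: Factors,
}

// Builds a brand-new value but reuses the operand's factor list unchanged.
fn scale(q: &Quantity, by: f64) -> Quantity {
    Quantity {
        magnitude: q.magnitude * by,
        dtype: Arc::clone(&q.dtype),
    }
}

fn main() {
    let metres = Quantity {
        magnitude: 2.0,
        dtype: Arc::new(vec![("Length".to_string(), 1)]),
    };
    let doubled = scale(&metres, 2.0);

    assert_eq!(doubled.magnitude, 4.0);
    // Two distinct values, one shared factor list.
    assert!(Arc::ptr_eq(&metres.dtype, &doubled.dtype));
    assert_eq!(Arc::strong_count(&metres.dtype), 2);
}
```

Wrapping the whole `Quantity` in an `Arc` would not help in a case like `scale`, because the result is a different value; only sharing the factor list itself avoids the copy.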

sharkdp · 2024-12-27T20:37:14Z

Ok, let's just take the 13% win for now and revisit this when it comes up again. Thank you very much!

irevoire added 2 commits November 3, 2024 23:38

stop cloning the whole list of DType factor when we don't need to

fb94f27

fix the tests by making the DType Sync

fdadbc0

sharkdp changed the title ~~[startup performance] Stop cloning the whole list of DType factor when we don't need to~~ Stop cloning the whole list of DType factor when we don't need to Dec 27, 2024

sharkdp added the performance label Dec 27, 2024

sharkdp merged commit 77decee into sharkdp:master Dec 27, 2024
15 checks passed

irevoire deleted the stop-cloning-the-dtype-factor branch December 30, 2024 14:12

BrewTestBot mentioned this pull request Jan 1, 2025

numbat 1.15.0 Homebrew/homebrew-core#202920

Merged
