feat(injector): add an `extend` method to Nucleo's injector #74

alexpasmantier · 2025-02-05T12:11:31Z

Description

This pull request adds an extend method to the Injector struct.

My main motivation comes from trying to optimize loading times for https://github.com/alexpasmantier/television, which led me to take a look at Nucleo's implementation of boxcar.

The proposed extend method does the following for an incoming batch of values:

reserve all corresponding indexes at once (reducing contention on the inflight atomic)
compute start and end locations and allocate necessary buckets upfront
route and insert individual values into the relevant buckets
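
The first step above can be sketched roughly as follows. This is only an illustration under assumptions, not the code from this PR: `reserve_one`, `reserve_batch` and the `next_index` counter are hypothetical names standing in for the boxcar's inflight atomic.

```rust
use std::ops::Range;
use std::sync::atomic::{AtomicU64, Ordering};

/// Reserve a single index: one atomic read-modify-write per value
/// (the pattern a per-value `push` relies on).
fn reserve_one(next_index: &AtomicU64) -> u64 {
    next_index.fetch_add(1, Ordering::Relaxed)
}

/// Reserve `count` contiguous indexes with a single atomic read-modify-write
/// (the pattern a batched `extend` can rely on).
/// `next_index` is a hypothetical stand-in for the inflight counter.
fn reserve_batch(next_index: &AtomicU64, count: u64) -> Range<u64> {
    let start = next_index.fetch_add(count, Ordering::Relaxed);
    start..start + count
}
```

Since the whole range is claimed in one operation, the start and end locations of the batch are known before any value is written, which is what makes the upfront bucket allocation of the second step possible.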

Benchmarks

I took the liberty of adding Criterion as a dev dependency in order to run a couple of benchmarks and assess if this was a meaningful feature to add or not.

cargo bench raw output

     Running benches/main.rs (target/release/deps/main-97d28d594921087e)
Gnuplot not found, using plotters backend
grow_boxcar/push/100    time:   [3.2966 µs 3.2992 µs 3.3018 µs]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
grow_boxcar/extend/100  time:   [3.2370 µs 3.2387 µs 3.2410 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
grow_boxcar/push/1000   time:   [9.2759 µs 9.6346 µs 10.035 µs]
Found 30 outliers among 100 measurements (30.00%)
  19 (19.00%) low severe
  3 (3.00%) high mild
  8 (8.00%) high severe
grow_boxcar/extend/1000 time:   [6.7988 µs 6.8025 µs 6.8068 µs]
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
grow_boxcar/push/50000  time:   [268.11 µs 270.10 µs 272.19 µs]
Found 16 outliers among 100 measurements (16.00%)
  6 (6.00%) high mild
  10 (10.00%) high severe
grow_boxcar/extend/50000
                        time:   [223.23 µs 227.82 µs 233.67 µs]
Found 21 outliers among 100 measurements (21.00%)
  9 (9.00%) high mild
  12 (12.00%) high severe
grow_boxcar/push/500000 time:   [4.2144 ms 4.2321 ms 4.2528 ms]
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
grow_boxcar/extend/500000
                        time:   [2.0938 ms 2.0998 ms 2.1062 ms]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
grow_boxcar/push/5000000
                        time:   [47.687 ms 47.796 ms 47.911 ms]
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe
grow_boxcar/extend/5000000
                        time:   [42.718 ms 42.783 ms 42.855 ms]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
Benchmarking grow_boxcar/push/20000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 21.3s, or reduce sample count to 20.
grow_boxcar/push/20000000
                        time:   [210.50 ms 211.12 ms 211.89 ms]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe
Benchmarking grow_boxcar/extend/20000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 18.5s, or reduce sample count to 20.
grow_boxcar/extend/20000000
                        time:   [184.64 ms 185.70 ms 186.74 ms]

grow_boxcar_push_threaded/push/100
                        time:   [20.058 µs 20.115 µs 20.178 µs]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe
grow_boxcar_push_threaded/extend/100
                        time:   [19.374 µs 19.420 µs 19.466 µs]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
grow_boxcar_push_threaded/push/1000
                        time:   [74.038 µs 74.317 µs 74.594 µs]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
grow_boxcar_push_threaded/extend/1000
                        time:   [37.466 µs 37.632 µs 37.789 µs]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
grow_boxcar_push_threaded/push/50000
                        time:   [3.3598 ms 3.3668 ms 3.3734 ms]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) low mild
grow_boxcar_push_threaded/extend/50000
                        time:   [276.76 µs 277.82 µs 278.91 µs]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe
grow_boxcar_push_threaded/push/500000
                        time:   [29.937 ms 30.243 ms 30.524 ms]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  5 (5.00%) low mild
grow_boxcar_push_threaded/extend/500000
                        time:   [3.9092 ms 3.9219 ms 3.9363 ms]
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe
Benchmarking grow_boxcar_push_threaded/push/5000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 19.6s, or reduce sample count to 20.
grow_boxcar_push_threaded/push/5000000
                        time:   [193.18 ms 199.94 ms 206.54 ms]
Benchmarking grow_boxcar_push_threaded/extend/5000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, or reduce sample count to 80.
grow_boxcar_push_threaded/extend/5000000
                        time:   [55.980 ms 56.358 ms 56.751 ms]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking grow_boxcar_push_threaded/push/20000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 117.7s, or reduce sample count to 10.
grow_boxcar_push_threaded/push/20000000
                        time:   [919.62 ms 944.07 ms 971.58 ms]
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking grow_boxcar_push_threaded/extend/20000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 20.1s, or reduce sample count to 20.
grow_boxcar_push_threaded/extend/20000000
                        time:   [207.17 ms 209.00 ms 211.02 ms]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

Observations

Sequential execution

The first benchmark compares, for different sizes of input:

sequentially pushing each value into the boxcar
extending the boxcar with all values at once

	100 lines	1000 lines	50_000 lines	500_000 lines	5_000_000 lines	20_000_000 lines
push	3.2992 µs	9.6346 µs	270.10 µs	4.2321 ms	47.796 ms	211.12 ms
extend	3.2387 µs	6.8025 µs	227.82 µs	2.0998 ms	42.783 ms	185.70 ms

While extend does look slightly faster than push for most input sizes, I was pretty skeptical at that point that the difference really justified the extra complexity.

I believe the slight edge is mostly explained by the fact that extend can pre-allocate all the buckets beforehand.

Adding values from multiple threads

The second benchmark compares, for different sizes of input:

N threads each sequentially pushing values from their own batch into the boxcar
N threads each extending the boxcar with their own batch of values

	100 lines	1000 lines	50_000 lines	500_000 lines	5_000_000 lines	20_000_000 lines
push	20.115 µs	74.317 µs	3.3668 ms	30.243 ms	199.94 ms	944.07 ms
extend	19.420 µs	37.632 µs	277.82 µs	3.9219 ms	56.358 ms	209.00 ms

In this case the difference becomes significant from 1000 lines upward (about 2x at 1000 lines and about 12x at 50_000 lines), while at 100 lines the two are nearly identical. I believe this is mostly due to much less contention on atomics, and imho it is a nice low-hanging optimization for the library.
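
To illustrate the contention argument, here is a small sketch that counts how many atomic read-modify-write operations hit a shared counter when several threads reserve indexes one by one versus in a single batch. It does not measure time and it is not the benchmark code used above; `reserve_all` and its parameters are hypothetical names, and the real gain depends on the hardware and on the rest of the insertion path.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Spawn `threads` threads that each reserve `batch` indexes from a shared
/// counter, `chunk` indexes at a time.
/// Returns (final counter value, total number of atomic RMW operations).
/// `chunk == 1` mimics per-value pushes, `chunk == batch` mimics one extend.
fn reserve_all(threads: u64, batch: u64, chunk: u64) -> (u64, u64) {
    assert!(chunk > 0, "chunk must be non-zero");
    let next_index = AtomicU64::new(0);
    let total_ops: u64 = thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..threads {
            handles.push(s.spawn(|| {
                let mut ops = 0u64;
                let mut remaining = batch;
                while remaining > 0 {
                    let n = remaining.min(chunk);
                    // Every call here touches the shared counter.
                    next_index.fetch_add(n, Ordering::Relaxed);
                    ops += 1;
                    remaining -= n;
                }
                ops
            }));
        }
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    });
    (next_index.load(Ordering::Relaxed), total_ops)
}
```

With 4 threads and 1000 values each, `reserve_all(4, 1000, 1)` performs 4000 operations on the shared counter while `reserve_all(4, 1000, 1000)` performs 4, and both end with the same 4000 reserved indexes.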

Curious to have some feedback on this.

Cheers

the-mikedavis

The performance difference looks promising!

I'm not super familiar with this code myself so I just have some minor/style comments.

src/boxcar.rs

src/lib.rs

Cargo.toml

alexpasmantier · 2025-02-10T12:20:10Z

Any thoughts on how to proceed with these changes?
Are there any requirements you feel aren't met yet and should be improved on?

Should we wait for input from @pascalkuthe?

(same thing for #75)

the-mikedavis · 2025-02-10T20:30:30Z

I'm not that familiar with this code but I think this looks good. @pascalkuthe should have a look as well. He's a bit busy at the moment with work so it might take him a while to find some time to look at this (and #75).

Unrelated: also consider upstreaming both of these changes to https://github.com/ibraheemdev/boxcar - if I read the history correctly this module is vendored from that crate and it could be nice to share these improvements with the users of that crate too. (That crate looks to have quite a few dependents looking at download info so these changes could be quite impactful :)

pascalkuthe · 2025-02-10T20:48:57Z

src/boxcar.rs

+        let end_location = Location::of(start_index + count - 1);
+
+        // Allocate necessary buckets upfront
+        if start_location.bucket != end_location.bucket {


This is a pessimisation. This is only supposed to be used for avoiding contention on allocating a new shard. That is only needed for the end_bucket and the bucket after it. For the other buckets it's not needed, as they will all be allocated contention-free from within this function.

The correct logic would look like this:

let alloc_entry = end_location.bucket_len - (end_location.bucket_len >> 3);
if end_location.entry >= alloc_entry
    && (start_location.bucket != end_location.bucket || start_location.entry <= alloc_entry)
{
    if let Some(next_bucket) = self.buckets.get(end_location.bucket as usize + 1) {
        Vec::get_or_alloc(next_bucket, end_location.bucket_len << 1, self.columns);
    }
}
if start_location.bucket != end_location.bucket {
    let bucket_ptr = self.buckets.get_unchecked(end_location.bucket as usize);
    Vec::get_or_alloc(bucket_ptr, end_location.bucket_len, self.columns);
}

We probably want to turn alloc_entry into a function on Location since it's used in multiple places now.

alexpasmantier · 2025-02-11

You're absolutely right.

In fact after scratching my head over it, I feel like the only bucket we really need to potentially pre-allocate is the one following the end location bucket, since the last one will in any case get allocated inside the loop below.

Which gives the following:

// Eagerly allocate the next bucket if the last entry is close to the end of its next bucket
let alloc_entry = end_location.alloc_next_bucket_entry();
if end_location.entry >= alloc_entry
    && (start_location.bucket != end_location.bucket || start_location.entry <= alloc_entry)
{
    // This might be the last bucket, hence the check
    if let Some(next_bucket) = self.buckets.get(end_location.bucket as usize + 1) {
        Vec::get_or_alloc(next_bucket, end_location.bucket_len << 1, self.columns);
    }
}

Am I missing anything?
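
For readers following along, the condition discussed in this thread can be exercised in isolation with a sketch like the one below. The `Location` struct here is a hypothetical stand-in holding only the three fields the snippet uses, and `should_prealloc_next` is a made-up name for the check; the threshold formula `bucket_len - (bucket_len >> 3)` is taken from the review comment above.

```rust
/// Hypothetical stand-in for the boxcar `Location`: only the fields the check reads.
#[derive(Clone, Copy, Debug)]
struct Location {
    bucket: u32,
    bucket_len: usize,
    entry: usize,
}

impl Location {
    /// Entry from which the next bucket is allocated eagerly:
    /// the start of the last eighth of this bucket.
    fn alloc_next_bucket_entry(&self) -> usize {
        self.bucket_len - (self.bucket_len >> 3)
    }
}

/// True when a batch running from `start` to `end` (inclusive) reaches the
/// threshold entry of the end bucket and either entered that bucket from a
/// previous one or started at or before the threshold.
fn should_prealloc_next(start: Location, end: Location) -> bool {
    let alloc_entry = end.alloc_next_bucket_entry();
    end.entry >= alloc_entry && (start.bucket != end.bucket || start.entry <= alloc_entry)
}
```

With a bucket of length 64 the threshold entry is 56. A batch covering entries 10 to 60 of that bucket triggers the eager allocation, while a batch covering entries 58 to 60 does not, since it starts past the threshold.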

pascalkuthe · 2025-02-10T21:03:02Z

src/boxcar.rs

+            // if we are at the end of the bucket, move on to the next one
+            if location.entry == location.bucket_len - 1 {
+                // safety: `location.bucket + 1` is always in bounds
+                bucket = unsafe { self.buckets.get_unchecked((location.bucket + 1) as usize) };


I don't think this is true. end_location could be the last bucket (which would make this UB).

I think this check should be at the start of the function (and simply check whether the bucket changed compared to the previous location).

alexpasmantier · 2025-02-11

Ah right! Moved to the top of the loop and now checking:

// if we're starting to insert into a different bucket, allocate it beforehand
if location.entry == 0 && i != 0 {
    // safety: `location.bucket` is always in bounds
    bucket = unsafe { self.buckets.get_unchecked(location.bucket as usize) };
    ...
}

which I feel is the simplest and most straightforward implementation (and always in bounds)
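
A self-contained sketch of that check is shown below. It assumes a simplified layout in which the first bucket holds 32 entries and each following bucket is twice as large; `Location::of`, `FIRST_BUCKET_LEN` and `buckets_entered` are illustrative names and the real boxcar layout may differ. The point is only that `entry == 0 && i != 0` fires exactly when the batch crosses into a bucket it did not start in, so the lookup is only ever done for a bucket that actually holds one of the batch's indexes.

```rust
/// Assumed length of the first bucket; each later bucket doubles in size.
const FIRST_BUCKET_LEN: u64 = 32;

/// Hypothetical, simplified location of an index inside the bucket array.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Location {
    bucket: u32,
    bucket_len: u64,
    entry: u64,
}

impl Location {
    /// Map a flat index to (bucket, entry) under the assumed doubling layout.
    /// Indexes close to `u64::MAX` would overflow; that is ignored in this sketch.
    fn of(index: u64) -> Self {
        let skipped = index + FIRST_BUCKET_LEN;
        let log2 = 63 - skipped.leading_zeros();
        let bucket = log2 - FIRST_BUCKET_LEN.trailing_zeros();
        let bucket_len = 1u64 << log2;
        Location { bucket, bucket_len, entry: skipped - bucket_len }
    }
}

/// Buckets that a batch of `count` values starting at `start_index` enters
/// after its first value, using the same check as the loop above.
fn buckets_entered(start_index: u64, count: u64) -> Vec<u32> {
    let mut entered = Vec::new();
    for i in 0..count {
        let location = Location::of(start_index + i);
        // A new bucket starts at entry 0; the bucket of the very first value
        // (i == 0) is handled before the loop.
        if location.entry == 0 && i != 0 {
            entered.push(location.bucket);
        }
    }
    entered
}
```

Under these assumptions, a batch of 70 values starting at index 30 crosses into buckets 1 and 2, while a batch of 10 values starting at index 32 begins at entry 0 of bucket 1 and never triggers the check.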

Thanks for the guidance!

pascalkuthe

Thanks

alexpasmantier · 2025-02-14T09:26:55Z

Thanks for the review and for the guidance!

Left an unused method in there which made clippy complain, had to do one last commit to get rid of it.

feat(injector): add an extend method to Nucleo's injector

538f9d3

the-mikedavis reviewed Feb 7, 2025

Update lib.rs

fce1d6d

Co-authored-by: Michael Davis <mcarsondavis@gmail.com>

alexpasmantier force-pushed the add-extend-method-to-injector branch 2 times, most recently from bca0298 to ba6c552 on February 7, 2025 17:48

alexpasmantier requested a review from the-mikedavis February 7, 2025 17:50

remove benches and helix-editor#73 patch

fb31691

alexpasmantier force-pushed the add-extend-method-to-injector branch from ba6c552 to fb31691 on February 7, 2025 18:25

the-mikedavis previously approved these changes Feb 10, 2025

pascalkuthe reviewed Feb 10, 2025

udpates following pascalkuthe's review

142fc92

alexpasmantier dismissed the-mikedavis’s stale review via 142fc92 February 11, 2025 20:58

alexpasmantier added 2 commits February 11, 2025 22:08

simplification

d21bc1d

adding tests

5e1dc8e

alexpasmantier requested a review from pascalkuthe February 13, 2025 11:56

pascalkuthe previously approved these changes Feb 14, 2025

remove unused method

5096007

alexpasmantier dismissed pascalkuthe’s stale review via 5096007 February 14, 2025 09:25

pascalkuthe merged commit c754da5 into helix-editor:master Feb 14, 2025
5 checks passed
