
ARROW-11045: [Rust] Fix performance issues of allocator #9044

Closed
wants to merge 2 commits into from

Conversation

jorgecarleitao (Member) commented Dec 30, 2020

This PR addresses a performance issue in how we allocate and reallocate the MutableBuffer.

Problem

See #9032

This PR

This PR changes MutableBuffer::reserve to call std::alloc::alloc instead of std::alloc::alloc_zeroed, which improves performance when building buffers with unknown sizes (such as strings and nested types).

This required changing some callers of MutableBuffer that assumed a zero-initialized buffer even when reserve was used.
It also changes reserve's signature to reserve(additional) instead of reserve(new_len), matching the convention used throughout Rust's std library.
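Roughly, the idea looks like this (a minimal sketch with a hypothetical `SketchBuffer`, not the actual Arrow implementation): grow without zero-initializing, because the bytes between `len` and `capacity` are never read before being written.

```rust
use std::alloc::{alloc, realloc, Layout};
use std::cmp;

struct SketchBuffer {
    ptr: *mut u8,
    len: usize,
    capacity: usize,
}

impl SketchBuffer {
    /// Ensures capacity for `additional` more bytes, as in `Vec::reserve`.
    fn reserve(&mut self, additional: usize) {
        let required = self.len + additional;
        if required > self.capacity {
            // Grow geometrically. `alloc`/`realloc` leave the new region
            // uninitialized, unlike `alloc_zeroed`, which must zero it out.
            let new_capacity = cmp::max(required, self.capacity * 2);
            let new_layout = Layout::array::<u8>(new_capacity).unwrap();
            self.ptr = if self.capacity == 0 {
                unsafe { alloc(new_layout) }
            } else {
                let old_layout = Layout::array::<u8>(self.capacity).unwrap();
                unsafe { realloc(self.ptr, old_layout, new_capacity) }
            };
            assert!(!self.ptr.is_null(), "allocation failed");
            self.capacity = new_capacity;
        }
    }
}
```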

critcmp master-simd-e1b38cdaa4f2a7d35e2e576463e12b38875f29f3 alloc2-simd-88fc0ae819c24239ac9363fa462f9c6e1ddfd9fc -t 10
group                                 alloc2-simd-88fc0ae819c24239ac9363fa462f9c6e1ddfd9fc    master-simd-e1b38cdaa4f2a7d35e2e576463e12b38875f29f3
-----                                 ----------------------------------------------------    ----------------------------------------------------
add 512                               1.00   549.1±17.26ns        ? B/sec                     1.14  624.4±118.72ns        ? B/sec
buffer_bit_ops or                     1.00    369.3±7.51ns        ? B/sec                     1.16   427.1±20.25ns        ? B/sec
cast float32 to int32 512             1.00      3.3±0.09µs        ? B/sec                     1.21      4.0±0.09µs        ? B/sec
cast float64 to float32 512           1.00      3.0±0.06µs        ? B/sec                     1.24      3.7±0.11µs        ? B/sec
cast float64 to uint64 512            1.00      3.6±0.33µs        ? B/sec                     1.22      4.4±0.29µs        ? B/sec
cast int32 to float32 512             1.00      2.9±0.10µs        ? B/sec                     1.15      3.4±0.09µs        ? B/sec
cast int32 to float64 512             1.00      2.9±0.06µs        ? B/sec                     1.14      3.3±0.06µs        ? B/sec
cast int32 to uint32 512              1.00      4.0±0.09µs        ? B/sec                     1.23      4.9±0.12µs        ? B/sec
concat str 1024                       1.00      8.4±0.26µs        ? B/sec                     1.14      9.6±0.22µs        ? B/sec
equal_nulls_512                       1.00      3.3±0.07µs        ? B/sec                     1.22      4.0±0.10µs        ? B/sec
filter context u8 high selectivity    1.00      3.7±0.09µs        ? B/sec                     1.27      4.6±0.13µs        ? B/sec
filter u8 high selectivity            1.00     10.8±0.47µs        ? B/sec                     1.10     11.9±0.49µs        ? B/sec
like_utf8 scalar equals               1.00    149.9±6.36µs        ? B/sec                     1.12    167.3±3.45µs        ? B/sec
like_utf8 scalar starts with          1.00   338.2±12.26µs        ? B/sec                     1.15    388.4±7.62µs        ? B/sec
min string 512                        1.13      6.0±0.17µs        ? B/sec                     1.00      5.3±0.09µs        ? B/sec
nlike_utf8 scalar starts with         1.00   367.8±39.16µs        ? B/sec                     1.16    425.4±9.38µs        ? B/sec
subtract 512                          1.00   567.9±10.77ns        ? B/sec                     1.21  686.0±209.80ns        ? B/sec
sum 512                               1.32     67.8±0.59ns        ? B/sec                     1.00     51.3±0.70ns        ? B/sec
take str 1024                         1.19      6.1±0.10µs        ? B/sec                     1.00      5.1±0.03µs        ? B/sec
take str 512                          1.12      3.9±0.09µs        ? B/sec                     1.00      3.4±0.07µs        ? B/sec
take str null indices 1024            1.18      6.1±0.16µs        ? B/sec                     1.00      5.2±0.11µs        ? B/sec
take str null indices 512             1.12      3.9±0.09µs        ? B/sec                     1.00      3.5±0.03µs        ? B/sec
take str null values 1024             1.19      6.1±0.51µs        ? B/sec                     1.00      5.2±0.11µs        ? B/sec

The take benchmarks are slower because take uses Buffer::from, which always performs worse than building through a MutableBuffer.
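For context, the contrast being drawn is roughly the following (a sketch using the crate's API of that era; treat the exact calls as illustrative):

```rust
use arrow::buffer::{Buffer, MutableBuffer};

fn main() {
    // Building a Buffer from a slice copies data that was already materialized:
    let v: Vec<u8> = (0..64).collect();
    let _via_from = Buffer::from(v.as_slice()); // pays for one extra copy

    // Writing through a MutableBuffer and freezing it avoids that copy:
    let mut mutable = MutableBuffer::new(64);
    mutable.extend_from_slice(v.as_slice());
    let _via_mutable = mutable.freeze();
}
```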

@apache apache deleted a comment from github-actions bot Jan 1, 2021
jorgecarleitao (Member Author)

@nevi-me , there is a failing test in parquet that I am unable to fix. I think that we may be creating bitmap buffers that are too large.

My current hypothesis is that the buffer old_bitmap at consume_bitmap_buffer is not being resized in the same way the other buffer in consume_rep_levels is, which is causing it to be too large. Do you agree with this (before I proceed to patch it)?

@apache apache deleted a comment from github-actions bot Jan 2, 2021
codecov-io

Codecov Report

Merging #9044 (ff3efa0) into master (eb17687) will decrease coverage by 0.00%.
The diff coverage is 93.51%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9044      +/-   ##
==========================================
- Coverage   82.57%   82.56%   -0.01%     
==========================================
  Files         203      203              
  Lines       50036    49875     -161     
==========================================
- Hits        41316    41180     -136     
+ Misses       8720     8695      -25     
Impacted Files Coverage Δ
rust/arrow/src/array/array_list.rs 92.72% <ø> (-0.38%) ⬇️
rust/arrow/src/bytes.rs 53.12% <ø> (-5.21%) ⬇️
rust/arrow/src/compute/kernels/aggregate.rs 74.93% <ø> (-0.07%) ⬇️
rust/arrow/src/compute/kernels/comparison.rs 95.91% <ø> (ø)
rust/arrow/src/datatypes.rs 75.02% <ø> (ø)
rust/arrow/src/array/builder.rs 85.42% <86.84%> (+1.42%) ⬆️
rust/arrow/src/buffer.rs 97.38% <94.87%> (-0.84%) ⬇️
rust/arrow/src/array/array_primitive.rs 92.28% <100.00%> (-0.04%) ⬇️
rust/arrow/src/array/raw_pointer.rs 100.00% <100.00%> (ø)
rust/arrow/src/array/transform/list.rs 83.33% <100.00%> (-0.54%) ⬇️
... and 15 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update eb17687...ff3efa0.

if capacity > self.capacity {
    let new_capacity = bit_util::round_upto_multiple_of_64(capacity);
    let new_capacity = cmp::max(new_capacity, self.capacity * 2);
pub fn reserve(&mut self, additional: usize) {
Member

Is there any reason to change the API in this PR? I don't see how it is related to the performance issue.
It would also be better to split this out to make the PR easier to review. We also need to update the docs.

Member Author

I agree that it makes review more difficult. The signature is related to performance, though:

It was difficult to reason about reserve(new_len) when Vec::reserve and RawVec::reserve use additional (and I was basing this PR on them). This PR fixes a bug in one Builder that was calling reserve(1) in append when it should have been calling reserve(self.len + 1). I also spent 4 hours tracking down a bug caused by this difference (I was passing additional when the signature expected new_len).

I can try to split that into another PR, though. IMO this PR will still need to be based on that PR.
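To make the two conventions concrete, here is a toy example against std's `Vec`, whose semantics this PR adopts:

```rust
fn main() {
    let mut v: Vec<u8> = vec![1, 2, 3, 4];
    // `Vec::reserve` takes *additional* capacity, not a target length:
    v.reserve(8);
    assert!(v.capacity() >= v.len() + 8); // at least 12 bytes in total
    // Under a `reserve(new_len)` convention, the same call would only
    // guarantee a total capacity of 8.
}
```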

Member

I think we originally referred to the C++ implementation for the signature design, to keep the two consistent. Since the buffer module is public, we'll need to be careful when making any breaking change to the API.

Member Author

Yep, and I think it was a good decision back then, when the goal was to port C++ code to Rust. I understand your concern.

However, IMO we should not keep using C++-motivated APIs just because it was historically like that. additional is the standard in virtually every container in Rust (HashMap, Vec, tokio::bytes::Bytes). IMO this change will need to be made at some point, and delaying it will only cause more pain in the future.

In this particular instance, it was motivated by an already-existing confusion in our own code base, which IMO is evidence that Rust developers expect additional. Most people that use this API are probably passing additional to it already.

Regardless, we are bumping the major version, so there is already no expectation of backward compatibility as far as cargo and semver are concerned. My goal here is to make MutableBuffer as close as possible to Vec so that people can use it without even thinking about the fact that they are using a special allocation.

Member

Thanks. If we're going to bump the major version then I'm fine with the compatibility issue. I don't have a strong opinion on this, so either way is fine with me. IMO good documentation should suffice for developers to understand what the method does.

@@ -774,16 +756,16 @@ impl MutableBuffer {
 /// `new_len` will be zeroed out.
 ///
 /// If `new_len` is less than `len`, the buffer will be truncated.
-pub fn resize(&mut self, new_len: usize) {
+pub fn resize(&mut self, new_len: usize, value: u8) {
Member

We could have another method with value = 0, since this seems to be the dominant use case (it might enable future optimizations for that special case as well).

Member Author

I based this signature on Vec::resize. I.e., I am assuming here that we use MutableBuffer as a special Vec<u8> that uses our custom allocator, and offer users an API that they are used to (Vec's).

The main use case for value: 255u8 is when we want to allocate and grow a fully set bitmap to later unset bits. We did not offer that before, which is the reason MutableArrayData does not do it (even though it is most likely more efficient for many use cases).

Note that even though Rust's RawVec offers a with_capacity_zeroed, std::Vec does not expose that as a public API: resize and resize_with are the public APIs to resize a Vec, and neither of them uses std::alloc::alloc_zeroed; they also just call std::ptr::write_bytes.

I agree with you that resize(len, 0) can potentially be optimized when the mutable buffer has zero capacity by calling std::alloc::alloc_zeroed instead of std::ptr::write_bytes.
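A self-contained sketch of the two zero-filling strategies discussed above (illustrative only, not the Arrow code):

```rust
use std::alloc::{alloc, alloc_zeroed, dealloc, Layout};

fn main() {
    let n = 1024;
    let layout = Layout::array::<u8>(n).unwrap();
    unsafe {
        // What `Vec::resize` effectively does: allocate, then zero explicitly.
        let a = alloc(layout);
        std::ptr::write_bytes(a, 0, n);

        // Possible fast path for `resize(new_len, 0)` on an empty buffer:
        // request pre-zeroed memory directly from the allocator.
        let b = alloc_zeroed(layout);

        assert_eq!(*a, 0);
        assert_eq!(*b, 0);
        dealloc(a, layout);
        dealloc(b, layout);
    }
}
```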

jorgecarleitao added a commit that referenced this pull request Jan 19, 2021
This PR refactors `MutableBuffer::extend_from_slice` to remove the need to call `to_byte_slice` on every call, thereby removing a level of indirection that prevented the compiler from optimizing out some code.

This is the second performance improvement originally presented in #8796 and, together with #9027, brings the performance of `MutableBuffer` to the same level as `Vec<u8>`, in particular for building buffers on the fly.

Basically, when converting to a byte slice `&[u8]`, the compiler loses the type size information and thus needs to perform extra checks and can't just optimize out the code.

This PR adopts the same API as `Vec<T>::extend_from_slice`, but since our buffers are in `u8` (i.e. à la `Vec<u8>`), I made the signature

```
pub fn extend_from_slice<T: ToByteSlice>(&mut self, items: &[T])
pub fn push<T: ToByteSlice>(&mut self, item: &T)
```

i.e. it consumes something that can be converted to a byte slice, but internally makes the conversion to bytes (as `to_byte_slice` was doing).
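For illustration, a call site under this API might look like the following (hypothetical usage, not taken from the PR):

```rust
use arrow::buffer::MutableBuffer;

fn main() {
    let mut buffer = MutableBuffer::new(0);
    // The generic bound accepts any typed slice and converts it to bytes
    // internally, so the element size stays visible to the optimizer.
    buffer.extend_from_slice(&[1i32, 2, 3]);
    buffer.push(&42i64);
}
```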

Credits for the root cause analysis that led to this PR go to @Dandandan, [originally fielded here](#9016 (comment)).

> [...] current conversion to a byte slice may add some overhead? - @Dandandan

Benches (against master, so, both this PR and #9044 ):

```
Switched to branch 'perf_buffer'
Your branch and 'origin/perf_buffer' have diverged,
and have 6 and 1 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)
   Compiling arrow v3.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow)
    Finished bench [optimized] target(s) in 1m 00s
     Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/buffer_create-915da5f1abaf0471
Gnuplot not found, using plotters backend
mutable                 time:   [463.11 us 463.57 us 464.07 us]
                        change: [-19.508% -18.571% -17.526%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe

mutable prepared        time:   [527.84 us 528.46 us 529.14 us]
                        change: [-13.356% -12.522% -11.790%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

Benchmarking from_slice: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.1s, enable flat sampling, or reduce sample count to 60.
from_slice              time:   [1.1968 ms 1.1979 ms 1.1991 ms]
                        change: [-6.8697% -6.2029% -5.5812%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

from_slice prepared     time:   [917.49 us 918.89 us 920.60 us]
                        change: [-6.5111% -5.9102% -5.3038%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe
```

Closes #9076 from jorgecarleitao/perf_buffer

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
@jorgecarleitao jorgecarleitao deleted the alloc2 branch January 19, 2021 17:46
Dandandan (Contributor)

@jorgecarleitao I think this PR broke master:

   --> datafusion/src/physical_plan/parquet.rs:712:25
    |
712 |             data_buffer.resize(data_buffer.len() + data_size);
    |                         ^^^^^^ ----------------------------- supplied 1 argument
    |                         |
    |                         expected 2 arguments
    |
note: associated function defined here
   --> rust/arrow/src/buffer.rs:833:12
    |
833 |     pub fn resize(&mut self, new_len: usize, value: u8) {
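Presumably the call site just needs the fill value passed explicitly to preserve the old zeroing behaviour, e.g.:

```rust
data_buffer.resize(data_buffer.len() + data_size, 0);
```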

jorgecarleitao (Member Author)

Likely. It was kind of expected, as it made some backward-incompatible changes. I was trying to merge this one first to avoid breaking, but I guess I was not fast enough for the speed at which PRs are merged into master after the green light on the mailing list :P

kszucs pushed a commit that referenced this pull request Jan 25, 2021
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021