subscriber: use Vec instead of BTreeSet in DirectiveSet #580

samschlegel · 2020-02-12T19:43:24Z

A quick glance at `perf` shows a lot of time being spent in `btree_set::Iter::next()`. Since we only ever iterate over the set after it's built, we don't need to pay this cost, so this switches to just building a `Vec`.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
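For readers skimming the thread, here is a minimal sketch of the idea in the description: keep a `Vec` sorted at insertion time so that later iteration is a plain slice walk. The helper name `insert_sorted` is hypothetical and is not taken from the `tracing-subscriber` source; it only assumes the element type implements `Ord`.

```rust
/// Insert `item` into an already-sorted `Vec`, keeping it sorted and
/// free of duplicates, which mirrors what a `BTreeSet` would store.
/// (`insert_sorted` is a hypothetical helper for illustration.)
fn insert_sorted<T: Ord>(items: &mut Vec<T>, item: T) {
    match items.binary_search(&item) {
        // An equal element already exists: overwrite it in place.
        Ok(index) => items[index] = item,
        // Not present: `index` is where it belongs in sorted order.
        Err(index) => items.insert(index, item),
    }
}
```

Inserting the same elements into a `BTreeSet` and into a `Vec` through this helper should yield the same iteration order. The `Vec` pays an O(n) shift on each insert, which is the trade being made here for cheaper iteration afterwards.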

tracing-subscriber/src/registry/mod.rs

hawkw · 2020-02-12T20:07:55Z

On this branch, I ran some additional microbenchmarks that haven't been merged to master yet. Here are the results (versus master):

Benchmark results

   Compiling tracing-subscriber v0.2.0 (/Users/eliza/Code/tracing/tracing-subscriber)
    Finished bench [optimized] target(s) in 13.65s
     Running target/release/deps/filter-56ce39a1aef6b2a0
static/baseline_single_threaded
                        time:   [91.384 ns 91.869 ns 92.429 ns]
                        change: [-10.968% -7.6276% -4.7931%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
static/single_threaded  time:   [50.591 ns 50.723 ns 50.886 ns]
                        change: [-1.7266% -0.7931% +0.1369%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
static/enabled_one      time:   [22.726 ns 22.812 ns 22.905 ns]
                        change: [-1.0189% +0.0390% +1.0734%] (p = 0.94 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
static/enabled_many     time:   [22.959 ns 23.022 ns 23.101 ns]
                        change: [+1.3879% +2.7749% +4.0623%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
static/disabled_level_one
                        time:   [2.5621 ns 2.5729 ns 2.5858 ns]
                        change: [-14.054% -13.229% -12.290%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
static/disabled_level_many
                        time:   [2.6408 ns 2.6498 ns 2.6608 ns]
                        change: [-2.8512% -1.9669% -1.0625%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
static/disabled_one     time:   [2.5953 ns 2.6016 ns 2.6092 ns]
                        change: [-11.476% -10.435% -9.4047%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
static/disabled_many    time:   [2.5950 ns 2.6052 ns 2.6207 ns]
                        change: [-10.402% -9.5026% -8.4719%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
static/baseline_multithreaded
                        time:   [10.206 us 10.392 us 10.608 us]
                        change: [-6.7562% -2.6270% +1.4671%] (p = 0.23 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  7 (7.00%) high mild
  11 (11.00%) high severe
static/multithreaded    time:   [10.598 us 10.868 us 11.159 us]
                        change: [+2.7360% +6.8107% +11.079%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

dynamic/baseline_single_threaded
                        time:   [174.87 ns 175.97 ns 177.26 ns]
                        change: [+0.2769% +1.4296% +2.7593%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
dynamic/single_threaded time:   [556.82 ns 559.14 ns 561.66 ns]
                        change: [-4.8935% -3.8468% -2.8140%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
dynamic/baseline_multithreaded
                        time:   [10.708 us 10.918 us 11.160 us]
                        change: [-6.9469% -3.7037% -0.4522%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe
dynamic/multithreaded   time:   [14.119 us 14.341 us 14.597 us]
                        change: [-4.9532% -1.8344% +1.3532%] (p = 0.26 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

mixed/disabled          time:   [46.207 ns 46.391 ns 46.613 ns]
                        change: [-20.225% -19.330% -18.427%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
mixed/disabled_by_level time:   [43.776 ns 43.873 ns 43.986 ns]
                        change: [-21.527% -20.664% -19.806%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

The mixed/* and {static,dynamic}/*_many benchmarks are the most likely to exercise this change, as they involve iterating over a set of multiple directives.

Also note that there is probably some additional noise that Criterion fails to compensate for, since the baseline_single_threaded benchmark (which doesn't involve a filter at all and thus shouldn't exercise this change) observes a -7% "improvement" vs. master.
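As a reading aid for the verdict lines in the output above, here is a rough sketch of how they appear to be derived from the change interval and the p-value. The thresholds (a 0.05 significance level and a 1% noise threshold) are assumptions inferred from the numbers printed above, and `classify` and `Verdict` are hypothetical names, not Criterion's API.

```rust
/// Hypothetical verdicts matching the four messages in the logs above.
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    WithinNoise, // "Change within noise threshold."
    Improved,    // "Performance has improved."
    Regressed,   // "Performance has regressed."
}

/// Classify a change interval `[lower, upper]` (both in percent) with
/// p-value `p`. The two constants are assumed, not read from Criterion.
fn classify(lower: f64, upper: f64, p: f64) -> Verdict {
    const SIGNIFICANCE: f64 = 0.05; // assumed significance level
    const NOISE: f64 = 1.0; // assumed noise threshold, in percent

    if p >= SIGNIFICANCE {
        return Verdict::NoChange;
    }
    if upper < -NOISE {
        // The whole interval lies below the negative noise threshold.
        Verdict::Improved
    } else if lower > NOISE {
        // The whole interval lies above the positive noise threshold.
        Verdict::Regressed
    } else {
        Verdict::WithinNoise
    }
}
```

Under these assumptions, a verdict of "improved" only means the interval cleared the threshold for that pair of runs. It says nothing about whether the two runs were taken under comparable machine conditions, which is why the -7% on a benchmark that doesn't touch the filter is best read as noise.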

hawkw · 2020-02-12T21:56:58Z

tracing-subscriber/src/filter/env/directive.rs

@@ -45,7 +38,7 @@ pub(crate) type Statics = DirectiveSet<StaticDirective>;

 #[derive(Debug, PartialEq)]
 pub(crate) struct DirectiveSet<T> {
- directives: BTreeSet<T>,
+ directives: Vec<T>,


I wonder if we (eventually) want to introduce some kind of adaptive approach, where we use a vec up to a certain size, and switch to a set/map once we are larger than that size? Although, since we basically consume this data by iterating over it rather than via lookups, that probably doesn't make a difference.

actually, now that i think about it a bit, i think that if we want to go full niche data structures brain wizard, the ideal structure would be some kind of trie/prefix tree that's keyed by module path segments, so we don't have to iterate. but, even in that world, iteration is probably still faster if we have <10 directives.

anyway, just idle thoughts! this looks good for now!
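To make the trie idea above a bit more concrete, here is a rough sketch of a prefix tree keyed by module path segments. Everything in it is hypothetical: the names `DirectiveTrie` and `Node`, the use of a bare `u8` as a level, and the simplification that targets match only on whole `::`-separated segments. It is not how `tracing-subscriber` stores directives.

```rust
use std::collections::HashMap;

/// One node per module path segment (hypothetical structure).
#[derive(Debug, Default)]
struct Node {
    /// Level configured for exactly this path prefix, if any.
    level: Option<u8>,
    children: HashMap<String, Node>,
}

#[derive(Debug, Default)]
struct DirectiveTrie {
    root: Node,
}

impl DirectiveTrie {
    /// Register a level for `target`, e.g. "my_crate::db".
    /// An empty target sets the default level on the root.
    fn insert(&mut self, target: &str, level: u8) {
        let mut node = &mut self.root;
        for segment in target.split("::").filter(|s| !s.is_empty()) {
            node = node.children.entry(segment.to_string()).or_default();
        }
        node.level = Some(level);
    }

    /// Return the level of the longest configured prefix of `target`,
    /// walking one segment at a time instead of scanning all directives.
    fn lookup(&self, target: &str) -> Option<u8> {
        let mut node = &self.root;
        let mut best = node.level;
        for segment in target.split("::") {
            match node.children.get(segment) {
                Some(child) => {
                    node = child;
                    if node.level.is_some() {
                        best = node.level;
                    }
                }
                None => break,
            }
        }
        best
    }
}
```

A lookup costs one hash probe per path segment regardless of how many directives exist. With only a handful of directives, a linear scan over a small `Vec` is likely to be at least as fast, as noted above.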

hawkw · 2020-02-13T00:08:54Z

new bench results on this branch (vs master):

    Finished bench [optimized] target(s) in 12.40s
     Running target/release/deps/filter-b804f035b8022c0d
static/baseline_single_threaded
                        time:   [110.90 ns 110.97 ns 111.08 ns]
                        change: [+1.2294% +1.4973% +1.6691%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high severe
static/single_threaded  time:   [63.846 ns 63.869 ns 63.899 ns]
                        change: [-0.5777% -0.1758% +0.1042%] (p = 0.42 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  6 (6.00%) high severe
static/enabled_one      time:   [28.303 ns 28.319 ns 28.337 ns]
                        change: [-0.0520% +0.0592% +0.2128%] (p = 0.39 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  12 (12.00%) high severe
static/enabled_many     time:   [28.315 ns 28.331 ns 28.353 ns]
                        change: [+0.1083% +0.2833% +0.5763%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) high mild
  8 (8.00%) high severe
static/disabled_level_one
                        time:   [4.1052 ns 4.1082 ns 4.1124 ns]
                        change: [+12.331% +12.442% +12.558%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe
static/disabled_level_many
                        time:   [3.6492 ns 3.6510 ns 3.6534 ns]
                        change: [-11.189% -11.098% -10.992%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  11 (11.00%) high severe
static/disabled_one     time:   [3.6502 ns 3.6519 ns 3.6540 ns]
                        change: [-11.796% -11.384% -11.079%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe
static/disabled_many    time:   [3.6492 ns 3.6529 ns 3.6578 ns]
                        change: [-11.302% -11.203% -11.108%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  10 (10.00%) high severe
static/baseline_multithreaded
                        time:   [7.4656 us 7.6264 us 7.8015 us]
                        change: [-1.6940% +1.2931% +4.2372%] (p = 0.38 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe
static/multithreaded    time:   [7.5107 us 7.6612 us 7.8391 us]
                        change: [-5.2509% -1.6186% +1.9221%] (p = 0.38 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

dynamic/baseline_single_threaded
                        time:   [245.47 ns 245.71 ns 246.02 ns]
                        change: [-0.9006% -0.7823% -0.6443%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 19 outliers among 100 measurements (19.00%)
  15 (15.00%) high mild
  4 (4.00%) high severe
dynamic/single_threaded time:   [1.1450 us 1.1460 us 1.1473 us]
                        change: [-5.6606% -5.5597% -5.4597%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
dynamic/baseline_multithreaded
                        time:   [7.5787 us 7.7494 us 7.9323 us]
                        change: [-0.7284% +2.5484% +6.0102%] (p = 0.13 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
dynamic/multithreaded   time:   [7.9655 us 8.1806 us 8.4514 us]
                        change: [-6.1092% -1.9367% +2.4388%] (p = 0.37 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

mixed/disabled          time:   [64.513 ns 64.665 ns 64.932 ns]
                        change: [-20.089% -19.884% -19.613%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
mixed/disabled_by_level time:   [55.575 ns 55.615 ns 55.672 ns]
                        change: [-18.548% -18.464% -18.381%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  6 (6.00%) low mild
  2 (2.00%) high severe

     Running target/release/deps/filter_log-b8cedd42cccfa727
log/static/baseline_single_threaded
                        time:   [503.47 ns 503.63 ns 503.83 ns]
                        change: [+4.7558% +5.1753% +5.4120%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild
  7 (7.00%) high severe
log/static/single_threaded
                        time:   [521.44 ns 521.68 ns 521.97 ns]
                        change: [-5.8814% -5.7675% -5.6473%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe
log/static/enabled_one  time:   [161.63 ns 161.76 ns 161.92 ns]
                        change: [-7.0656% -6.4580% -6.0825%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild
  6 (6.00%) high severe
log/static/enabled_many time:   [248.19 ns 248.37 ns 248.57 ns]
                        change: [+43.525% +43.812% +44.032%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
log/static/disabled_level_one
                        time:   [73.843 ns 73.953 ns 74.074 ns]
                        change: [-6.2330% -6.0480% -5.8570%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
log/static/disabled_level_many
                        time:   [114.86 ns 115.04 ns 115.31 ns]
                        change: [-10.854% -10.445% -9.6870%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
log/static/disabled_one time:   [72.748 ns 72.986 ns 73.227 ns]
                        change: [-6.0441% -5.7632% -5.4852%] (p = 0.00 < 0.05)
                        Performance has improved.
log/static/disabled_many
                        time:   [112.39 ns 112.78 ns 113.39 ns]
                        change: [-9.2074% -8.9880% -8.7034%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) high mild
  11 (11.00%) high severe
log/static/baseline_multithreaded
                        time:   [7.4648 us 7.6124 us 7.7669 us]
                        change: [-2.7126% +0.8626% +4.7997%] (p = 0.64 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
log/static/multithreaded
                        time:   [7.5119 us 7.6573 us 7.8040 us]
                        change: [-3.5670% -0.4202% +2.7499%] (p = 0.80 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild

log/dynamic/baseline_single_threaded
                        time:   [619.46 ns 621.20 ns 624.50 ns]
                        change: [+0.4148% +0.6241% +0.9236%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  10 (10.00%) high severe
log/dynamic/single_threaded
                        time:   [1.4144 us 1.4211 us 1.4322 us]
                        change: [-2.3086% -1.7723% -1.0934%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
log/dynamic/baseline_multithreaded
                        time:   [7.4567 us 7.5690 us 7.6898 us]
                        change: [-2.8989% +1.5183% +6.2860%] (p = 0.52 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe
log/dynamic/multithreaded
                        time:   [7.8385 us 8.0471 us 8.2835 us]
                        change: [-7.5094% -3.6507% +0.3530%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

log/mixed/disabled      time:   [94.747 ns 94.857 ns 94.978 ns]
                        change: [-9.8083% -9.6527% -9.5001%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
log/mixed/disabled_by_level
                        time:   [82.058 ns 82.127 ns 82.210 ns]
                        change: [-11.542% -10.934% -10.587%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

the

log/static/enabled_many time:   [248.19 ns 248.37 ns 248.57 ns]
                        change: [+43.525% +43.812% +44.032%] (p = 0.00 < 0.05)
                        Performance has regressed.

is kinda weird; this is one that actually iterates into the directive set, so I'm surprised it has a regression, and unlike some of Criterion's perf diffing this seems potentially big enough to be significant...i'm rerunning it to rule out momentary noise.

hawkw · 2020-02-13T00:12:02Z

hmm, i reran the benchmarks and that test seems to be consistently taking 240~250ns:

log/static/enabled_many time:   [245.07 ns 245.20 ns 245.37 ns]
                        change: [-1.3968% -1.2895% -1.1841%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  5 (5.00%) high severe

(the "1% performance improvement" per criterion is, of course, nonsense)

@samschlegel any thoughts? if there is a marked perf improvement in your "real life" testing on this branch, i am happy to move forward with it regardless...


### Changed

- **filter**: `EnvFilter` directive selection now behaves correctly (i.e. like `env_logger`) (#583)

### Fixed

- **filter**: Fixed `EnvFilter` incorrectly allowing less-specific filter directives to enable events that are disabled by more-specific filters (#583)
- **filter**: Multiple significant `EnvFilter` performance improvements, especially when filtering events generated by `log` records (#578, #583)
- **filter**: Replaced `BTreeSet` with `Vec` in `DirectiveSet`, improving iteration performance significantly with typical numbers of filter directives (#580)

A big thank-you to @samschlegel for lots of help with `EnvFilter` performance tuning in this release!

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

## Motivation

Recent changes to `tracing-subscriber` (#580 and #583) introduced some regressions in filter directive selection. In particular, directive selection appears to depend on the _order_ in which directives were specified in an env filter string, or on the order in which they were added using `add_directive`, rather than on specificity.

This regression is due to the change that switched the storage of filter directives in `DirectiveSet`s from a `BTreeSet` to a `Vec`. Previously, the `DirectiveSet::add` and `DirectiveSet::extend` methods both relied on the inherent ordering of `BTreeSet`s. After changing to a `Vec`, the `DirectiveSet::add` method was changed to use a binary search to find the correct position for each directive, and use `Vec::insert` to add the directive at that position. This is correct behavior.

However, the `Extend` (and therefore also `FromIterator`) implementations _did not use_ `add_directive`. Instead, they simply called `extend` on the underlying data structure. This was fine previously, when we could rely on the sorted nature of `BTreeSet`s, but now it means that when a directive set is created from an iterator (such as when parsing a string with multiple filter directives!), the ordering of the directive set is based on the iterator's ordering, rather than sorted.

We didn't catch this bug because all of our tests happen to put the least specific directive first. When the change to using a `Vec` broke all the existing tests, I was able to "fix" them simply by adding a `.rev()` call to the iterator, based on the incorrect assumption that we were always using the sorted insertion order, and that the test failures were simply due to the binary search inserting in the opposite order from `BTreeSet`.

Adding the `.rev()` call caused issue #591, where a `DirectiveSet` built by calls to `add_directive` (which _does_ obey the correct sorting) was not selecting the right filters, since we were reversing the ordering and picking the least specific directive first.

## Solution

I've changed the `DirectiveSet::extend` method to call `self.add` for each directive in the iterator. Now, they are inserted at the correct position. I've also removed the call to `.rev()` so that we iterate over the correctly sorted `DirectiveSet` in most-specific-first order.

I've added new tests to reproduce both issue #591 and issue #623, and confirmed that they both fail prior to this change.

Fixes #591
Fixes #623

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
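The difference between the two `extend` behaviors described above can be sketched as follows. This is an illustration under stated assumptions, not the actual `tracing-subscriber` code: directives are plain `String` targets, "longer target sorts first" stands in for the real specificity ordering, and `by_specificity` and `extend_unsorted` are hypothetical names.

```rust
use std::cmp::Ordering;

/// Stand-in ordering (an assumption): longer targets are treated as more
/// specific and sort first; ties are broken lexicographically.
fn by_specificity(a: &str, b: &str) -> Ordering {
    b.len().cmp(&a.len()).then_with(|| a.cmp(b))
}

#[derive(Debug, Default)]
struct DirectiveSet {
    directives: Vec<String>,
}

impl DirectiveSet {
    /// Sorted insertion: binary search for the slot, then `Vec::insert`.
    fn add(&mut self, directive: String) {
        let slot = self
            .directives
            .binary_search_by(|probe| by_specificity(probe, &directive));
        match slot {
            Ok(index) => self.directives[index] = directive,
            Err(index) => self.directives.insert(index, directive),
        }
    }

    /// The buggy shape: appends in whatever order the iterator yields,
    /// so the result is only sorted if the input happened to be.
    fn extend_unsorted<I: IntoIterator<Item = String>>(&mut self, iter: I) {
        self.directives.extend(iter);
    }
}

/// The fixed shape: route every element through `add`.
impl Extend<String> for DirectiveSet {
    fn extend<I: IntoIterator<Item = String>>(&mut self, iter: I) {
        for directive in iter {
            self.add(directive);
        }
    }
}
```

With the input order `my_crate`, `my_crate::db`, `a`, the unsorted variant keeps that order, while the `Extend` impl stores `my_crate::db`, `my_crate`, `a`, so iteration sees the most specific directive first no matter how the filter string was written.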

hawkw and others added 2 commits February 12, 2020 19:39

kill btreeset

14ac8c4

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

Use binary_search

5f98ca4


samschlegel requested a review from hawkw February 12, 2020 19:45

undo accidental change

64e275d

hawkw approved these changes Feb 12, 2020


hawkw changed the title ~~Use Vec instead of BTreeSet in DirectiveSet~~ subscriber: use Vec instead of BTreeSet in DirectiveSet Feb 13, 2020

hawkw merged commit 1e35ce2 into tokio-rs:master Feb 13, 2020

hawkw mentioned this pull request Feb 14, 2020

subscriber: prepare to release 0.2.1 #586

Merged

hawkw mentioned this pull request Mar 5, 2020

subscriber: fix filter selection regressions #624

Merged
