-
Notifications
You must be signed in to change notification settings - Fork 717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
subscriber: use Vec instead of BTreeSet in DirectiveSet #580
Conversation
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
I ran some additional microbenchmarks that haven't merged to master yet on this branch. Here are the results (versus master): Benchmark results
The Also note that there is probably some additional noise that Criterion fails to compensate for, since the |
@@ -45,7 +38,7 @@ pub(crate) type Statics = DirectiveSet<StaticDirective>; | |||
|
|||
#[derive(Debug, PartialEq)] | |||
pub(crate) struct DirectiveSet<T> { | |||
directives: BTreeSet<T>, | |||
directives: Vec<T>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we (eventually) want to introduce some kind of adaptive approach, where we use a vec up to a certain size, and switch to a set/map once we are larger than that size? Although, since we basically consume this data by iterating over it rather that via lookups, that probably doesn't make a difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, now that i think about it a bit, i think that if we want to go full niche data structures brain wizard, the ideal structure would be some kind of trie/prefix tree that's keyed by module path segments, so we don't have to iterate. but, even in that world, iteration is probably still faster if we have <10 directives.
anyway, just idle thoughts! this looks good for now!
new bench results on this branch (vs master):
the
is kinda weird; this is one that actually iterates into the directive set, so I'm surprised it has a regression, and unlike some of Criterion's perf diffing this seems potentially big enough to be significant...i'm rerunning it to rule out momentary noise. |
hmm, i reran the benchmarks and that test seems to be consistently taking 240~250ns:
(the "1% performance improvement" per criterion is, of course, nonsense) @samschlegel any thoughts? if there is a marked perf improvement in your "real life" testing on this branch, i am happy to move forward with it regardless... |
Changed - **filter**: `EnvFilter` directive selection now behaves correctly (i.e. like `env_logger`) (#583) Fixed - **filter**: Fixed `EnvFilter` incorrectly allowing less-specific filter directives to enable events that are disabled by more-specific filters (#583) - **filter**: Multiple significant `EnvFilter` performance improvements, especially when filtering events generated by `log` records (#578, #583) - **filter**: Replaced `BTreeMap` with `Vec` in `DirectiveSet`, improving iteration performance significantly with typical numbers of filter directives (#580) Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Changed - **filter**: `EnvFilter` directive selection now behaves correctly (i.e. like `env_logger`) (#583) Fixed - **filter**: Fixed `EnvFilter` incorrectly allowing less-specific filter directives to enable events that are disabled by more-specific filters (#583) - **filter**: Multiple significant `EnvFilter` performance improvements, especially when filtering events generated by `log` records (#578, #583) - **filter**: Replaced `BTreeMap` with `Vec` in `DirectiveSet`, improving iteration performance significantly with typical numbers of filter directives (#580) Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Changed - **filter**: `EnvFilter` directive selection now behaves correctly (i.e. like `env_logger`) (#583) Fixed - **filter**: Fixed `EnvFilter` incorrectly allowing less-specific filter directives to enable events that are disabled by more-specific filters (#583) - **filter**: Multiple significant `EnvFilter` performance improvements, especially when filtering events generated by `log` records (#578, #583) - **filter**: Replaced `BTreeMap` with `Vec` in `DirectiveSet`, improving iteration performance significantly with typical numbers of filter directives (#580) Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Changed - **filter**: `EnvFilter` directive selection now behaves correctly (i.e. like `env_logger`) (#583) Fixed - **filter**: Fixed `EnvFilter` incorrectly allowing less-specific filter directives to enable events that are disabled by more-specific filters (#583) - **filter**: Multiple significant `EnvFilter` performance improvements, especially when filtering events generated by `log` records (#578, #583) - **filter**: Replaced `BTreeMap` with `Vec` in `DirectiveSet`, improving iteration performance significantly with typical numbers of filter directives (#580) Signed-off-by: Eliza Weisman <eliza@buoyant.io>
### Changed - **filter**: `EnvFilter` directive selection now behaves correctly (i.e. like `env_logger`) (#583) ### Fixed - **filter**: Fixed `EnvFilter` incorrectly allowing less-specific filter directives to enable events that are disabled by more-specific filters (#583) - **filter**: Multiple significant `EnvFilter` performance improvements, especially when filtering events generated by `log` records (#578, #583) - **filter**: Replaced `BTreeMap` with `Vec` in `DirectiveSet`, improving iteration performance significantly with typical numbers of filter directives (#580) A big thank-you to @samschlegel for lots of help with `EnvFilter` performance tuning in this release! Signed-off-by: Eliza Weisman <eliza@buoyant.io>
## Motivation Recent changes to `tracing-subscriber` (#580 and #583) introduced some regressions in filter directive selection. In particular, directive selection appears to depend on the _order_ in which directives were specified in a env filter string, or on the order in which they were added using `add_directive`, rather than on specificity. This regression is due to the change that switched the storage of filter directives in `DirectiveSet`s from a `BTreeSet` to a `Vec`. Previously, the `DirectiveSet::add` and `DirectiveSet::extend` methods both relied on the inherent ordering of `BTreeSet`s. After changing to a `Vec`, the `DirectiveSet::add` method was changed to use a binary search to find the correct position for each directive, and use `Vec::insert` to add the directive at that position. This is correct behavior. However, the `Extend` (and therefore also `FromIterator`) implementations _did not use_ `add_directive` --- instead, they simply called `extend` on the underlying data structure. This was fine previously, when we could rely on the sorted nature of `BTreeSet`s, but now, it means that when a directive set is created from an iterator (such as when parsing a string with multiple filter directives!), the ordering of the directive set is based on the iterator's ordering, rather than sorted. We didn't catch this bug because all of our tests happen to put the least specific directive first. When the change to using a `Vec` broke all the existing tests, I was able to "fix" them simply by adding a `.rev()` call to the iterator, based on the incorrect assumption that we were always using the sorted insertion order, and that the test failures were simply due to the binary search inserting in the opposite order as `BTreeSet`. Adding the `.rev()` call caused issue #591, where a `DirectiveSet` built by calls to `add_directive` (which _does_ obey the correct sorting) was not selecting the right filters, since we were reversing the ordering and picking the least specific directive first. ## Solution I've changed the `DirectiveSet::extend` method to call `self.add` for each directive in the iterator. Now, they are inserted at the correct position. I've also removed the call to `.rev()` so that we iterate over the correctly sorted `DirectiveSet` in most-specific-first order. I've added new tests to reproduce both issue #591 and issue #623, and confirmed that they both fail prior to this change. Fixes #591 Fixes #623 Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Quick glance at
perf
shows a lot of time being spent inbtree_set::Iter::next()
. Since we only ever iterate over it after it's built, we don't really need to pay this cost, so this switches to just building a Vec.