Potential performance improvement, maybe #83
Comments
It is plausible that this is a problem; it's plausible that the compiler is smart enough to work around it; and it's even plausible that the compiler is not smart enough and it still isn't a problem. These state machine loops tend to be limited by how quickly one can shovel bytes into CPU registers, since DFAs don't tend to have great locality properties. I think the thing to do here is to scrutinize the generated codegen and see if the …
So I tried this, and... benchmark diffs are all over the place, and in enough places that I suspect it's just noise 😢 So I can't really tell if it makes a difference or not. One potential approach is something like https://github.com/bheisler/iai, which would give consistent results. I've been using the equivalent for some of my projects, and it's really nice to be able to tell whether a microoptimization worked or not (presuming that the emulated CPU is actually the same as a real CPU, which of course it is not).
@itamarst Yeah, in those cases, I tend to revert to ad hoc benchmarks. iai definitely looks worth a try; I hadn't heard of it before. It will likely be a bit before I get around to that, though.
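The "ad hoc benchmarks" mentioned above can be as simple as timing a hot loop with `std::time::Instant` and guarding against the optimizer with `std::hint::black_box`. The sketch below is illustrative only; `count_matches` is a hypothetical stand-in for whatever search routine is under test, not anything from this crate.

```rust
use std::hint::black_box;
use std::time::Instant;

// Hypothetical stand-in for the search routine being benchmarked.
fn count_matches(haystack: &[u8]) -> usize {
    haystack.iter().filter(|&&b| b == b'a').count()
}

fn main() {
    let haystack = vec![b'a'; 1 << 20];
    let iters = 100u32;
    let start = Instant::now();
    let mut total = 0;
    for _ in 0..iters {
        // black_box prevents the compiler from hoisting or eliding the call.
        total += count_matches(black_box(&haystack));
    }
    let elapsed = start.elapsed();
    println!("{} matches, {:?} per iteration", total, elapsed / iters);
}
```

Wall-clock timing like this is exactly what makes microoptimization results noisy, which is the motivation for instruction-counting tools like iai.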
This commit nearly wipes the slate clean with respect to the Aho-Corasick implementations and rebuilds things from scratch. The main thing that motivated this was to make it possible for Aho-Corasick automata to support both unanchored and anchored searches without having to build two distinct automata for that purpose. (And the reason I wanted that was to support such a configuration from within regex-automata.) However, it quickly turned into something else entirely. This commit:

* Splits the old NFA into two: a noncontiguous NFA that always uses a sparse representation, and a contiguous NFA that uses both dense and sparse representations. The old NFA was always noncontiguous but also used dense and sparse representations for state transitions. This split makes it possible to better optimize the NFA. Namely, its contiguous nature has better cache effects and allows us to *substantially* reduce memory usage (sometimes by an order of magnitude) when compared to the old NFA.
* Both of the NFAs now unconditionally support both anchored and unanchored searches with essentially no additional cost.
* The DFA remains mostly the same as before, except its stride is now always a power of 2, and it can be made to support both unanchored and anchored searches by copying the transition table and tweaking it. (The copy is why only unanchored searches are supported by default.)
* We borrow a stripped-down version of the 'Input' abstraction from my work in progress on the 'regex-automata' crate. This reduces the combinatorial explosion of different search routines while making common cases ergonomic. For example, both 'anchored' and 'earliest' are now configuration options on an 'Input'.
* The main AhoCorasick type now has fallible APIs in addition to infallible APIs. It is still reasonable to use the infallible APIs, as they panic only on incorrect combinations of build and search configurations, both of which may be static properties of a program; thus, an improper configuration is quite likely (but not necessarily) a programmer error.
* The underlying Aho-Corasick implementations are now exposed through sub-modules, and an 'Automaton' trait unifies their APIs. This makes it possible for callers to walk the automaton directly like any old finite state machine.
* A no-std but with-alloc mode was added. This essentially just gets rid of the streaming APIs (which require std::io) and the std::error::Error trait impls.
* The old (and awkward) 'auto_configure' API is gone. Instead, automatic configuration is now the default.
* The old 'premultiply' option is gone, and premultiplication is now always enabled.
* The old 'byte_classes' option remains, but it is enabled by default and disabling it no longer confers any performance benefit. Instead, it only provides a debuggability benefit by making state transitions easier to understand.
* 'AhoCorasick' is no longer generic over the representation of its state identifiers. All Aho-Corasick automata now unconditionally use 'u32' for this. In practice, this is big enough for most use cases but small enough to provide reasonable memory usage.

The real star of this show is the contiguous NFA. It not only significantly reduces the heap memory used, but is also quite a bit faster than the old NFA, while being only slightly slower to build. It doesn't quite match the speed of a DFA, but it is quite close. These properties combined make it an excellently balanced choice. Indeed, 'AhoCorasick::new' will use it in almost all cases, except when the number of patterns is very small.

Other than that, the central type of this crate remains 'AhoCorasick', and its most common use cases will work almost unchanged from 'aho-corasick 0.7'.

Closes #57, Closes #83
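The commit message notes that the DFA's stride is now always a power of 2. The payoff is in the hot transition lookup: with a power-of-2 stride, `state * stride + class` becomes a shift plus an OR. The sketch below is a simplified illustration of that layout under assumed names (`Dfa`, `stride2`, `table`), not the crate's actual internals.

```rust
// Simplified sketch of a dense DFA transition table with a power-of-2
// stride. Names here are illustrative, not the crate's real layout.
struct Dfa {
    /// log2 of the stride (the number of byte classes rounded up to a
    /// power of two), so `state * stride` compiles to a cheap shift.
    stride2: u32,
    /// Flattened transition table, indexed by `(state << stride2) | class`.
    table: Vec<u32>,
}

impl Dfa {
    fn next_state(&self, state: u32, class: u8) -> u32 {
        // Because the stride is a power of two, indexing needs only a
        // shift and an OR instead of a multiply and an add.
        self.table[((state << self.stride2) | class as u32) as usize]
    }
}

fn main() {
    // Tiny two-state DFA over 2 byte classes, padded out to stride 4 (2^2).
    let dfa = Dfa {
        stride2: 2,
        table: vec![0, 1, 0, 0, 1, 1, 0, 0],
    };
    assert_eq!(dfa.next_state(0, 1), 1);
    assert_eq!(dfa.next_state(1, 0), 1);
    println!("ok");
}
```

Padding each state's row to a power of two wastes a few table slots but keeps the hot path branch-free and multiplication-free.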
At https://github.com/BurntSushi/aho-corasick/blob/master/src/classes.rs#L46, alphabet_len is calculated each time the function is called. This function is called from (what appears to me to be) a hot inner function, next_state(). It seems like this value doesn't change once the ByteClasses are constructed, so there's no need to recalculate it each time; it could be calculated and cached only when the ByteClasses change.

(I imagine the compiler might be smart enough to figure this out? But then again, maybe not.)
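The suggestion above amounts to hoisting the computation into the constructor so the hot path only reads a field. Here is a minimal sketch of that idea using a simplified stand-in type; the field names and `next_state_index` helper are hypothetical, not the crate's real ByteClasses API.

```rust
// Simplified sketch: compute the alphabet length once at construction
// instead of on every hot-path call. Illustrative stand-in only.
struct ByteClasses {
    classes: [u8; 256],
    /// Cached: one more than the maximum class value, computed once
    /// when the classes are built rather than per lookup.
    alphabet_len: usize,
}

impl ByteClasses {
    fn new(classes: [u8; 256]) -> ByteClasses {
        let alphabet_len = classes.iter().copied().max().unwrap() as usize + 1;
        ByteClasses { classes, alphabet_len }
    }

    /// Hot-path lookup: no recomputation of the alphabet length here,
    /// just a field read, a multiply, and an add.
    fn next_state_index(&self, state: usize, byte: u8) -> usize {
        state * self.alphabet_len + self.classes[byte as usize] as usize
    }
}

fn main() {
    // Two classes: class 1 for b'a', class 0 for everything else.
    let mut classes = [0u8; 256];
    classes[b'a' as usize] = 1;
    let bc = ByteClasses::new(classes);
    assert_eq!(bc.alphabet_len, 2);
    assert_eq!(bc.next_state_index(3, b'a'), 7);
    println!("ok");
}
```

Whether this beats recomputation in practice depends on whether the compiler already hoists the calculation out of the loop, which is exactly the uncertainty discussed in the comments above.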