Skip to content

Commit

Permalink
impl: drop thread_local dependency
Browse files Browse the repository at this point in the history
This commit removes the thread_local dependency (even as an optional
dependency) and replaces it with a more purpose driven memory pool. The
comments in src/pool.rs explain this in more detail, but the short story
is that thread_local seems to be at the root of some memory leaks
happening in certain usage scenarios.

The great thing about thread_local though is how fast it is. Using a
simple Mutex<Vec<T>> is easily at least twice as slow. We work around
that a bit by coding a simplistic fast path for the "owner" of a pool.
This does require one new use of `unsafe`, of which we extensively
document.

This now makes the 'perf-cache' feature a no-op. We of course retain it
for compatibility purposes (and perhaps it will be used again in the
future), but for now, we always use the same pool.

As for benchmarks, it is likely that *some* cases will get a hair
slower. But there shouldn't be any dramatic difference. A careful review
of micro-benchmarks in addition to more holistic (albeit ad hoc)
benchmarks via ripgrep seems to confirm this.

Now that we have more explicit control over the memory pool, we also
clean stuff up with repsect to RefUnwindSafe.

Fixes #362, Fixes #576

Ref BurntSushi/rure-go#3
  • Loading branch information
BurntSushi committed Mar 12, 2021
1 parent 5107293 commit e040c1b
Show file tree
Hide file tree
Showing 6 changed files with 392 additions and 128 deletions.
10 changes: 4 additions & 6 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,10 @@ use_std = ["std"]
# Enables all performance features.
perf = ["perf-cache", "perf-dfa", "perf-inline", "perf-literal"]
# Enables fast caching. (If disabled, caching is still used, but is slower.)
perf-cache = ["thread_local"]
# Currently, this feature has no effect. It used to remove the thread_local
# dependency and use a slower internal cache, but now the default cache has
# been improved and thread_local is no longer a dependency at all.
perf-cache = []
# Enables use of a lazy DFA when possible.
perf-dfa = []
# Enables aggressive use of inlining.
Expand Down Expand Up @@ -110,11 +113,6 @@ optional = true
version = "2.2.1"
optional = true

# For managing regex caches quickly across multiple threads.
[dependencies.thread_local]
version = "1"
optional = true

# For parsing regular expressions.
[dependencies.regex-syntax]
path = "regex-syntax"
Expand Down
102 changes: 0 additions & 102 deletions src/cache.rs

This file was deleted.

34 changes: 24 additions & 10 deletions src/exec.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
use std::cell::RefCell;
use std::collections::HashMap;
use std::panic::AssertUnwindSafe;
use std::sync::Arc;

#[cfg(feature = "perf-literal")]
Expand All @@ -9,14 +10,14 @@ use syntax::hir::Hir;
use syntax::ParserBuilder;

use backtrack;
use cache::{Cached, CachedGuard};
use compile::Compiler;
#[cfg(feature = "perf-dfa")]
use dfa;
use error::Error;
use input::{ByteInput, CharInput};
use literal::LiteralSearcher;
use pikevm;
use pool::{Pool, PoolGuard};
use prog::Program;
use re_builder::RegexOptions;
use re_bytes;
Expand All @@ -34,8 +35,8 @@ use utf8::next_utf8;
pub struct Exec {
/// All read only state.
ro: Arc<ExecReadOnly>,
/// Caches for the various matching engines.
cache: Cached<ProgramCache>,
/// A pool of reusable values for the various matching engines.
pool: Pool<ProgramCache>,
}

/// `ExecNoSync` is like `Exec`, except it embeds a reference to a cache. This
Expand All @@ -46,7 +47,7 @@ pub struct ExecNoSync<'c> {
/// All read only state.
ro: &'c Arc<ExecReadOnly>,
/// Caches for the various matching engines.
cache: CachedGuard<'c, ProgramCache>,
cache: PoolGuard<'c, ProgramCache>,
}

/// `ExecNoSyncStr` is like `ExecNoSync`, but matches on &str instead of &[u8].
Expand Down Expand Up @@ -302,7 +303,8 @@ impl ExecBuilder {
ac: None,
match_type: MatchType::Nothing,
});
return Ok(Exec { ro: ro, cache: Cached::new() });
let pool = ExecReadOnly::new_pool(&ro);
return Ok(Exec { ro: ro, pool });
}
let parsed = self.parse()?;
let mut nfa = Compiler::new()
Expand Down Expand Up @@ -342,7 +344,8 @@ impl ExecBuilder {
ro.match_type = ro.choose_match_type(self.match_type);

let ro = Arc::new(ro);
Ok(Exec { ro: ro, cache: Cached::new() })
let pool = ExecReadOnly::new_pool(&ro);
Ok(Exec { ro, pool })
}

#[cfg(feature = "perf-literal")]
Expand Down Expand Up @@ -1254,10 +1257,9 @@ impl Exec {
/// Get a searcher that isn't Sync.
#[cfg_attr(feature = "perf-inline", inline(always))]
pub fn searcher(&self) -> ExecNoSync {
let create = || RefCell::new(ProgramCacheInner::new(&self.ro));
ExecNoSync {
ro: &self.ro, // a clone is too expensive here! (and not needed)
cache: self.cache.get_or(create),
cache: self.pool.get(),
}
}

Expand Down Expand Up @@ -1309,7 +1311,8 @@ impl Exec {

impl Clone for Exec {
fn clone(&self) -> Exec {
Exec { ro: self.ro.clone(), cache: Cached::new() }
let pool = ExecReadOnly::new_pool(&self.ro);
Exec { ro: self.ro.clone(), pool }
}
}

Expand Down Expand Up @@ -1442,6 +1445,13 @@ impl ExecReadOnly {
let lcs_len = self.suffixes.lcs().char_len();
lcs_len >= 3 && lcs_len > self.dfa.prefixes.lcp().char_len()
}

fn new_pool(ro: &Arc<ExecReadOnly>) -> Pool<ProgramCache> {
let ro = ro.clone();
Pool::new(Box::new(move || {
AssertUnwindSafe(RefCell::new(ProgramCacheInner::new(&ro)))
}))
}
}

#[derive(Clone, Copy, Debug)]
Expand Down Expand Up @@ -1500,7 +1510,11 @@ enum MatchNfaType {

/// `ProgramCache` maintains reusable allocations for each matching engine
/// available to a particular program.
pub type ProgramCache = RefCell<ProgramCacheInner>;
///
/// We declare this as unwind safe since it's a cache that's only used for
/// performance purposes. If a panic occurs, it is (or should be) always safe
/// to continue using the same regex object.
pub type ProgramCache = AssertUnwindSafe<RefCell<ProgramCacheInner>>;

#[derive(Debug)]
pub struct ProgramCacheInner {
Expand Down
14 changes: 6 additions & 8 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -523,11 +523,6 @@ All features below are enabled by default.
Enables all performance related features. This feature is enabled by default
and will always cover all features that improve performance, even if more
are added in the future.
* **perf-cache** -
Enables the use of very fast thread safe caching for internal match state.
When this is disabled, caching is still used, but with a slower and simpler
implementation. Disabling this drops the `thread_local` and `lazy_static`
dependencies.
* **perf-dfa** -
Enables the use of a lazy DFA for matching. The lazy DFA is used to compile
portions of a regex to a very fast DFA on an as-needed basis. This can
Expand All @@ -542,6 +537,11 @@ All features below are enabled by default.
Enables the use of literal optimizations for speeding up matches. In some
cases, literal optimizations can result in speedups of _several_ orders of
magnitude. Disabling this drops the `aho-corasick` and `memchr` dependencies.
* **perf-cache** -
This feature used to enable a faster internal cache at the cost of using
additional dependencies, but this is no longer an option. A fast internal
cache is now used unconditionally with no additional dependencies. This may
change in the future.
### Unicode features
Expand Down Expand Up @@ -631,8 +631,6 @@ extern crate memchr;
#[cfg_attr(feature = "perf-literal", macro_use)]
extern crate quickcheck;
extern crate regex_syntax as syntax;
#[cfg(feature = "perf-cache")]
extern crate thread_local;

// #[cfg(doctest)]
// doc_comment::doctest!("../README.md");
Expand Down Expand Up @@ -749,7 +747,6 @@ pub mod bytes {
}

mod backtrack;
mod cache;
mod compile;
#[cfg(feature = "perf-dfa")]
mod dfa;
Expand All @@ -764,6 +761,7 @@ mod literal;
#[cfg(feature = "pattern")]
mod pattern;
mod pikevm;
mod pool;
mod prog;
mod re_builder;
mod re_bytes;
Expand Down
Loading

0 comments on commit e040c1b

Please sign in to comment.