Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache compiled regexes with lazycell #186

Merged
merged 1 commit into from
Jul 5, 2018
Merged

Conversation

robinst
Copy link
Collaborator

@robinst robinst commented Jul 3, 2018

With that, MatchPattern does not have to be mutably borrowed to do matching anymore.

It removes one obstacle for allowing SyntaxSet to be used by multiple threads, while still keeping regex compilation lazy.

This is one of the prerequisites for #182. I've created the PR against master so that we can benchmark the impact more directly. But I can integrate it into #182 if you prefer to not put it into master just yet.

With that, MatchPattern does not have to be mutably borrowed to do
matching anymore.

It removes one obstacle for allowing SyntaxSet to be used by multiple
threads, while still keeping regex compilation lazy.
@robinst
Copy link
Collaborator Author

robinst commented Jul 3, 2018

I've run cargo bench against master and then against this branch. Did the whole thing twice and these are the results: https://gist.github.com/robinst/ec776e1b9d642c04467d8fd939bf60d4

As you can see, the results are not very reliable.

@robinst robinst mentioned this pull request Jul 3, 2018
9 tasks
Copy link
Owner

@trishume trishume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good thanks for doing this! Definitely a good step towards multithreading. I'm okay to merge this into master, the benchmarks don't show any noticeable regression, on the few benchmarks of files large enough that highlighting time matters.

} else {
None
let regex = match_pat.regex();
let matched = regex.search_with_param(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can return a Cow out of the if statement to avoid the duplication of the search.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice, I'll try that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, doesn't work because Regex is not Clone:

error[E0277]: the trait bound `onig::Regex: std::clone::Clone` is not satisfied
   --> src/parsing/parser.rs:405:14
    |
405 |             (Cow::Borrowed(match_pat.regex()), true)
    |              ^^^^^^^^^^^^^ the trait `std::clone::Clone` is not implemented for `onig::Regex`
    |
    = note: required because of the requirements on the impl of `std::borrow::ToOwned` for `onig::Regex`
    = note: required by `std::borrow::Cow::Borrowed`

The following works, but not sure if it's worth it?:

        let do_search = |regex: &Regex, regions: &mut Region| {
            regex.search_with_param(
                line,
                start,
                line.len(),
                SearchOptions::SEARCH_OPTION_NONE,
                Some(regions),
                MatchParam::default(),
            )
        };

        let (matched, can_cache) = if match_pat.has_captures && captures.is_some() {
            let &(ref region, ref s) = captures.unwrap();
            let regex = match_pat.regex_with_refs(region, s);
            (do_search(&regex, regions), false)
        } else {
            (do_search(match_pat.regex(), regions), true)
        };

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh darn. Apparently what we need is a ✨ supercow ✨ but it would be kind of dumb to add that library just for this instance.

That closure solution looks fine to me although I'm also fine with merging as is, don't have a strong preference between them, up to you.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol 😆. Ok, I'll merge it like this.

@robinst robinst merged commit 2a81b81 into master Jul 5, 2018
@robinst robinst deleted the regex-cache-lazycell branch July 5, 2018 06:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants