Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move regex use to module: Rebase #283

Conversation

gilescope
Copy link

Attempt to bring pr 6 up to date with master.

robinst and others added 25 commits November 25, 2019 23:10
That way we can add fancy-regex support behind a feature.
* Adds a std::error::Error impl for Error
* Adds a backtracking limit to mitigate catastrophic backtracking
Without this, some parsing benchmarks took 30% longer to run.
Some of the regexes include `$` and expect it to match end of line. In
fancy-regex, `$` means end of text by default. Adding `(?m)` activates
multi-line mode which changes `$` to match end of line.

This fixes a large number of the failed assertions with syntest.
In fancy-regex, POSIX character classes only match ASCII characters.
Sublime's syntaxes expect them to match Unicode characters as well, so
transform them to corresponding Unicode character classes.
With the regex crate and fancy-regex, `^` in multi-line mode also
matches at the end of a string like "test\n". There are some regexes in
the syntax definitions like `^\s*$`, which are intended to match a blank
line only. So change `^` to `\A` which only matches at the beginning of
text.
Note that this wasn't a problem with Oniguruma because it works on UTF-8
bytes, but fancy-regex works on characters.
Always adding `(?m)` for the entire regex meant that `.` also changed meaning,
which is not what we want. The safer option is to use `(?m:$)` for `$` only.

That also means we don't have to bother with `\A`. But we do need to parse
look-behinds because we can't use `(?m:$)` in it.
Turns out `(?m:$)` works in look-behinds, just not `(?m)$(?-m)` which I was
using before.
That way we can add fancy-regex support behind a feature.
* Adds a std::error::Error impl for Error
* Adds a backtracking limit to mitigate catastrophic backtracking
Without this, some parsing benchmarks took 30% longer to run.
Some of the regexes include `$` and expect it to match end of line. In
fancy-regex, `$` means end of text by default. Adding `(?m)` activates
multi-line mode which changes `$` to match end of line.

This fixes a large number of the failed assertions with syntest.
In fancy-regex, POSIX character classes only match ASCII characters.
Sublime's syntaxes expect them to match Unicode characters as well, so
transform them to corresponding Unicode character classes.
With the regex crate and fancy-regex, `^` in multi-line mode also
matches at the end of a string like "test\n". There are some regexes in
the syntax definitions like `^\s*$`, which are intended to match a blank
line only. So change `^` to `\A` which only matches at the beginning of
text.
Note that this wasn't a problem with Oniguruma because it works on UTF-8
bytes, but fancy-regex works on characters.
Always adding `(?m)` for the entire regex meant that `.` also changed meaning,
which is not what we want. The safer option is to use `(?m:$)` for `$` only.

That also means we don't have to bother with `\A`. But we do need to parse
look-behinds because we can't use `(?m:$)` in it.
Turns out `(?m:$)` works in look-behinds, just not `(?m)$(?-m)` which I was
using before.
@gilescope
Copy link
Author

Maybe to solve the no defaults we always include fancy and rely on the compiler to throw away the code if people choose onig? Either that or force people to pick a feature when they have no default features selected.

@gilescope
Copy link
Author

@trishume thoughts?

@Adarma
Copy link

Adarma commented Mar 19, 2020

With this, I am still unable to build on Windows even without the onig feature enabled.
I created pull request #284 to make fancy the default on windows and not depend on onig at all.
onig remains default for other OSs with fancy as an optional feature. That change builds on windows by default.

@gilescope
Copy link
Author

Totally happy with #284. Sorry on a mac here so didn't test it on windows. Curious what the build failure was, but really any which way we get a pure rust impl works for me. @trishume does #284 work for you? (I'm keen on cargo-expand being pure rust as lots of people try and use it)

Copy link
Owner

@trishume trishume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, great that you got it fully rebased with CI passing, nice work!

I'll give this a little bit of time to see if @robinst wants to weigh in on how to move forward with getting fancy-regex landed since he's been doing most of the fancy-regex stuff so far. Ping me if this just ends up sitting here by Sunday.

Cargo.toml Outdated
@@ -20,7 +20,7 @@ exclude = [
[dependencies]
yaml-rust = { version = "0.4", optional = true }
onig = { version = "5.0", optional = true }
fancy-regex = { version = "0.3.0", optional = true }
fancy-regex = { version = "0.3.0" }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing a catch here, but if we're going this way would it be possible to rejigger things so that fancy-regex is included by the parsing feature? This might require adding some cfgs to some modules to require the parsing feature to be built. It used to be the case that when no default features were included no regex engine was included, which was at least nice for Xi, which only uses the theme functionality.

A `null` implmentation for when --no-default-features.
@gilescope
Copy link
Author

Didn't realise it could work with no regex. Have introduced a NullObject implementation for when no regex feature is selected. Again happy with either PR :-) whatever gets us moving!

@Adarma
Copy link

Adarma commented Mar 20, 2020

The build failure I get is the same as #264

@robinst
Copy link
Collaborator

robinst commented Mar 20, 2020

So this is merging into my branch? I actually rebased myself, pushed it now. Sorry about that. What's left on my branch is running the benchmarks again and add the documentation to the readme.

Do you want to rebase this branch again? All my commits should just fall away and we'll have a clean diff of your changes. If not, just cherry-pick your changes.

@gilescope
Copy link
Author

@robinst's solution is neater and passes all the tests so I'm going to close this PR to reduce confusion :-)

@gilescope gilescope closed this Mar 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants