Move regex use to module: Rebase #283
That way we can add fancy-regex support behind a feature.
* Adds a std::error::Error impl for Error
* Adds a backtracking limit to mitigate catastrophic backtracking
Without this, some parsing benchmarks took 30% longer to run.
Some of the regexes include `$` and expect it to match end of line. In fancy-regex, `$` means end of text by default. Adding `(?m)` activates multi-line mode which changes `$` to match end of line. This fixes a large number of the failed assertions with syntest.
In fancy-regex, POSIX character classes only match ASCII characters. Sublime's syntaxes expect them to match Unicode characters as well, so transform them to corresponding Unicode character classes.
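The POSIX-to-Unicode rewrite described above can be sketched as a plain string transformation. This is only an illustration, not the code from this PR: the function names and the specific mapping from POSIX names to Unicode classes are assumptions, and the sketch does not handle escapes or negated classes like `[:^alpha:]`.

```rust
/// Hypothetical mapping from a POSIX class name to a Unicode class.
/// The chosen Unicode classes are an assumption for illustration only.
fn unicode_class_for(posix_name: &str) -> Option<&'static str> {
    match posix_name {
        "alpha" => Some(r"\p{L}"),
        "alnum" => Some(r"\p{L}\p{N}"),
        "digit" => Some(r"\p{Nd}"),
        "lower" => Some(r"\p{Ll}"),
        "upper" => Some(r"\p{Lu}"),
        "space" => Some(r"\p{White_Space}"),
        "punct" => Some(r"\p{P}"),
        _ => None,
    }
}

/// Replace every `[:name:]` whose name is known with its Unicode class.
/// Unknown names and unterminated `[:` sequences are copied through as-is.
fn replace_posix_classes(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut rest = pattern;
    while let Some(start) = rest.find("[:") {
        // Copy everything before the `[:` marker unchanged.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let replacement = after
            .find(":]")
            .and_then(|end| unicode_class_for(&after[..end]).map(|class| (class, end)));
        match replacement {
            Some((class, end)) => {
                out.push_str(class);
                rest = &after[end + 2..];
            }
            None => {
                // Not a class we know: keep the `[:` and continue scanning.
                out.push_str("[:");
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```

With this sketch, `[[:alpha:]_]+` becomes `[\p{L}_]+`, while a pattern without POSIX classes is returned unchanged.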
With the regex crate and fancy-regex, `^` in multi-line mode also matches at the end of a string like "test\n". There are some regexes in the syntax definitions like `^\s*$`, which are intended to match a blank line only. So change `^` to `\A` which only matches at the beginning of text.
Note that this wasn't a problem with Oniguruma because it works on UTF-8 bytes, but fancy-regex works on characters.
Always adding `(?m)` for the entire regex meant that `.` also changed meaning, which is not what we want. The safer option is to use `(?m:$)` for `$` only. That also means we don't have to bother with `\A`. But we do need to parse look-behinds because we can't use `(?m:$)` in it.
Turns out `(?m:$)` works in look-behinds, just not `(?m)$(?-m)` which I was using before.
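For readers following along, here is a rough sketch of the `$` to `(?m:$)` rewrite discussed in the last two commits. It is not the PR's implementation: `scope_multiline_dollar` is a hypothetical name, and the sketch assumes simple, non-nested character classes. It leaves escaped `\$` and a `$` inside `[...]` alone.

```rust
/// Wrap each unescaped `$` outside a character class in `(?m:...)`, so only
/// that anchor gets multi-line semantics and the rest of the pattern keeps
/// its original flags.
/// Simplifying assumption: character classes are not nested.
fn scope_multiline_dollar(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len() + 8);
    let mut chars = pattern.chars();
    let mut in_class = false;
    while let Some(c) = chars.next() {
        match c {
            // Copy an escape sequence verbatim, including the escaped char.
            '\\' => {
                out.push(c);
                if let Some(next) = chars.next() {
                    out.push(next);
                }
            }
            '[' if !in_class => {
                in_class = true;
                out.push(c);
            }
            ']' if in_class => {
                in_class = false;
                out.push(c);
            }
            // Only an anchor `$` (outside a class) is rewritten.
            '$' if !in_class => out.push_str("(?m:$)"),
            _ => out.push(c),
        }
    }
    out
}
```

For example, `^\s*$` becomes `^\s*(?m:$)`, and a look-behind such as `(?<=a$)b` becomes `(?<=a(?m:$))b`, which matches the observation above that the scoped group form is accepted inside look-behinds.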
Maybe, to solve the no-default-features case, we always include fancy-regex and rely on the compiler to throw away the code if people choose onig? Either that, or force people to pick a feature when they have no default features selected.
@trishume thoughts?
With this, I am still unable to build on Windows, even without the onig feature enabled.
Cool, great that you got it fully rebased with CI passing, nice work!
I'll give this a little bit of time to see if @robinst wants to weigh in on how to move forward with getting fancy-regex landed since he's been doing most of the fancy-regex stuff so far. Ping me if this just ends up sitting here by Sunday.
Cargo.toml
@@ -20,7 +20,7 @@ exclude = [
 [dependencies]
 yaml-rust = { version = "0.4", optional = true }
 onig = { version = "5.0", optional = true }
-fancy-regex = { version = "0.3.0", optional = true }
+fancy-regex = { version = "0.3.0" }
Maybe I'm missing a catch here, but if we're going this way would it be possible to rejigger things so that `fancy-regex` is included by the `parsing` feature? This might require adding some cfgs to some modules to require the parsing feature to be built. It used to be the case that when no default features were included no regex engine was included, which was at least nice for Xi, which only uses the theme functionality.
A `null` implementation for use with --no-default-features.
Didn't realise it could work with no regex. Have introduced a NullObject implementation for when no regex feature is selected. Again happy with either PR :-) whatever gets us moving!
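To make the NullObject idea concrete, here is a minimal sketch of what such a stand-in could look like. The type name `NullRegex` and its methods are hypothetical, not the PR's actual API. In the crate itself, a type like this would presumably sit behind a `#[cfg(not(any(feature = ...)))]` guard so it is only compiled when no regex engine feature is selected; the guard is left as a comment here so the snippet builds on its own.

```rust
// Assumed gating, with hypothetical feature names:
// #[cfg(not(any(feature = "regex-onig", feature = "regex-fancy")))]

/// Stand-in regex type for builds with no regex engine compiled in.
/// It keeps the pattern text but never reports a match.
#[derive(Debug, Clone)]
pub struct NullRegex {
    source: String,
}

impl NullRegex {
    /// Store the pattern without compiling it.
    pub fn new(pattern: &str) -> Self {
        NullRegex {
            source: pattern.to_string(),
        }
    }

    /// The original pattern text, useful for debugging or serialization.
    pub fn as_str(&self) -> &str {
        &self.source
    }

    /// A null engine never matches anything.
    pub fn is_match(&self, _text: &str) -> bool {
        false
    }

    /// A null engine never finds a match range.
    pub fn find(&self, _text: &str) -> Option<(usize, usize)> {
        None
    }
}
```

The point of this design is that code which only needs theme functionality still compiles against the same call sites, while any attempt to parse simply finds no matches.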
The build failure I get is the same as #264.
fe28a3c to a7045b1
So this is merging into my branch? I actually rebased it myself and have pushed it now. Sorry about that. What's left on my branch is running the benchmarks again and adding the documentation to the readme. Do you want to rebase this branch again? All my commits should just fall away and we'll have a clean diff of your changes. If not, just cherry-pick your changes.
@robinst's solution is neater and passes all the tests so I'm going to close this PR to reduce confusion :-)
Attempt to bring pr 6 up to date with master.