Move regex use to module: Rebase #283

Closed

gilescope wants to merge 28 commits into trishume:move-regex-use-to-module from gilescope:move-regex-use-to-module

gilescope commented Mar 19, 2020

Attempt to bring pr 6 up to date with master.

robinst and others added 25 commits

November 25, 2019 23:10


          Move all regex usage to separate module

2f6b1b9

That way we can add fancy-regex support behind a feature.


          Bump fancy-regex to 0.3.0

731769e

* Adds a std::error::Error impl for Error
* Adds a backtracking limit to mitigate catastrophic backtracking


          Restore optimization of reusing Regions

b2fa35a

Without this, some parsing benchmarks took 30% longer to run.
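
As a rough illustration of why reusing a region buffer can matter, here is a minimal sketch. The `Region` type and the toy substring search are hypothetical stand-ins, not the real onig or syntect types; the only point is that the buffer is allocated once and cleared between searches rather than rebuilt for every line.

```rust
/// Hypothetical stand-in for a capture-position buffer
/// (not the real onig or syntect type).
struct Region {
    positions: Vec<Option<(usize, usize)>>,
}

impl Region {
    fn new() -> Self {
        Region { positions: Vec::new() }
    }

    /// Forget old matches but keep the allocated capacity.
    fn clear(&mut self) {
        self.positions.clear();
    }
}

/// Toy "search": records the byte range of the first occurrence of `needle`.
/// A real engine would fill in one entry per capture group.
fn search_with_region(haystack: &str, needle: &str, region: &mut Region) -> bool {
    region.clear();
    match haystack.find(needle) {
        Some(start) => {
            region.positions.push(Some((start, start + needle.len())));
            true
        }
        None => false,
    }
}

/// Counts lines containing `needle`, reusing one `Region` for every line.
fn count_matching_lines(lines: &[&str], needle: &str) -> usize {
    let mut region = Region::new(); // allocated once, outside the loop
    let mut hits = 0;
    for line in lines {
        if search_with_region(line, needle, &mut region) {
            hits += 1;
        }
    }
    hits
}
```

Whether this pattern accounts for the full 30% mentioned in the commit message is not something the sketch can show; it only illustrates the shape of the optimization.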


          Change feature cfg so that regex-onig wins if both features are enabled

d8de89f


          Add YAML parsing test

d668233


          Compile regexes in multi-line mode for the "newlines" syntaxes

49c0dd4

Some of the regexes include `$` and expect it to match end of line. In
fancy-regex, `$` means end of text by default. Adding `(?m)` activates
multi-line mode which changes `$` to match end of line.

This fixes a large number of the failed assertions with syntest.


          Replace POSIX character classes so that they match Unicode as well

67c971d

In fancy-regex, POSIX character classes only match ASCII characters.
Sublime's syntaxes expect them to match Unicode characters as well, so
transform them to corresponding Unicode character classes.
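
A minimal sketch of this kind of rewrite, assuming a simple textual substitution. The mapping table below is illustrative only and may not match the one used on the branch; the function name is hypothetical.

```rust
/// Illustrative mapping from POSIX class names to Unicode-aware classes.
/// This table is an assumption, not the one from the commit.
const POSIX_TO_UNICODE: &[(&str, &str)] = &[
    ("[:alpha:]", r"\p{L}"),
    ("[:alnum:]", r"\p{L}\p{N}"),
    ("[:digit:]", r"\p{Nd}"),
    ("[:lower:]", r"\p{Ll}"),
    ("[:upper:]", r"\p{Lu}"),
    ("[:space:]", r"\s"),
];

/// Replaces POSIX class names with Unicode equivalents.
/// Naive on purpose: it does not handle negated forms such as `[:^alpha:]`
/// or a literal `[:alpha:]` that has been escaped.
fn replace_posix_classes(pattern: &str) -> String {
    let mut out = pattern.to_string();
    for &(posix, unicode) in POSIX_TO_UNICODE {
        out = out.replace(posix, unicode);
    }
    out
}
```

For example, `[[:alpha:]_][[:alnum:]_]*` would become `[\p{L}_][\p{L}\p{N}_]*` under this mapping.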


          Replace ^ with \A in multi-line mode regexes

085c9d3

With the regex crate and fancy-regex, `^` in multi-line mode also
matches at the end of a string like "test\n". There are some regexes in
the syntax definitions like `^\s*$`, which are intended to match a blank
line only. So change `^` to `\A` which only matches at the beginning of
text.


          Fix code that skips a character to work with unicode

6214e6f

Note that this wasn't a problem with Oniguruma because it works on UTF-8
bytes, but fancy-regex works on characters.
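
One way to skip a single character safely in Rust is to advance by that character's UTF-8 length instead of by one byte. The helper below is a hypothetical sketch of the idea, not the code from the commit.

```rust
/// Returns the byte index just past the character that starts at `pos`.
/// Returns `None` if `pos` is at the end of `text`, out of range,
/// or not on a character boundary.
fn skip_one_char(text: &str, pos: usize) -> Option<usize> {
    text.get(pos..)?        // `None` if `pos` is not a valid boundary
        .chars()
        .next()             // `None` if nothing is left to skip
        .map(|c| pos + c.len_utf8())
}
```

Advancing by `pos + 1` instead would land in the middle of a multi-byte character such as `é`, and slicing the string at that index would panic.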


          Fix rewriting of "newlines" mode regexes

6842e7b

Always adding `(?m)` for the entire regex meant that `.` also changed meaning,
which is not what we want. The safer option is to use `(?m:$)` for `$` only.

That also means we don't have to bother with `\A`. But we do need to parse
look-behinds because we can't use `(?m:$)` in it.


          Remove special treatment of look-behind

fe28a3c

Turns out `(?m:$)` works in look-behinds, just not `(?m)$(?-m)` which I was
using before.
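
The `$` rewrite described in the two commits above could be sketched roughly as follows. This is a simplified, hypothetical version: it skips escaped characters and simple character classes, but does not handle nested classes such as `[[:alpha:]$]`.

```rust
/// Rewrites each unescaped `$` outside a character class to `(?m:$)`,
/// so that only `$` gets multi-line semantics and `.` is unaffected.
fn rewrite_dollar(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars();
    let mut in_class = false;
    while let Some(c) = chars.next() {
        match c {
            '\\' => {
                // Copy the backslash and the character it escapes untouched.
                out.push(c);
                if let Some(next) = chars.next() {
                    out.push(next);
                }
            }
            '[' if !in_class => {
                in_class = true;
                out.push(c);
            }
            ']' if in_class => {
                in_class = false;
                out.push(c);
            }
            // Inside a class, `$` is a literal and falls through to `_`.
            '$' if !in_class => out.push_str("(?m:$)"),
            _ => out.push(c),
        }
    }
    out
}
```

With this sketch, the blank-line pattern `^\s*$` becomes `^\s*(?m:$)`, while `\$` and `[$]` are left alone.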


          Move all regex usage to separate module

174045d

That way we can add fancy-regex support behind a feature.


          Bump fancy-regex to 0.3.0

c2b97cc

* Adds a std::error::Error impl for Error
* Adds a backtracking limit to mitigate catastrophic backtracking


          Restore optimization of reusing Regions

00885f9

Without this, some parsing benchmarks took 30% longer to run.


          Change feature cfg so that regex-onig wins if both features are enabled

a102c13


          Add YAML parsing test

37e9849


          Compile regexes in multi-line mode for the "newlines" syntaxes

f18e015

Some of the regexes include `$` and expect it to match end of line. In
fancy-regex, `$` means end of text by default. Adding `(?m)` activates
multi-line mode which changes `$` to match end of line.

This fixes a large number of the failed assertions with syntest.


          Replace POSIX character classes so that they match Unicode as well

5a94c5e

In fancy-regex, POSIX character classes only match ASCII characters.
Sublime's syntaxes expect them to match Unicode characters as well, so
transform them to corresponding Unicode character classes.


          Replace ^ with \A in multi-line mode regexes

5414dce

With the regex crate and fancy-regex, `^` in multi-line mode also
matches at the end of a string like "test\n". There are some regexes in
the syntax definitions like `^\s*$`, which are intended to match a blank
line only. So change `^` to `\A` which only matches at the beginning of
text.


          Fix code that skips a character to work with unicode

b71c725

Note that this wasn't a problem with Oniguruma because it works on UTF-8
bytes, but fancy-regex works on characters.


          Fix rewriting of "newlines" mode regexes

5997d36

Always adding `(?m)` for the entire regex meant that `.` also changed meaning,
which is not what we want. The safer option is to use `(?m:$)` for `$` only.

That also means we don't have to bother with `\A`. But we do need to parse
look-behinds because we can't use `(?m:$)` in it.


          Remove special treatment of look-behind

e4bf16c

Turns out `(?m:$)` works in look-behinds, just not `(?m)$(?-m)` which I was
using before.


          Compiler couldn't infer type.

33f9a19


          Merge branch 'move-regex-use-to-module' into move-regex-use-to-module

76bd731


          Pick fancy-regex if no default features

1b68bf3

Author

gilescope commented Mar 19, 2020

Maybe to solve the no-default-features case we always include fancy and rely on the compiler to throw away the code if people choose onig? Either that or force people to pick a feature when they have no default features selected.

Author

gilescope commented Mar 19, 2020

@trishume thoughts?


          Have fancy regex the default.

807b60d

Adarma commented Mar 19, 2020

With this, I am still unable to build on Windows even without the onig feature enabled.
I created pull request #284 to make fancy the default on Windows and not depend on onig at all.
onig remains the default for other OSes, with fancy as an optional feature. That change builds on Windows by default.

Author

gilescope commented Mar 19, 2020

Totally happy with #284. Sorry, I'm on a Mac here so I didn't test it on Windows. Curious what the build failure was, but really any which way we get a pure Rust impl works for me. @trishume does #284 work for you? (I'm keen on cargo-expand being pure Rust as lots of people try and use it)

trishume reviewed

Owner

trishume left a comment

Cool, great that you got it fully rebased with CI passing, nice work!

I'll give this a little bit of time to see if @robinst wants to weigh in on how to move forward with getting fancy-regex landed since he's been doing most of the fancy-regex stuff so far. Ping me if this just ends up sitting here by Sunday.

Cargo.toml Outdated

@@ -20,7 +20,7 @@ exclude = [
 [dependencies]
 yaml-rust = { version = "0.4", optional = true }
 onig = { version = "5.0", optional = true }
-fancy-regex = { version = "0.3.0", optional = true }
+fancy-regex = { version = "0.3.0" }

Owner

trishume Mar 20, 2020

Maybe I'm missing a catch here, but if we're going this way would it be possible to rejigger things so that fancy-regex is included by the parsing feature? This might require adding some cfgs to some modules to require the parsing feature to be built. It used to be the case that when no default features were included no regex engine was included, which was at least nice for Xi, which only uses the theme functionality.


          Should be able to work with no regex lib.

447f4b8

A `null` implementation for when `--no-default-features` is used.

Author

gilescope commented Mar 20, 2020

Didn't realise it could work with no regex. Have introduced a NullObject implementation for when no regex feature is selected. Again happy with either PR :-) whatever gets us moving!
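
For readers following along, the feature selection with a null fallback might look roughly like this. It is a sketch under assumptions: `regex-onig` is the feature name from the commit list above, while `regex-fancy`, the `imp` module, `ENGINE`, and `is_match` are hypothetical names, and the real null implementation may behave differently.

```rust
// onig wins whenever its feature is enabled, even if both are on.
#[cfg(feature = "regex-onig")]
mod imp {
    pub const ENGINE: &str = "onig";
    pub fn is_match(_pattern: &str, _text: &str) -> bool {
        unimplemented!("would delegate to the onig crate")
    }
}

// fancy-regex is used only when onig is not selected.
// (`regex-fancy` is an assumed feature name.)
#[cfg(all(feature = "regex-fancy", not(feature = "regex-onig")))]
mod imp {
    pub const ENGINE: &str = "fancy-regex";
    pub fn is_match(_pattern: &str, _text: &str) -> bool {
        unimplemented!("would delegate to the fancy-regex crate")
    }
}

// Null object: no regex engine compiled in, so nothing ever matches.
// Returning `false` here is an assumption about the desired behaviour.
#[cfg(not(any(feature = "regex-onig", feature = "regex-fancy")))]
mod imp {
    pub const ENGINE: &str = "none";
    pub fn is_match(_pattern: &str, _text: &str) -> bool {
        false
    }
}

use imp::{is_match, ENGINE};
```

Because exactly one of the three `cfg` conditions holds for any feature combination, callers can use `is_match` without caring which engine, if any, was compiled in.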


          Regex tests should run if no regex feature selected.

d173896

Adarma commented Mar 20, 2020

The build failure I get is the same as #264

robinst force-pushed the move-regex-use-to-module branch from fe28a3c to a7045b1 Compare

March 20, 2020 09:17

Collaborator

robinst commented Mar 20, 2020

So this is merging into my branch? I actually rebased myself, pushed it now. Sorry about that. What's left on my branch is running the benchmarks again and adding the documentation to the readme.

Do you want to rebase this branch again? All my commits should just fall away and we'll have a clean diff of your changes. If not, just cherry-pick your changes.

Author

gilescope commented Mar 21, 2020

@robinst's solution is neater and passes all the tests so I'm going to close this PR to reduce confusion :-)

gilescope closed this
