[WIP] Kinda-working fancy-regex support #34

trishume · 2017-02-09T01:57:09Z

This branch switches the regex engine to fancy-regex or more specifically my fork of it.

Currently it only works for a few syntaxes because of a few different features fancy-regex doesn't support:

The \n escape (Everything, but fixed it my fork)
Unnecessary escapes in character classes like [\<]
The \h escape in character classes (Rust)
Fix Fix nonewlines mode #76 so nonewlines mode doesn't produce weird regexes.
Named backrefs \k<marker> (Markdown)
Fancy character class syntax [a-w&&[^c-g]z]
Add support for match limit to fancy-regex: https://github.com/google/fancy-regex/issues/44

The jQuery highlighting benchmark now takes 1s instead of 0.66s. Which is super unfortunate given that I'd hoped it would be faster than Oniguruma. I have no idea why it is substantially slower.

@raphlinus @robinst

raphlinus · 2017-02-09T16:37:27Z

I took a quick look at this, profiling the highlighting of jquery. It's promising but clearly not compelling yet. It seems to be spending most of its time delegating to regex, but in the VM. This suggests that it's doing backtracking, and might not even be using the NFA (it delegates just to get classes). I have a bunch of ideas on how to optimize more, but don't have insight into specifically what's slow now. The best case would be something like a(?=b), which currently throws a into fancy mode, but could be optimized to just (a)b with fixup of the captures.

The way to make progress here is to capture which regexes are consuming the most time. I'd add some profiling, something like a lazy_static hash table on the side, so that every time the VM runs it increments a count for that regex, and accumulates the time. Then just go down the list in terms of which regexes burn the most time.

I'd be tempted to investigate myself, but am currently trying to really focus on incremental update in xi. Thanks for pushing this forward!

trishume · 2017-02-09T16:44:28Z

@raphlinus Yah that was my thought on what to investigate as well. Since it's single-threaded I can do even better than a count for each regex, I can actually measure the total elapsed time and count per regex to figure out which ones are slow. Then I can run it again with Oniguruma and see which regexes are faster with fancy-regex and which are slower.

Unfortunately, I'm back to being busy with school work and I'm not sure when/if I'll have time to do this. The perf regression combined with missing features means it's going to be a bunch of work. Not an undoable amount, but still substantial.

TimNN · 2017-02-10T10:28:26Z

@trishume: I'm trying to collect some per-regex timings, however trying to run the jquery highlighting benchmark fails because of the Ability to use \z in a character class problem. How did you get around that?

TimNN · 2017-02-10T11:48:01Z

So I did some initial benchmarking of the jquery benchmark (measuring how long each regex matching took), the result are in this gist: https://gist.github.com/b6bb756f96b58e52b3299b709fa785dd

CUM: is the cumulative time spend on a regex, AVG, the average time, the lists are sorted by average time (times are in seconds).

The code is available in the respective branches in the TimNN/syntect repo.

The worst offenders by far (based on both, AVG and CUM) are the following:

CUM: PT7.310148847S AVG: PT0.000004021S REGEX: [_$[:alpha:]][_$[:alnum:]]*(?=\s*[\[.])
CUM: PT17.180939801S AVG: PT0.000004806S REGEX: ([_$[:alpha:]][_$[:alnum:]]*)(?=\s*\()
CUM: PT21.358124652S AVG: PT0.000008019S REGEX: ([_$[:alpha:]][_$[:alnum:]]*)\s*(\.)\s*(prototype)\s*(\.)\s*(?=[_$[:alpha:]][_$[:alnum:]]*\s*=\s*(\s*\b(async\s+)?function\b|\s*(\basync\s*)?([_$[:alpha:]][_$[:alnum:]]*|\(([^()]|\([^()]*\))*\))\s*=>))
CUM: PT22.522137730S AVG: PT0.000008456S REGEX: ([_$[:alpha:]][_$[:alnum:]]*)\s*(\.)\s*(prototype)(?=\s*=\s*(\s*\b(async\s+)?function\b|\s*(\basync\s*)?([_$[:alpha:]][_$[:alnum:]]*|\(([^()]|\([^()]*\))*\))\s*=>))

They seem to match the "best case" mentioned by @raphlinus, which I guess is a good thing?

trishume · 2017-02-10T14:03:43Z

@TimNN awesome thank you! That's definitely useful information since it does indeed match up with the case @raphlinus said could be optimized without too much difficulty. Thanks for the help.

And yes the jQuery benchmark breaks because of a substitution I perform for nonewlines mode. I fixed the benchmark to use line strings with newline characters, but didn't end up committing it, sorry.

TimNN · 2017-02-11T00:11:42Z

So I've been hacking a bit on fancy-regex and managed to get the optimisation mentioned above working (at least I think so -- the code is very hacky and desperately needs cleaning up).

The results however look very promising so far: On my machine, highlighting jquery went from 1,228,132,968 ns/iter (+/- 63,891,621) down to 858,410,622 ns/iter (+/- 83,053,993), thus an improvement of about 30%.

The code is in the trailing-la-opt branch in my fancy-regex fork, if you want to give it a go. I'll try to clean it up a bit over the weekend and send a PR to get some feedback from @raphlinus.

Edit: It's probably going to take a bit longer, until I find the time to cleanup / send a PR.

trishume · 2017-02-11T00:27:33Z

@TimNN that's awesome! 858ms is still more time than it takes Oniguruma to highlight jQuery on my computer, but my computer also takes less than 1,228ms to highlight with fancy-regex so it is possible that on my computer fancy-regex will be just as fast. I'll try and test your branch on my machine at some point.

I was hoping it would actually lead to a significant performance increase eventually but merely matching the performance of Oniguruma is enough for me to make it the default once the compatibility issues are fixed since it will fix #33 and make all dependencies pure rust.

I may be able to find the time to fix some of the smaller compatibility issues I listed. Specifically the first two unfinished ones listed (I sorted by estimated difficulty). Some of the issues look difficult though, specifically a full expression parser/rewriter for the character class operators.

TimNN · 2017-02-11T00:50:59Z

I ran the jquery benchmark again with the oniguruma version and got 834,134,774 ns/iter (+/- 101,552,378), so the patch brings us at least to the same level as oniguruma on my machine for this benchmark.

(Note that my per regex benchmarking code is currently not very efficient since I had planned on collecting more stats than average & total time, so this may slow everything down a bit).

Also, using RUSTFLAGS=-Ctarget-cpu=native improved the runtime of the optimised fancy-regex version by about 50ms on my machine.

trishume · 2017-02-11T01:08:07Z

@TimNN Excellent. There's probably more optimization possible but that's great for now.

It should be theoretically faster than Oniguruma at least on syntaxes which have been optimized for Sublime's sregex engine to not use many/any fancy regex features, which should allow the rust regex crate to do everything, and that engine should be faster than Oniguruma.

I'm not sure even that would match Sublime Text's performance though. I think it uses something like https://doc.rust-lang.org/regex/regex/struct.RegexSet.html to match many regexes at once in time proportional only to the number of characters, but with support for extracting captures and match positions (unlike the regex crate).

At the moment fancy-regex supports substantially more regexes than Sublime's sregex, which I think only supports the things in the regex crate plus the optimization you just implemented for translating lookaheads and possibly lookbehinds. Otherwise it falls back to Oniguruma. But the things that sregex supports should end up almost entirely delegating to regex under fancy-regex, so the syntaxes optimized for it should be fast under fancy-regex.

robinst · 2017-02-16T06:29:48Z

Should we create issues in fancy-regex for the unsupported syntax? Seems like the better place to discuss these.

For fancy character classes ([a-w&&[^c-g]z]), it looks like that should be added to the regex crate. There's a comment in the source hinting that adding support for these is planned/welcome:

https://github.com/rust-lang/regex/blob/52fdae7169ec619530985a019184319ac4bbee5a/regex-syntax/src/lib.rs#L1408-L1410

(see UTS#18 RL1.3)

I think that would also help with implementing things such as \H in character classes (which is [^0-9A-Fa-f]).

trishume · 2017-02-16T17:01:29Z

@robinst Good point. I guess fancy-regex would be a better place.

I created an issue in regex (rust-lang/regex#341), haven't created any in fancy-regex yet though.

BurntSushi · 2017-02-18T01:16:24Z

I would be interested in lending some insight here if you folks wind up seeing bottlenecks inside the regex crate. The regex crate is fast in a lot of cases, but that doesn't mean it's fast in every case, so don't assume that the regex crate will always bail you out. :-) I'll get the ball rolling by throwing some things against the wall and seeing what sticks.

If a regex is particularly large, then it's possible that the DFA will be forced to bail out because it's thrashing its cache. When the DFA bails, it falls back to a much slower (by an ~order of magnitude) regex engine. You can tweak how much cache space is available with the dfa_size_limit option. By default, it's 2MB. The surefire indicator of regexes that might thrash the cache are regexes with large counted repetitions (e.g., (foo){1000}) or regexes with lots of Unicode classes. A few Unicode classes here and there aren't going to hurt, but if you combine large classes with counted repetitions, e.g., \pL{100}, then you're in for some pain.

RegexSet may be a little tricky to use since it is very limited in what it can do. All it can really tell you is which regex matches, but doesn't give you any position information, which means you need re-run the regexes in the set that matched to get that position information. It is worth thinking about and possibly even trying a RegexSet, but I wouldn't get your hopes up. :-)

Finally, have you folks seen any problems with performance for compiling the regexes? Does it add any noticeable overhead?

trishume · 2017-02-18T01:20:36Z

@BurntSushi cool thank you. If we get around to optimizing it more than @TimNN already has, we could probably use your advice.

From the very basic benchmarks I did on this branch I didn't see anything noticeable from Regex compilation. It didn't seem to be very different from Oniguruma. I do make sure to compile each regex at most once and only compile them if they are actually needed.

keith-hall · 2017-03-01T11:33:17Z

Hi, I just wanted to make a quick note that I've submitted a PR to the ST Packages repo that removes the named backrefs compatibility issue from the Markdown syntax, so, depending whether any other syntaxes use named backreferences, you may get away without this support.

raphlinus · 2017-03-01T17:34:57Z

Some comments.

First, I took a look at @TimNN 's optimization. It's definitely the optimization I had in mind, but is not quite suitable for merging yet (it changes the output, specifically adding more capture groups than were originally present). I am a bit surprised it's only a 30% gain, I would have expected more.

The instrumentation for total time spent, number of invocations, etc., sounds extremely useful, and I recommend that gets checked in. We'll want to track performance on an ongoing basis (assuming we go ahead with fancy-regex, and even then it's extremely useful for making that decision). Where is the time going after the optimization is in place?

In my (admittedly not very thorough) testing, the time impact of regex compilation was minimal. For large files, it's spending seconds computing the highlighting.

What's the secret to super-fast performance? Is it running multiple regexes in parallel? Is it using RegexSet? I think the latter is promising. One idea that might be worth pursuing is the idea of an approximate (conservative) pure regex, which is guaranteed to match if the source regex does, but not conversely. This would be useful primarily in pruning the regexes that need to be evaluated. One downside, though: if there's a lot of pushing and popping, then it will run multiple sets over (almost) the entire source line.

trishume · 2017-03-16T04:17:38Z

I just learned something from this conversation on Reddit with @BurntSushi and @raphlinus that may be part of the cause of the lack of performance gains.

syntect currently always does a regex search with captures even when it doesn't actually need them. Given what @BurntSushi said on Reddit, with the regex crate (which fancy-regex will call into a lot) this has a substantial performance penalty over just finding the bounds of the match.

I think an optimization where it keeps track of if a certain rule needs the captures or not may help performance with fancy-regex and possibly with Oniguruma as well, although I'm not sure there's much of a penalty to getting captures with Oniguruma, especially since it lets you re-use the capture regions struct between calls.

Not sure exactly how much difference it would make, but it would probably help a bit.

BurntSushi · 2017-03-16T11:07:54Z

syntect currently always does a regex search with captures even when it doesn't actually need them. Given what @BurntSushi said on Reddit, with the regex crate (which fancy-regex will call into a lot) this has a substantial performance penalty over just finding the bounds of the match.

There's a lot of subtlety here. There are various factors at play:

Even if you use the captures method call, if the regex itself has no captures, then the regex engine notices this and doesn't run the NFA. See: https://github.com/rust-lang/regex/blob/d813518e2a199884cd38a4e32497a7453db79697/src/exec.rs#L523-L535
Does fancy-regex ever call the captures method? Or does it handle captures on its own?
What do your regexes actually look like? (I tried looking once but I got lost. :-()
The performance guide for the regex crate is something you should definitely check out.

although I'm not sure there's much of a penalty to getting captures with Oniguruma

Right. Classical backtracking engines, IME, typically don't impose a penalty for extracting captures.

BurntSushi · 2017-03-16T11:08:34Z

FWIW, there are plans (in my head, anyway) to make capture extraction faster, but I can't commit to a timeline.

robinst · 2017-05-22T05:18:03Z

I rebased this branch and have been implementing missing features in fancy-regex, see my pull requests.

Also, the just released regex 0.2.2 now supports nested character classes and intersections, which means the "Fancy character class syntax [a-w&&[^c-g]z]" task is mostly done.

So I think the only task that doesn't have a pull request yet is "Ability to use \z in a character class (Using the nonewlines option)", which I haven't looked at yet.

I also found a regex that fancy-regex currently has trouble with (haven't investigated why yet): https://github.com/google/fancy-regex/issues/14

robinst · 2017-06-09T07:38:39Z

I looked into "Ability to use \z in a character class (Using the nonewlines option)" now. I'm pretty sure it doesn't work as expected in the current implementation, but it wasn't detected because onig doesn't complain.

With the following:

let re = onig::Regex::new(r"^a[\z]").unwrap();
println!("{}", re.is_match("a"));
println!("{}", re.is_match("az"));

The regex is compiled, but it prints false, then true. So it's equivalent to ^az, not ^a\z (which is what syntect would want).

trishume · 2017-06-09T17:45:07Z

@robinst good catch, I created an issue: #76 and updated the to-do list in this issue.

robinst · 2017-07-27T08:48:21Z

Update: Fixed #76 now. With that, all of the check boxes in the description are done.

I've rebased @trishume's fancy-regex branch here: https://github.com/robinst/syntect/tree/fancy-regex

Now there's the following failing tests left, and they fail on the assertions (instead of panicking while compiling regexes):

html::tests::strings
html::tests::tokens
parsing::parser::tests::can_parse_yaml

The last one I added here and shows that the YAML syntax is not working yet:

 thread 'parsing::parser::tests::can_parse_yaml' panicked at 'assertion failed: `(left == right)`
-  left: `[(0, Push(<source.yaml>)), (0, Push(<string.unquoted.plain.out.yaml>)), (1, Pop(1)), (1, Push(<string.unquoted.plain.out.yaml>)), (2, Pop(1)), (2, Push(<constant.language.boolean.yaml>)), (3, Pop(1)), (3, Push(<punctuation.separator.key-value.mapping.yaml>)), (4, Pop(1)), (5, Push(<string.unquoted.plain.out.yaml>)), (10, Pop(1))]`,
+ right: `[(0, Push(<source.yaml>)), (0, Push(<string.unquoted.plain.out.yaml>)), (0, Push(<entity.name.tag.yaml>)), (3, Pop(2)), (3, Push(<punctuation.separator.key-value.mapping.yaml>)), (4, Pop(1)), (5, Push(<string.unquoted.plain.out.yaml>)), (10, Pop(1))]`', src/parsing/parser.rs:482:8

I had a look at the syntax but it's pretty complex. If someone who knows the syntax wants to track down the problem, that would be cool. (I guess I should learn how to use a debugger for Rust :)).

keith-hall · 2017-07-27T09:41:45Z

I don't know YAML syntax very well, but I cut down the syntax definition to the following, and I think it should still have the same behavior with the key: value\n example (it still gets the same scopes in ST as the full YAML syntax def - untested in syntect though, sorry!), so it may help with debugging.

%YAML 1.2
---
# See http://www.sublimetext.com/docs/3/syntax.html
scope: source.yaml-test
name: YAML-Test
variables:
  c_indicator: '[-?:,\[\]{}#&*!|>''"%@`]'
  # plain scalar begin and end patterns
  ns_plain_first_plain_out: |- # c=plain-out
    (?x:
        [^\s{{c_indicator}}]
      | [?:-] \S
    )

  _flow_scalar_end_plain_out: |- # kind of the negation of nb-ns-plain-in-line(c) c=plain-out
    (?x:
      (?=
          \s* $
        | \s+ \#
        | \s* : (\s|$)
      )
    )
contexts:
  main:
    - include: block-mapping
    - include: flow-scalar-plain-out

  block-mapping:
    - match: |
        (?x)
        (?=
          {{ns_plain_first_plain_out}}
          (
              [^\s:]
            | : \S
            | \s+ (?![#\s])
          )*
          \s*
          :
          (\s|$)
        )
      push:
        #- include: flow-scalar-plain-out-implicit-type
        - match: '{{_flow_scalar_end_plain_out}}'
          pop: true
        - match: '{{ns_plain_first_plain_out}}'
          set:
            - meta_scope: string.unquoted.plain.out.yaml entity.name.tag.yaml
              meta_include_prototype: false
            - match: '{{_flow_scalar_end_plain_out}}'
              pop: true
    - match: :(?=\s|$)
      scope: punctuation.separator.key-value.mapping.yaml

  flow-scalar-plain-out:
    # http://yaml.org/spec/1.2/spec.html#style/flow/plain
    # ns-plain(n,c) (c=flow-out, c=block-key)
    #- include: flow-scalar-plain-out-implicit-type
    - match: '{{ns_plain_first_plain_out}}'
      push:
        - meta_scope: string.unquoted.plain.out.yaml
          meta_include_prototype: false
        - match: '{{_flow_scalar_end_plain_out}}'
          pop: true

robinst · 2017-07-28T08:00:13Z

Thanks @keith-hall! That helped, I've noticed a difference with this pattern (narrowed down):

let regex = r"(?=\s*$|\s*:(\s|$))";
let s = "key: value";
println!("{:?}", onig::Regex::new(regex).unwrap().find(s));
println!("{:?}", fancy_regex::Regex::new(regex).unwrap().find(s));

onig returns Some((3, 3)), whereas fancy-regex returns Some((0, 0)). Removing the \s*$| part of the pattern makes it work. Ran out of time now, but looks like fancy-regex doesn't handle the $ in the positive lookahead correctly?

robinst · 2017-08-03T02:10:30Z

Ok, tracked down the problem and have a fix here: google/fancy-regex#21

With that fix, cargo test works! syntest panicks compiling one of the regexes though :).

Some of the regexes include `$` and expect it to match end of line. In fancy-regex, `$` means end of text by default. Adding `(?m)` activates multi-line mode which changes `$` to match end of line. This fixes a large number of the failed assertions with syntest.

In fancy-regex, POSIX character classes only match ASCII characters. Sublime's syntaxes expect them to match Unicode characters as well, so transform them to corresponding Unicode character classes.

With the regex crate and fancy-regex, `^` in multi-line mode also matches at the end of a string like "test\n". There are some regexes in the syntax definitions like `^\s*$`, which are intended to match a blank line only. So change `^` to `\A` which only matches at the beginning of text.

Note that this wasn't a problem with Oniguruma because it works on UTF-8 bytes, but fancy-regex works on characters.

robinst · 2018-05-01T07:00:00Z

Done! Note that you might have to cargo update locally before trying to run this (because Cargo.lock is not checked in). Updates:

A small change to make fancy-regex work with a newer regex crate.
A bug fix in syntect for code that I wrote a couple of days ago that fails with fancy-regex (but is fine with Oniguruma). += 1 on a string index is almost never the right thing to do. I walked into this one before, but apparently forgot about it again :).

keith-hall · 2018-05-02T07:07:10Z

small note for people struggling to get syntect to build on Windows: using the branch from this PR, you can edit Cargo.toml to remove the onig dependency, and everything will work fine.

Side note: probably this PR should be updated so that fancy-regex is an optional dependency as part of the parsing feature like onig is, no?

--- Cargo.toml
+++ [new] Cargo.toml
@@ -15,7 +15,7 @@
 
 [dependencies]
 yaml-rust = { version = "0.4", optional = true }
-onig = { version = "3.2.1", optional = true }
+#onig = { version = "3.2.1", optional = true }
 walkdir = "2.0"
 regex-syntax = { version = "0.4", optional = true }
 lazy_static = "1.0"
@@ -25,7 +25,7 @@
 flate2 = { version = "1.0", optional = true, default-features = false }
 fnv = { version = "1.0", optional = true }
 regex = "*"
-fancy-regex = { git = "https://github.com/google/fancy-regex.git" }
+fancy-regex = { git = "https://github.com/google/fancy-regex.git", optional = true }
 serde = { version = "1.0", features = ["rc"] }
 serde_derive = "1.0"
 serde_json = "1.0"
@@ -51,7 +51,7 @@
 # Pure Rust dump creation, worse compressor so produces larger dumps than dump-create
 dump-create-rs = ["flate2/rust_backend", "bincode"]
 
-parsing = ["onig", "regex-syntax", "fnv"]
+parsing = ["fancy-regex", "regex-syntax", "fnv"]
 # The `assets` feature enables inclusion of the default theme and syntax packages.
 # For `assets` to do anything, it requires one of `dump-load-rs` or `dump-load` to be set.
 assets = []

robinst · 2018-05-02T11:38:07Z

@keith-hall Pushed a commit with those changes, thanks!

keith-hall · 2018-05-03T08:42:11Z

src/parsing/syntax_definition.rs

-                                           RegexOptions::REGEX_OPTION_CAPTURE_GROUP,
-                                           Syntax::default())
-            .unwrap();
+        println!("compiling {:?}", self.regex_str);


should this println be here, as it generates a lot of noise? if it's useful for debugging, maybe it would be best to hide it behind a feature flag as discussed at #146 (comment)

Just commenting it out is fine, see my reply on the comment

@keith-hall I pushed a commit that changes this to only print in case it fails.

I think the println! was only there to see the regex that failed to compile.

kornelski · 2018-07-24T12:40:39Z

It would be great if that change landed, because libgit2 crashes in binaries linking oniguruma, and I'd like to use both libgit2 and syntect together.

Keats · 2018-08-23T17:56:17Z

Is https://github.com/google/fancy-regex/issues/44 the only blocker left for this?

keith-hall · 2018-08-23T18:47:29Z

I was also wondering about this. If we don't want to default to fancy-regex yet, maybe it would be worth maintaining both the oniguruma and fancy-regex code paths for now and consumers can choose which one to use from a feature flag. It may then be easier to do performance comparisons etc.

pickfire · 2018-08-25T07:55:18Z

src/parsing/syntax_definition.rs

@@ -5,12 +5,13 @@
 //! into this data structure?
 use std::collections::{BTreeMap, HashMap};
 use std::hash::Hash;
-use onig::{Regex, RegexOptions, Region, Syntax};
+use fancy_regex;


Why not continue using fancy_regex::Regex here?

You mean why not write use fancy_regex::Regex; here? Yeah there's not really a good reason. I will change it next time I work on this.

robinst · 2018-08-27T07:28:53Z

Is google/fancy-regex#44 the only blocker left for this?

@Keats Yes, unless we find another one.

@keith-hall A feature sounds like a good idea, yeah. Maybe we can abstract the regex compilation and matching parts a bit to make the feature less painful to maintain (so that it's just in one module and not all over the place).

I might have some time this week to work on this.

pickfire · 2018-08-27T13:07:18Z

src/parsing/yaml_load.rs

+/// In fancy-regex, POSIX character classes only match ASCII characters.
+/// Sublime's syntaxes expect them to match Unicode characters as well, so transform them to
+/// corresponding Unicode character classes.
+fn replace_posix_char_classes(regex: String) -> String {


Is there by any chance that we are able to do the sublime syntax replacement before run-time?

OptimisticPeach · 2019-06-23T17:42:07Z

Sorry to poke at an old issue, but is there any news on this? I'd really like to use syntect in a WASM project, but according to #135 this needs to land first.

trishume · 2019-06-23T23:26:55Z

Unfortunately this is not quite complete and based on an older release of syntect. It would probably take a fair amount of work to complete. I'm not personally interested in doing that work unfortunately, so unless someone else steps up to do it, it's not on the roadmap.

OptimisticPeach · 2019-06-24T01:18:08Z

Ah that's a bummer, I'm sorry for the letdown but I don't think I'd be able to commit to making it happen right now.

robinst · 2019-11-19T22:10:39Z

Hey, just an update. I'm working on this again, but with a different approach.

I'm moving all the regex usage to a module first, so then we can swap out the implementation using a cargo feature.

I'll have a pull request next week :).

robinst · 2019-11-25T12:22:31Z

Ok, see #270. I think we can close this PR and move discussion to #270.

trishume mentioned this pull request Feb 16, 2017

Implement (at least part of) UTS#18 RL1.3 - Operators in character sets rust-lang/regex#341

Closed

trishume mentioned this pull request Mar 2, 2017

Support advanced scope selectors #36

Open

5 tasks

trishume mentioned this pull request Apr 4, 2017

600 line Rust source file takes more than 2 minutes #33

Closed

trishume mentioned this pull request Jun 9, 2017

Fix nonewlines mode #76

Closed

2 tasks

trishume mentioned this pull request Sep 4, 2017

How does it compare to Pygments? #99

Closed

robinst added 5 commits May 1, 2018 12:39

Replace POSIX character classes so that they match Unicode as well

180a08d

In fancy-regex, POSIX character classes only match ASCII characters. Sublime's syntaxes expect them to match Unicode characters as well, so transform them to corresponding Unicode character classes.

Make parsing::util conditional on parsing feature

3edab51

Fix code that skips a character to work with unicode

84059f4

Note that this wasn't a problem with Oniguruma because it works on UTF-8 bytes, but fancy-regex works on characters.

robinst force-pushed the fancy-regex branch from 9f1cc22 to 84059f4 Compare May 1, 2018 06:55

Remove onig and make fancy-regex optional (needed for parsing only)

ac0e315

sharkdp mentioned this pull request May 2, 2018

Failing to parse some sublime-syntax files #156

Closed

keith-hall reviewed May 3, 2018

View reviewed changes

Replace println! and plain unwrap with nicer panic!

1cbdf8b

I think the println! was only there to see the regex that failed to compile.

keith-hall approved these changes May 6, 2018

View reviewed changes

pickfire reviewed Aug 25, 2018

View reviewed changes

pickfire reviewed Aug 27, 2018

View reviewed changes

Keats mentioned this pull request Sep 2, 2018

Why is it not published on crates.io? getzola/zola#392

Closed

trishume mentioned this pull request Aug 10, 2019

Use native rust regex? #12

Closed

robinst mentioned this pull request Nov 25, 2019

Move all regex usage to separate module to add support for fancy-regex #270

Merged

3 tasks

trishume closed this Nov 26, 2019

bminixhofer mentioned this pull request Feb 15, 2021

switch regex engine from oniguruma to fancy-regex bminixhofer/nlprule#18

Closed

[WIP] Kinda-working fancy-regex support #34

[WIP] Kinda-working fancy-regex support #34

Conversation

trishume commented Feb 9, 2017 • edited by robinst Loading

raphlinus commented Feb 9, 2017

trishume commented Feb 9, 2017

TimNN commented Feb 10, 2017

TimNN commented Feb 10, 2017

trishume commented Feb 10, 2017

TimNN commented Feb 11, 2017 • edited Loading

trishume commented Feb 11, 2017 • edited Loading

TimNN commented Feb 11, 2017

trishume commented Feb 11, 2017

robinst commented Feb 16, 2017

trishume commented Feb 16, 2017

BurntSushi commented Feb 18, 2017

trishume commented Feb 18, 2017

keith-hall commented Mar 1, 2017

raphlinus commented Mar 1, 2017

trishume commented Mar 16, 2017

BurntSushi commented Mar 16, 2017

BurntSushi commented Mar 16, 2017

robinst commented May 22, 2017

robinst commented Jun 9, 2017

trishume commented Jun 9, 2017

robinst commented Jul 27, 2017

keith-hall commented Jul 27, 2017

robinst commented Jul 28, 2017

robinst commented Aug 3, 2017

robinst commented May 1, 2018

keith-hall commented May 2, 2018

robinst commented May 2, 2018

keith-hall May 3, 2018

Choose a reason for hiding this comment

trishume May 3, 2018

Choose a reason for hiding this comment

robinst May 6, 2018

Choose a reason for hiding this comment

kornelski commented Jul 24, 2018

Keats commented Aug 23, 2018

keith-hall commented Aug 23, 2018

pickfire Aug 25, 2018

Choose a reason for hiding this comment

robinst Aug 27, 2018

Choose a reason for hiding this comment

robinst commented Aug 27, 2018

pickfire Aug 27, 2018

Choose a reason for hiding this comment

OptimisticPeach commented Jun 23, 2019

trishume commented Jun 23, 2019

OptimisticPeach commented Jun 24, 2019

robinst commented Nov 19, 2019

robinst commented Nov 25, 2019

trishume commented Feb 9, 2017 •

edited by robinst

Loading

TimNN commented Feb 11, 2017 •

edited

Loading

trishume commented Feb 11, 2017 •

edited

Loading