Only generate Unicode tokens if the unicode feature is enabled
regex_syntax::parse() converts our regex strings into the regex's HIR
(high-level intermediate representation). Part of that conversion
expands various metacharacters (for example \w or \d) into sets of
characters, and in many cases the result depends on whether Unicode
mode is enabled.
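As a simplified illustration of how one metacharacter's meaning depends on Unicode mode, the std-only sketch below classifies characters against \w. It is not regex_syntax's implementation: `is_word_char` is a hypothetical helper, the ASCII branch matches exactly [0-9A-Za-z_], and the Unicode branch only approximates Unicode \w with `char::is_alphanumeric` plus `_` (the real Unicode class is defined differently, for instance it also includes combining marks).

```rust
/// Hypothetical helper: does `c` match `\w` in the given mode?
/// - ASCII mode: exactly [0-9A-Za-z_].
/// - Unicode mode: approximated here as "any alphanumeric char or `_`";
///   the real Unicode `\w` class differs in some details.
fn is_word_char(c: char, unicode: bool) -> bool {
    if unicode {
        c.is_alphanumeric() || c == '_'
    } else {
        c.is_ascii_alphanumeric() || c == '_'
    }
}

fn main() {
    // Non-ASCII letters such as 'é' and 'ж' only count as word
    // characters when Unicode mode is on.
    for c in ['a', '_', '7', 'é', 'ж'] {
        println!(
            "{c:?}: ascii={} unicode={}",
            is_word_char(c, false),
            is_word_char(c, true)
        );
    }
}
```

The larger the set a metacharacter expands to in Unicode mode, the more the generated matcher depends on Unicode data being available, which is what this commit makes conditional.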

Prior to lalrpop#814, we were still outputting Unicode regexes
unconditionally, but the regex internals appear to have compiled them
away, so no errors surfaced. The switch in lalrpop#814 turned these
into real errors.

Follow-up work will be needed to determine why existing tests didn't
detect this.
dburgener committed Oct 18, 2023
1 parent 64fc30e commit 9702d74
Showing 1 changed file with 7 additions and 2 deletions.
9 changes: 7 additions & 2 deletions lalrpop/src/lexer/re/mod.rs
@@ -1,7 +1,7 @@
 //! A parser and representation of regular expressions.
 use regex_syntax::hir::Hir;
-use regex_syntax::{self, Error, Parser};
+use regex_syntax::{self, Error, ParserBuilder};
 
 #[cfg(test)]
 mod test;
@@ -19,6 +19,11 @@ pub fn parse_literal(s: &str) -> Regex {
 
 /// Parse a regular expression like `a+` etc.
 pub fn parse_regex(s: &str) -> Result<Regex, RegexError> {
-    let expr = Parser::new().parse(s)?;
+    let enable_unicode = cfg!(feature = "unicode");
+    let expr = ParserBuilder::new()
+        .utf8(enable_unicode)
+        .unicode(enable_unicode)
+        .build()
+        .parse(s)?;
     Ok(expr)
 }
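The change boils down to deriving two parser settings from a single compile-time switch. The std-only sketch below mirrors that decision without depending on regex-syntax; `ParseFlags` and `from_features` are hypothetical names, not part of lalrpop or regex-syntax. `cfg!(feature = "unicode")` is evaluated at compile time, so it yields `true` only when the crate is built with its `unicode` feature enabled (e.g. `cargo build --features unicode`).

```rust
/// Hypothetical mirror of the two ParserBuilder settings the commit toggles.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ParseFlags {
    unicode: bool,
    utf8: bool,
}

impl ParseFlags {
    /// Both flags follow the same Cargo feature. `cfg!` expands to a
    /// literal `true` or `false` at compile time; nothing is looked up
    /// at runtime.
    fn from_features() -> Self {
        let enable_unicode = cfg!(feature = "unicode");
        ParseFlags {
            unicode: enable_unicode,
            utf8: enable_unicode,
        }
    }
}

fn main() {
    let flags = ParseFlags::from_features();
    println!("{flags:?}");
}
```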
