Rewrite parser as part of new regex-syntax crate. #87
This commit introduces a new `regex-syntax` crate that provides a regular expression parser and an abstract syntax for regular expressions. As part of this effort, the parser has been rewritten and has grown a substantial number of tests.

The `regex` crate itself hasn't changed too much. I opted for the smallest possible delta to get it working with the new regex AST. In most cases, this simplified code because it no longer has to deal with unwieldy flags. (Instead, flag information is baked into the AST.)

Here is a list of public facing non-breaking changes:

* A new `regex-syntax` crate with a parser, regex AST and lots of tests. This closes #29 and fixes #84.
* A new flag, `x`, has been added. This allows one to write regexes with insignificant whitespace and comments.
* Repetition operators can now be directly applied to zero-width matches. e.g., `\b+` was previously not allowed but now works. Note that one could always write `(\b)+` previously. This change is mostly about lifting an arbitrary restriction.

And a list of breaking changes:

* A new `Regex::with_size_limit` constructor function, that allows one to tweak the limit on the size of a compiled regex. This fixes #67. The new method isn't a breaking change, but regexes that exceed the size limit (set to 10MB by default) will no longer compile. To fix, simply call `Regex::with_size_limit` with a bigger limit.
* Capture group names cannot start with a number. This is a breaking change because regexes that previously compiled (e.g., `(?P<1a>.)`) will now return an error. This fixes #69.
* The `regex::Error` type has been changed to reflect the better error reporting in the `regex-syntax` crate, and a new error for limiting regexes to a certain size. This is a breaking change. Most folks just call `unwrap()` on `Regex::new`, so I expect this to have minimal impact.

Closes #29, #67, #69, #79, #84.

[breaking-change]
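To make the capture-name rule concrete, here is a minimal, std-only sketch of the kind of check involved. The function `is_valid_capture_name` is hypothetical, and the exact character set (ASCII letters, digits and underscore, with a non-digit first character) is an assumption for illustration rather than the actual `regex-syntax` implementation:

```rust
/// Hypothetical helper (not part of `regex` or `regex-syntax`): returns true
/// if `name` is acceptable under the assumed rule, i.e. it is non-empty,
/// consists of ASCII letters, digits or `_`, and does not start with a digit.
fn is_valid_capture_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        // An empty name is rejected.
        None => false,
        // A leading digit is rejected, as in `(?P<1a>.)`.
        Some(first) if first.is_ascii_digit() => false,
        // Any other leading character must be a letter or underscore.
        Some(first) if !(first.is_ascii_alphabetic() || first == '_') => false,
        // The remaining characters may also include digits.
        Some(_) => chars.all(|c| c.is_ascii_alphanumeric() || c == '_'),
    }
}

fn main() {
    // `1a` mirrors the `(?P<1a>.)` example from the description above.
    for name in ["1a", "a1", "_x", ""].iter() {
        println!("{:?} -> {}", name, is_valid_capture_name(name));
    }
}
```

Under these assumptions, `1a` is rejected while `a1` and `_x` are accepted.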
r? @huonw (rust_highfive has picked a reviewer for you, use r? to override)
cc @alexcrichton @huonw @andrew-d I think the thing I am most concerned about here is the definition of the Also, part of the enum exposes the
@@ -21,8 +21,16 @@ path = "regex_macros/benches/bench_dynamic.rs"
test = false
bench = true

[dependencies.regex-syntax]
path = "regex_syntax"
version = "*"
I personally prefer to write out this section like so nowadays:
[dependencies]
regex-syntax = { path = "regex_syntax", version = "*" }
Also, perhaps this could have a more strict version requirement than `*`? And finally, perhaps the folder could be called `regex-syntax` to match the crate name?
> And finally, perhaps the folder could be called `regex-syntax` to match the crate name?

Happy to oblige. I originally had that, but it looks very weird next to `regex_macros`, which I think is stuck with an underscore? :-(
👍 to everything else!
Yeah unfortunately we can't switch underscore-to-dash after the fact just yet :(
I would like to enable this kind of feature on crates.io though as I suspect it comes up more often than not!
OK, hyphens it is!
This looks awesome, thanks so much @BurntSushi!
These all seem minor enough that I would be fine releasing this version as a minor update to this library (not a major version change).
I personally like having an enum for errors like this, and I think that they should be considered "de facto extensible" in the sense that although it's a breaking change to add a variant, I would not consider it one. You could add a
Hm, this is interesting! I would believe that there's definitely a good deal of API leakage between crates in terms of a major version bump of your deps may require a major version bump for you, but I think with a dependency like
[dev-dependencies]
rand = "0.3"

[features]
pattern = []

[profile.bench]
opt-level = 3
I think that 3 is the default, so perhaps this could just be specified to set lto = true
The big change here is the addition of a non-public variant in the error enums. This will hint to users that one shouldn't exhaustively match the enums in case new variants are added.
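As a rough illustration of that idea, here is a std-only sketch of an error enum with a hidden variant. The variant names and the `describe` helper are assumptions made up for this example, not the actual `regex::Error` definition:

```rust
/// Illustrative error type; the variant names are assumptions.
#[derive(Debug)]
pub enum Error {
    /// A syntax error, carrying a human-readable message.
    Syntax(String),
    /// The compiled program exceeded the given size limit in bytes.
    CompiledTooBig(usize),
    /// Hidden variant that users are not meant to match or construct.
    /// Its presence signals that the enum may grow new variants.
    #[doc(hidden)]
    __Nonexhaustive,
}

/// Hypothetical caller-side helper: the wildcard arm keeps this match
/// compiling even if new variants are added later.
fn describe(err: &Error) -> String {
    match err {
        Error::Syntax(msg) => format!("syntax error: {}", msg),
        Error::CompiledTooBig(limit) => {
            format!("compiled regex exceeds {} bytes", limit)
        }
        _ => "unknown regex error".to_string(),
    }
}

fn main() {
    let errors = [
        Error::Syntax("unclosed group".to_string()),
        Error::CompiledTooBig(10 * (1 << 20)),
    ];
    for e in errors.iter() {
        println!("{}", describe(e));
    }
}
```

The hidden variant is only a convention: nothing stops a caller from matching every variant by name, but doing so means relying on something undocumented. Later versions of Rust provide the `#[non_exhaustive]` attribute for the same purpose.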
I think I've addressed everything. Just followed all of your suggestions. :-) I can merge once Travis gives the OK!
Sounds good to me!
Rewrite parser as part of new regex-syntax crate.
Haha. Do your worst. This time, quickcheck has my back. :P