-
-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for case insensitivity #198
Conversation
* Add the `IgnoreFlags` structure * Parsing logic of `IgnoreFlags`
* Add `ApplieIgnoreFlags` trait * Implementation of that trait for a whole lot of structures * Updated `lib.rs` to take the ignore flags in account
I just noticed that Moreover, using that method basically makes my I will rewrite a good portion of that part of the system, using |
* Remove the `ApplieIgnoreFlags` trait -> Replaced with with `MakeAsciiInsensitive` trait which serves the same purpose but only for the flag `IgnoreAsciiCase`.
* `Literal::to_mir` now takes ignore flags and parses mir following those flags. * Add `Mir::utf8_ignore_case` and `Mir::binary_ignore_case` * Add `Literal::escape_regex`
Those three last commits removed the |
} else { | ||
let mir = definition | ||
.literal | ||
.escape_regex() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, as I'm calling .to_mir
right after, we need to ensure the literal will actually be used as a string and not some regex semantics.
For example, #[token("[Some Header]", ignore(case))]
needs to be properly escaped so that it does not turn into Mir::Alternation('S' ... 'r')
but Mir::Concat('[', 'S', ... , 'r', ']')
.
Originally, I had written a function that directly converts any literal into a case insensitive MIR but I later found out about the case_insensitive
function on HIR's ParserBuilder
. I thought it would be clearer to use it instead of writing it myself (and it is also less error-prone) For this reason, I used the to_mir
function along with a escape_regex
.
Published in 0.12 |
Issue: #196
Case insensitivity is opt-in using the
ignore(...)
attribute.For now, I prevented the user to use both
ascii_case
andcase
at the same time. This can be easily changed.Internally, all this attribute does is converting the original literal (for tokens) or the MIR (for regular expression) into another MIR that accepts both uppercase and lowercase variants of the input.
Unanswered questions
IgnoreFlags
struct to make theignore
attribute more easily expandable. It uses a basic bit field. I wasn't sure whether a new dependency was ok so I implemented myself what thebitflags
crate do (in much much simpler). Do we want a new dependency to make the code cleaner or the little features I implemented are enough?generator
module quite complicated to dig into and understand.Ranges (eg.Now everything uses simple folding.regex("[a-z]", ...)
) are currently usingClassUnicode::case_fold_simple
and normal literals (eg.regex("abc", ...)
) are using Rust'sto_lowercase
andto_uppercase
functions. This might lead to inconsistencies if those two methods do not convert characters the same way.PS: This is my first PR ever. If I've done anything wrong, please tell me! I'm both doing this because I needed this feature and to get reviews of my code. Feel free to say it needs some (or a lot) rework :)