regular_expression: report character span offsets not byte offsets #6141
Comments
@leaysgur Maybe you have some insight on this. I was having trouble correctly parsing control characters because the span offsets didn't match the source when unicode escapes were involved. I'm not certain how to fix the issue though. |
Yes, I think I know what is going on here. |
TL;DR
This is an issue related to "escape sequences" in JavaScript string literals.
The corresponding position might not be correctly retrieved from the parsed

Simplified
If there is a "file" containing:
and you want to slice each

fn main() {
    let source_text = std::fs::read_to_string("./test.js").unwrap();
    assert_eq!("O", &source_text[1..2]);
    // &source_text[2..3]; \ 👈🏻
    // &source_text[3..4]; "
    assert_eq!("X", &source_text[4..5]);
    // &source_text[5..6]; \ 👈🏻
    // &source_text[6..7]; "
    assert_eq!("C", &source_text[7..8]);
}

You need to include

However, if you're just handling the "string":

fn main() {
    let source_text = "O\"X\"C";
    assert_eq!("O", &source_text[0..1]);
    // &source_text[1..2]; "
    assert_eq!("X", &source_text[2..3]);
    // &source_text[3..4]; "
    assert_eq!("C", &source_text[4..5]);
}

There's no need to consider the

This difference is causing problems when a diagnostic tries to slice the file.

Current status
As an implementation detail,
In most cases, the current code works with an AST with
At this point from
Thus, the regular expression can be parsed without needing to be aware of these differences.
However, when attempting to slice the original code string using the parsed
As seen in the original PR:

new RegExp('\\u{1111}*\\x1F', 'u')
^ ^

These 2
At that time, I was aware of this issue but was only focused on parsing. I was satisfied with

How to fix
In my view, the only solution is to update
Are there any other approaches to consider...? |
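To make the offset drift concrete, here is a minimal sketch using the `new RegExp('\\u{1111}*\\x1F', 'u')` example from above. It assumes byte offsets into ASCII-only text; the names `RAW`, `COOKED`, and `demo_offset_drift` are made up for illustration and are not part of oxc.

```rust
/// The text between the quotes exactly as written in the JavaScript file.
const RAW: &str = r"\\u{1111}*\\x1F";

/// The string value after JavaScript resolves each `\\` to a single `\`.
/// This is what the regular expression parser actually receives.
const COOKED: &str = r"\u{1111}*\x1F";

/// Shows that an offset computed on the cooked value points at the wrong
/// place when it is used to slice the raw source text.
pub fn demo_offset_drift() {
    let cooked_star = COOKED.find('*').unwrap();
    let raw_star = RAW.find('*').unwrap();

    // One `\\` precedes `*`, so the raw offset is one larger.
    assert_eq!(cooked_star, 8);
    assert_eq!(raw_star, 9);

    // Slicing the raw text with the cooked offset lands on `}` instead of `*`.
    assert_eq!(&RAW[cooked_star..cooked_star + 1], "}");
    assert_eq!(&RAW[raw_star..raw_star + 1], "*");

    // After the second `\\`, the drift has grown to two.
    assert_eq!(COOKED.find('x').unwrap(), 10);
    assert_eq!(RAW.find('x').unwrap(), 12);
}
```

Every escape sequence before a given position adds to the drift, which is why a span taken from the cooked value cannot simply be shifted by a constant.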
This makes sense to me. I think it would also align us with the
I don't think we need to handle this necessarily. The ESLint parser doesn't bother to even check template literals for most rules it seems: |
I agree, we need to make it parse from the raw string instead of the interpreted string value. |
Thank you both. 👍🏻 I push_front this task to my task queue. 📚 |
Preparation for #6141
- Keep `enum` size + add size asserts tests
- Arrange AST related directories
- Renaming
OK..., this might not be as easy as expected. Does this mean we need |
Progress updates:
|
…r`) to handle escape sequence in RegExp('pat') (#6635)

Preparation for #6141

`oxc_regular_expression` can already parse and validate both `/regexp-literal/` and `new RegExp("string-literal")`. One thing that was not well supported, however, is reporting the `Span` for the `RegExp("string-literal-with-\\escape")` case.

For example, these two cases produce the same `RegExp` instances in JavaScript:
- `/\d+/`
- `new RegExp("\\d+")`

For now, mainly in `oxc_linter`, the latter case is parsed with `oxc_parser` -> `ast::literal::StringLiteral` AST node -> `value` property. At this point, escape sequences are already resolved(!), so `oxc_regular_expression` can handle the aligned `&str` as an argument without any problem in both cases.

However, in terms of `Span` representation, these cases should be handled differently because of the `\\` in string literals. As a result, the parsed AST's `Span` for `new RegExp("string-literal")` is not accurate if it contains escape sequences.

e.g. https://github.com/oxc-project/oxc/blob/a01a5dfdafb9cd536cb87867697e3ae43b1990e6/crates/oxc_linter/src/snapshots/no_invalid_regexp.snap#L118-L122

Each time a `\` appears, the subsequent positions are shifted. `_` should be placed under `*` in this case.

So, to resolve this issue, we need to implement `string_literal_parser` first and use its output as the reading units of `oxc_regular_expression`.
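The idea of "reading units" that remember where they came from can be sketched roughly as follows. This is not the actual `string_literal_parser` in oxc: `Unit` and `read_units` are hypothetical names, offsets are byte offsets into the literal body, and only a handful of simple escapes are covered (anything else, such as `\x` or `\u`, makes the sketch return `None`).

```rust
/// One decoded character plus the byte range it occupied in the raw
/// string literal body (the text between the quotes).
#[derive(Debug, PartialEq)]
pub struct Unit {
    pub ch: char,
    pub start: usize,
    pub end: usize,
}

/// Decodes a raw string literal body into units that keep their raw spans.
/// Hypothetical helper: returns `None` for escapes this sketch does not cover.
pub fn read_units(raw_body: &str) -> Option<Vec<Unit>> {
    let mut units = Vec::new();
    let mut chars = raw_body.char_indices();
    while let Some((start, c)) = chars.next() {
        if c != '\\' {
            units.push(Unit { ch: c, start, end: start + c.len_utf8() });
            continue;
        }
        // A backslash with nothing after it is malformed.
        let (esc_start, esc) = chars.next()?;
        let ch = match esc {
            'n' => '\n',
            't' => '\t',
            'r' => '\r',
            '\\' | '"' | '\'' => esc,
            // Not covered here: hex/unicode/legacy octal escapes, `\b`, `\f`,
            // `\v`, and line continuations.
            'x' | 'u' | '0'..='9' | 'b' | 'f' | 'v' => return None,
            '\n' | '\r' | '\u{2028}' | '\u{2029}' => return None,
            // Identity escape such as `\d`: the character stands for itself.
            other => other,
        };
        // The unit spans the backslash and the escaped character together.
        units.push(Unit { ch, start, end: esc_start + esc.len_utf8() });
    }
    Some(units)
}
```

For the body of `new RegExp("\\d+")`, that is `\\d+`, this yields `\` at `0..2`, `d` at `2..3`, and `+` at `3..4`. A regular expression parser consuming these units sees the same characters as in `/\d+/`, but any span it reports can be translated back to the raw source.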
@camchenry If you are planning to do something, please refer to this. 😉 oxc/crates/oxc_linter/src/rules/eslint/no_control_regex.rs Lines 129 to 137 in 8032813
|
see body of #6129