This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

feat(rome_js_analyze): noControlCharactersInRegex #4656

Merged

unvalley merged 3 commits into rome:main from unvalley:no-control-characters-in-regex

Jul 15, 2023

Contributor

unvalley commented Jul 4, 2023 •

edited

Loading

Summary

Closes #3962

noControlCharactersInRegex

description

Prevents from having control characters and some escape sequences that match control characters in regular expressions.

Control characters are hidden special characters that are numbered from 0 to 31 in the ASCII system. They're not commonly used in JavaScript text. So, if you see them in a pattern (called a regular expression), it's probably a mistake.

The following elements of regular expression patterns are considered possible errors in typing and are therefore disallowed by this rule:

- Hexadecimal character escapes from \x00 to \x1F
- Unicode character escapes from \u0000 to \u001F
- Unicode code point escapes from \u{0} to \u{1F}
- Unescaped raw characters from U+0000 to U+001F

Control escapes such as \t and \n are allowed by this rule.

ESLint: https://eslint.org/docs/latest/rules/no-control-regex

Test Plan

cargo test -p rome_js_analyze -- no_control_characters_in_regex

netlify bot commented Jul 4, 2023 •

edited

Loading

✅ Deploy Preview for docs-rometools ready!

Built without sensitive environment variables

Name	Link
🔨 Latest commit	`047e901`
🔍 Latest deploy log	https://app.netlify.com/sites/docs-rometools/deploys/64b251ebd2d5780008f8ddda
😎 Deploy Preview	https://deploy-preview-4656--docs-rometools.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

github-actions bot added A-Linter A-Parser A-Project A-Diagnostic labels

unvalley marked this pull request as ready for review

July 11, 2023 17:20

ematipico reviewed

View reviewed changes

Contributor

ematipico left a comment •

edited

Loading

I have a suggestion. Feel free to take or reject my suggestion. And if you take it, feel free to apply the suggestion in the next PR. Doing it in the next PR would allow us to catch possible regressions :)

At the moment logic of the rule creates a string every time we see a control character. I believe this is not efficient because technically we already have that string inside the node.

My suggestion would be to store the range of where the control character is. With range I mean the index where the control character starts and the index where the control character ends. You can save the index as (usize, usize) or TextRange.

Doing so would allow us to do better things:

we don't allocate a new string for every control character

to retrieve the string, we just need do

 let control_character = original_string[start_index..end_index];

we can create a TextRange from those indexes and show the control character in the diagnostic

crates/rome_js_analyze/src/analyzers/nursery/no_control_characters_in_regex.rs Outdated Show resolved Hide resolved

crates/rome_js_analyze/src/analyzers/nursery/no_control_characters_in_regex.rs Outdated Show resolved Hide resolved

crates/rome_js_analyze/src/analyzers/nursery/no_control_characters_in_regex.rs

+              fn decode_unicode_escape_to_code_point(iter: &mut Peekable<Chars>) -> Option<(String, i64)> {
+                  let mut digits = String::new();
+                  for _ in 0..4 {

Contributor

ematipico Jul 12, 2023

Could you leave a comment to explain why 0..4? To me, as a first reader, don't make sense

crates/rome_js_analyze/src/analyzers/nursery/no_control_characters_in_regex.rs Outdated Show resolved Hide resolved

crates/rome_js_analyze/src/analyzers/nursery/no_control_characters_in_regex.rs Outdated

+                  decode: fn(&mut Peekable<Chars>) -> Option<(String, i64)>,
+              ) {
+                  if let Some((s, cp)) = decode(iter) {
+                      if (0..32).contains(&cp) {

Contributor

ematipico Jul 12, 2023

Comment to explain 0..32

crates/rome_js_analyze/src/analyzers/nursery/no_control_characters_in_regex.rs

Comment on lines +239 to +241

+                          markup! {
+                              "Unexpected control character(s) in regular expression: "<Emphasis>{state.join(", ")}</Emphasis>""
+                          },

Contributor

ematipico Jul 12, 2023

We should let the user know what they should do here. For example, I see the diagnostic for the first time, and I don't know how to solve the issue 😄

Contributor Author

unvalley commented Jul 13, 2023

@ematipico Thanks for the review! I addressed reviewed points.
After merging this PR, I'll create an issue for updating to use TextRange instead of creating new string.

unvalley added 2 commits

July 15, 2023 13:33


          feat(rome_js_analyze): noControlCharactersInRegex

2345a87

test: add test cases

feat: restore previous implementation

docs: lint rule details

chore: use js instead of cjs

chore: codegen

chore: fix lint rule category order

refactor: logics in rule

chore: codegen

Update crates/rome_js_analyze/src/analyzers/nursery/no_control_characters_in_regex.rs

Co-authored-by: Emanuele Stoppa <my.burning@gmail.com>

Update crates/rome_js_analyze/src/analyzers/nursery/no_control_characters_in_regex.rs

Co-authored-by: Emanuele Stoppa <my.burning@gmail.com>

Update crates/rome_js_analyze/src/analyzers/nursery/no_control_characters_in_regex.rs

Co-authored-by: Emanuele Stoppa <my.burning@gmail.com>

refactor: update comments and naming

feat: add note to the rule and update docs

refactor: fix remaining


          chore: codegen

5d10cab

ematipico approved these changes

View reviewed changes


          chore: documentation

047e901

unvalley merged commit 5510f5f into rome:main

unvalley deleted the no-control-characters-in-regex branch

July 15, 2023 09:18

unvalley mentioned this pull request

📎 noControlCharactesrInRegex should reuse control characters instead of generating string #4702

Open

unvalley mentioned this pull request

feat(linter): noMisleadingCharacterClass biomejs/biome#909

Merged

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

A-Diagnostic A-Linter A-Parser A-Project