Compiler should error when encountering invalid regular expressions #3432
Comments
I had missed this by only searching for I'm working on a spec-based RegExp tokenizer written in JS. Hopefully you'll be able to make use of it.
Moved from #54744: Having previously worked on #51837, I found that TypeScript reports almost no syntax errors for regular expressions. I would like to file a PR about it. The things I would like to do are:
Am I doing too much or too little? I know doing too much may cause serious performance regressions (luckily, regular expression literals are not that common compared with string literals). Still, it should be better than doing nothing.
Imbalanced parens / brackets seem like where 99% of the value is. Duplicate flags seems like an error no one's ever made before; usually it's Erroring on escapes that are invalid regardless of flags seems like a fine compromise. IMO it's really actually fine if once a year your program unconditionally crashes on startup in the cases where you made an extremely rare mistake. The value is in flagging errors that are made every day.
Yup, duplicate flag checking is valueless 😅, but checking for unknown flags seems worth a bit (for example, if someone accidentally typed the Cyrillic у instead of the Latin y with the "Editor > Unicode Highlight: Ambiguous Characters" setting turned off, or with one of the "Editor > Unicode Highlight: Allowed Locales" permitting the character). (Plus, there are no reasons not to show errors if the flag part isn’t really flags (like I am still wondering if a full parse should be done. Doing that does affect performance, but it would be helpful to further TypeScript extensions like #41160. Of course, this requires sub-nodes to be added under
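For illustration, the flag checks discussed in the comments above (unknown flags such as a look-alike Cyrillic у, and duplicate flags) could be sketched roughly as follows. The function name `checkFlags` and the message wording are hypothetical, and the flag list is an assumption based on the flags standardized in ECMAScript at the time of writing; this is not the code from any TypeScript PR.

```typescript
// Assumption: the RegExp flags currently defined by ECMAScript.
const KNOWN_FLAGS = "dgimsuvy";

// Hypothetical helper: returns one message per problematic flag character.
function checkFlags(flags: string): string[] {
  const problems: string[] = [];
  let seen = ""; // flags encountered so far

  for (let i = 0; i < flags.length; i++) {
    const flag = flags.charAt(i);
    if (KNOWN_FLAGS.indexOf(flag) === -1) {
      // Catches look-alikes such as Cyrillic "у" typed instead of Latin "y".
      problems.push("Unknown flag '" + flag + "'");
    } else if (seen.indexOf(flag) !== -1) {
      problems.push("Duplicate flag '" + flag + "'");
    }
    seen += flag;
  }
  return problems;
}

console.log(checkFlags("gi")); // no problems reported
console.log(checkFlags("g\u0443")); // reports the Cyrillic "у" as unknown
console.log(checkFlags("gg")); // reports the second "g" as a duplicate
```

Both checks are a single pass over a string that is usually one to three characters long, so the cost should be negligible next to scanning the pattern body.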
Pinging @RyanCavanaugh and @DanielRosenwasser for opinions.
I feel like we should be able to do a parse-only pass (i.e. just scan and descend in order to validate) of the regex without creating nodes as you go, since there's no consumer of that output, just the production of errors as a side effect.
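A minimal sketch of what such a validate-only pass might look like, limited to the imbalanced parens / brackets case mentioned earlier in the thread. The names `RegexIssue` and `scanRegexBody` are hypothetical and are not part of the TypeScript compiler; a real implementation would follow the full ECMAScript pattern grammar.

```typescript
// Hypothetical diagnostic shape; not TypeScript's real Diagnostic type.
interface RegexIssue {
  pos: number;
  message: string;
}

// Walk the pattern body once and report problems as a side product.
// No syntax nodes are built; only positions of open groups are tracked.
function scanRegexBody(body: string): RegexIssue[] {
  const issues: RegexIssue[] = [];
  const openParens: number[] = []; // positions of "(" not yet closed
  let inClass = false; // true while inside [...]
  let classStart = -1;

  for (let i = 0; i < body.length; i++) {
    const ch = body[i];
    if (ch === "\\") {
      if (i + 1 >= body.length) {
        issues.push({ pos: i, message: "Pattern ends with a lone backslash" });
      }
      i++; // skip the escaped character
      continue;
    }
    if (inClass) {
      if (ch === "]") inClass = false; // other characters are literal here
      continue;
    }
    if (ch === "[") {
      inClass = true;
      classStart = i;
    } else if (ch === "(") {
      openParens.push(i);
    } else if (ch === ")") {
      if (openParens.pop() === undefined) {
        issues.push({ pos: i, message: "Unmatched ')'" });
      }
    }
  }

  if (inClass) {
    issues.push({ pos: classStart, message: "Unterminated character class" });
  }
  for (const pos of openParens) {
    issues.push({ pos, message: "Unterminated group" });
  }
  return issues;
}

console.log(scanRegexBody("(a[)]b)")); // balanced: the ")" inside [...] is literal
console.log(scanRegexBody("(ab")); // one issue at position 0
```

The only allocations are the issue list and a small stack of positions, which is in the spirit of producing errors without creating nodes.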
The output is useful to TypeScript API users for creating RegExp-related type utilities. We could make use of the parsed result to make methods like
The downstream tools can re-parse if they really need that data. We're very sensitive to perf papercuts and not likely to accept the feature if the perf cost is nontrivial, and allocating more objects is something that is likely to incur broad perf hits due to slowing down GC, etc.
Surely this is a case of serialising the raw regexp to its string representation and then passing it to the RegExp constructor? If it blows up, the regex is invalid! That should be fast too.
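The idea in this comment can be sketched as below. `tryCompileRegex` is a hypothetical name; the only API assumed is the standard `RegExp` constructor, which throws a `SyntaxError` for an invalid pattern or invalid flags. Note that the verdict reflects whatever the JavaScript engine running the check supports, and the message carries no position information.

```typescript
// Hypothetical helper: returns the engine's error message when the
// pattern or flags are rejected, or undefined when they are accepted.
function tryCompileRegex(body: string, flags: string): string | undefined {
  try {
    new RegExp(body, flags); // the host engine does the validation
    return undefined;
  } catch (e) {
    return e instanceof SyntaxError ? e.message : String(e);
  }
}

console.log(tryCompileRegex("a+", "g")); // undefined: accepted
console.log(tryCompileRegex("(", "")); // an engine-specific error message
console.log(tryCompileRegex("a", "gg")); // rejected: duplicate flag
```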
I see Ryan has suggested the same thing but in CS speak 😛
I actually hadn't thought of it in that much simpler way!
@zm-cttae @RyanCavanaugh There have already been attempts like #4387 and #35957 using this approach, but they were closed. And this is definitely not a good solution, because:
Luckily, a simple parser without node generation has little effect on performance, and I plan to make my PR available at the end of this month. Besides, I have a follow-up proposal to further enforce type safety of RegExp-related methods and enhance UX (providing auto-completion) that makes use of the implementation, but that would be a separate issue and PR afterwards.
To be fair, there are only two types of error that can be assessed independently of each other: bad escapes and invalid grouping.
I agree, however, that the error isn't too useful for assessing what broke the regex in the first place.
I didn’t manage to finish the implementation within this month due to my work, but it’s coming along nicely. I’ll be back on it shortly and make the PR available ASAP.
In tests/cases/conformance/parser/ecmascript5/RegressionTests/parser579071.ts we have an invalid regex: This is not valid JavaScript, but we don't give any errors. It would be helpful to let users know when their regular expressions are invalid.