Wrong regular expressions with regard to unicode codepoints #24
Comments
How are you providing the input? When I try the latest version, I see the correct result: https://runkit.com/embed/klngsd4jwj5m
const r1 = regexgen(["I \u2665 cake"]); // /I \u2665 cake/
console.log("I \u2665 cake".match(r1)); // ["I ♥ cake"]
As an aside, I don't think that opening with a passive-aggressive sentence on someone's spare-time project is the best way to get your open source issues looked at.
@gilmoreorless I forgot to mention that I was using the CLI, which produces the erroneous results above.
I'm sorry to disappoint you, but I did not have any aggressive feelings when I opened this issue. I just voiced an assumption based on the fact that a lot of other open issues have not been dealt with for a long time. That's all, no emotions involved.
Apologies for misreading your intent. The command line usage makes more sense for this issue.
I'd say the problem actually lies in the difference between strings in JavaScript and strings on the command line. Bash and most other shells apply different parsing rules to a string depending on the quoting mechanism used. Specifically, escape sequences such as \u2665 are only interpreted inside $'...' quoting; inside plain single or double quotes they are passed through as literal characters. This can be shown by telling node to print the arguments it receives:
$ node -e "console.log(process.argv)" "I \u2665 cake"
[ '/full/path/to/node',
'I \\u2665 cake' ]
$ node -e "console.log(process.argv)" 'I \u2665 cake'
[ '/full/path/to/node',
'I \\u2665 cake' ]
$ node -e "console.log(process.argv)" $"I \u2665 cake"
[ '/full/path/to/node',
'$I \\u2665 cake' ]
$ node -e "console.log(process.argv)" $'I \u2665 cake'
[ '/full/path/to/node',
'I ♥ cake' ]
Therefore the input string will have to be escaped in the same way for regexgen to receive it properly:
$ regexgen 'I \u2665 cake'
/I \\u2665 cake/
$ regexgen $'I \u2665 cake'
/I \u2665 cake/
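To make the difference concrete, here is a small JavaScript sketch of the two strings a program ends up with, as suggested by the argv output above. The variable names are illustrative only and do not come from regexgen.

```javascript
// What the program receives from '...' or "..." quoting: a literal backslash
// followed by the five characters "u2665".
const literal = "I \\u2665 cake";

// What it receives from $'...' quoting: the heart character itself.
const interpreted = "I \u2665 cake";

console.log(literal.length);          // 13
console.log(interpreted.length);      // 8
console.log(literal === interpreted); // false
```

So a tool that treats its arguments as plain text sees a backslash that has to be escaped in the generated regular expression, which would explain the /I \\u2665 cake/ output.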
Thanks for the explanation, @gilmoreorless. But this is not nice. The CLI should take care of handling the quoting and escaping rules of the different shells. Is this possible? If so, is there any chance of fixing this?
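A CLI cannot see which quoting style the shell used, but it could choose to decode escape sequences in the arguments it receives. The following is only a minimal sketch of that idea, not regexgen's actual behaviour: decodeUnicodeEscapes is a hypothetical helper, and the input string simulates the argument passed for regexgen 'I \u2665 cake'.

```javascript
// Hypothetical helper, not part of regexgen: turn literal \uXXXX and \u{X...}
// sequences in a command-line argument into the characters they name.
function decodeUnicodeEscapes(arg) {
  return arg.replace(
    /\\u\{([0-9a-fA-F]{1,6})\}|\\u([0-9a-fA-F]{4})/g,
    (match, braced, fourDigit) => {
      const codePoint = parseInt(braced || fourDigit, 16);
      // Leave out-of-range values untouched instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
  );
}

// Simulates the argument the shell passes for: regexgen 'I \u2665 cake'
const decoded = decodeUnicodeEscapes("I \\u2665 cake");
console.log(decoded); // I ♥ cake
```

The trade-off is that a user who really wants to match the literal text \u2665 could no longer express it, so decoding like this would probably need to be opt-in rather than the default.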
Hi, even though I doubt that you will provide bug fixes and updates after two years of inactivity, I want you to know about the following issues with escaped and unescaped unicode codepoints:
Is there any chance for you to fix these issues? Thanks in advance.