Wrong regular expressions with regard to unicode codepoints #24

Open
pemistahl opened this issue Sep 22, 2019 · 4 comments

Comments

@pemistahl

Hi, even though I doubt that you will provide bug fixes and updates after two years of inactivity, I want you to know about the following issues with escaped and unescaped unicode codepoints:

// 1. correct
Input : "I ♥ cake"
Output: /I \u2665 cake/
Proof : "I ♥ cake".match(/I \u2665 cake/)
Result: Array [ "I ♥ cake" ] // OK

// 2. correct
Input : "I \\u2665 cake"
Output: /I \\u2665 cake/
Proof : "I \\u2665 cake".match(/I \\u2665 cake/)
Result: Array [ "I \\u2665 cake" ] // OK

// 3. failure
Input : "I \u2665 cake"
Output: /I \\u2665 cake/
Proof : "I \u2665 cake".match(/I \\u2665 cake/)
Result: null // OOPS! 
Expected Output: /I \u2665 cake/

// 4. failure
Input : "I \u{2665} cake"
Output: /I \\u{2665} cake/
Proof : "I \u{2665} cake".match(/I \\u{2665} cake/)
Result: null // OOPS! 
Expected Output: /I \u2665 cake/

// 5. failure
Input : "I \\u{2665} cake"
Output: /I \\u{2665} cake/
Proof : "I \\u{2665} cake".match(/I \\u{2665} cake/)
Result: null // OOPS! 
Expected Output: /I \\u\{2665\} cake/

Is there any chance for you to fix these issues? Thanks in advance.

@gilmoreorless
Contributor

How are you providing the input? When I try the latest version, I see the correct result: https://runkit.com/embed/klngsd4jwj5m

const r1 = regexgen(["I \u2665 cake"]); // /I \u2665 cake/
console.log("I \u2665 cake".match(r1)); // ["I ♥ cake"]

As an aside, I don't think that opening with a passive-aggressive sentence on someone's spare-time project is the best way to get your open source issues looked at.

@pemistahl
Author

@gilmoreorless I forgot to mention that I was using the CLI, which produces the erroneous results above.

$ regexgen "I \u{2665} cake"
/I \\u{2665} cake/

I'm sorry to disappoint you but I did not have any aggressive feelings when I opened this issue. I just uttered an assumption based on the fact that a lot of other open issues have not been dealt with for a long time. That's all, no emotions involved.

@gilmoreorless
Contributor

Apologies for misreading your intent.

The command line usage makes more sense for this issue. I'd say the problem actually lies in the difference between strings in JavaScript and the command line. When you run regexgen "blah" in the CLI, the "blah" string is first being interpreted according to the rules of the CLI, then passed to the Node process.

Bash and most other shells follow C-style quoting rules, which parse strings differently depending on the quoting mechanism. Specifically, for escape sequences such as \u to be interpreted, they must be within single quotes preceded by a $ character, known as ANSI-C quoting (reference).

This can be shown by telling node to log out the arguments it receives:

$ node -e "console.log(process.argv)" "I \u2665 cake"
[ '/full/path/to/node',
  'I \\u2665 cake' ]

$ node -e "console.log(process.argv)" 'I \u2665 cake'
[ '/full/path/to/node',
  'I \\u2665 cake' ]

$ node -e "console.log(process.argv)" $"I \u2665 cake"
[ '/full/path/to/node',
  '$I \\u2665 cake' ]

$ node -e "console.log(process.argv)" $'I \u2665 cake'
[ '/full/path/to/node',
  'I ♥ cake' ]
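
The same difference can be reproduced inside JavaScript itself. A minimal sketch (the variable names are only for illustration): the first string is what a JS string literal produces, the second holds the characters Node receives from the shell when the argument is in plain single or double quotes.

```javascript
// What a JS string literal contains vs. what the shell hands to Node.
const fromLiteral = "I \u2665 cake"; // the JS parser turns \u2665 into ♥
const fromShell = "I \\u2665 cake";  // backslash, "u", "2665" kept as literal text

console.log(fromLiteral.length);        // 8
console.log(fromShell.length);          // 13
console.log(fromLiteral === fromShell); // false

// Each string only matches the regex built from its own characters.
console.log(/I \u2665 cake/.test(fromLiteral));  // true
console.log(/I \\u2665 cake/.test(fromShell));   // true
console.log(/I \\u2665 cake/.test(fromLiteral)); // false
```

So /I \\u2665 cake/ is the right regex for the characters regexgen was actually given in that case.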

Therefore the input string will have to be escaped in the same way for regexgen to receive it properly:

$ regexgen 'I \u2665 cake'
/I \\u2665 cake/

$ regexgen $'I \u2665 cake'
/I \u2665 cake/

@pemistahl
Author

Thanks for the explanation, @gilmoreorless. But this is not nice. The CLI should take care of handling the quoting and escaping rules in the different shells. Is this possible? If so, any chance to fix this?
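
As a rough idea of what I mean, here is a minimal sketch of decoding the escapes on the Node side before the strings are processed. The helper name decodeUnicodeEscapes is hypothetical and not part of regexgen, and the sketch assumes that every literal \uXXXX or \u{...} in an argument is meant as an escape, which may not be what every user wants.

```javascript
// Hypothetical helper (not part of regexgen): decode \uXXXX and \u{X...}
// sequences that arrived as literal text in a command-line argument.
function decodeUnicodeEscapes(arg) {
  return arg.replace(
    /\\u\{([0-9a-fA-F]{1,6})\}|\\u([0-9a-fA-F]{4})/g,
    (match, braced, plain) => {
      const codePoint = parseInt(braced || plain, 16);
      // Leave out-of-range values untouched instead of throwing a RangeError.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
  );
}

// Simulated raw arguments, as Node would receive them from plain quotes.
const rawArgs = ["I \\u2665 cake", "I \\u{2665} cake", "no escapes"];
const decoded = rawArgs.map(decodeUnicodeEscapes);
console.log(decoded); // [ 'I ♥ cake', 'I ♥ cake', 'no escapes' ]
```

The trade-off is that a user who really wants to match the literal text \u2665 (case 2 above) could no longer pass it directly, so this would probably have to be opt-in.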
