Unicode operator support missing from PureScript #3006
Comments
Hmmm, this will be difficult. I don't think that we will be able to match the behaviour of Don't get me wrong, it's easy to create a regex that matches /[:!#$%&*+./<=>?@\\^|\-~]|(?![\0-\x7F])[\p{gc=Math_Symbol}\p{gc=Currency_Symbol}\p{Modifier_Symbol}\p{Other_Symbol}]/u But we can't use this regex. We support browsers that do not support Unicode property escapes or the /[:!#$%&*+./<=>?@\\^|\-~\xa2-\xa6\xa8\xa9\xac\xae-\xb1\xb4\xb8\xd7\xf7\u02c2-\u02c5\u02d2-\u02df\u02e5-\u02eb\u02ed\u02ef-\u02ff\u0375\u0384\u0385\u03f6\u0482\u058d-\u058f\u0606-\u0608\u060b\u060e\u060f\u06de\u06e9\u06fd\u06fe\u07f6\u07fe\u07ff\u09f2\u09f3\u09fa\u09fb\u0af1\u0b70\u0bf3-\u0bfa\u0c7f\u0d4f\u0d79\u0e3f\u0f01-\u0f03\u0f13\u0f15-\u0f17\u0f1a-\u0f1f\u0f34\u0f36\u0f38\u0fbe-\u0fc5\u0fc7-\u0fcc\u0fce\u0fcf\u0fd5-\u0fd8\u109e\u109f\u1390-\u1399\u166d\u17db\u1940\u19de-\u19ff\u1b61-\u1b6a\u1b74-\u1b7c\u1fbd\u1fbf-\u1fc1\u1fcd-\u1fcf\u1fdd-\u1fdf\u1fed-\u1fef\u1ffd\u1ffe\u2044\u2052\u207a-\u207c\u208a-\u208c\u20a0-\u20bf\u2100\u2101\u2103-\u2106\u2108\u2109\u2114\u2116-\u2118\u211e-\u2123\u2125\u2127\u2129\u212e\u213a\u213b\u2140-\u2144\u214a-\u214d\u214f\u218a\u218b\u2190-\u2307\u230c-\u2328\u232b-\u2426\u2440-\u244a\u249c-\u24e9\u2500-\u2767\u2794-\u27c4\u27c7-\u27e5\u27f0-\u2982\u2999-\u29d7\u29dc-\u29fb\u29fe-\u2b73\u2b76-\u2b95\u2b97-\u2bff\u2ce5-\u2cea\u2e50\u2e51\u2e80-\u2e99\u2e9b-\u2ef3\u2f00-\u2fd5\u2ff0-\u2ffb\u3004\u3012\u3013\u3020\u3036\u3037\u303e\u303f\u309b\u309c\u3190\u3191\u3196-\u319f\u31c0-\u31e3\u3200-\u321e\u322a-\u3247\u3250\u3260-\u327f\u328a-\u32b0\u32c0-\u33ff\u4dc0-\u4dff\ua490-\ua4c6\ua700-\ua716\ua720\ua721\ua789\ua78a\ua828-\ua82b\ua836-\ua839\uaa77-\uaa79\uab5b\uab6a\uab6b\ufb29\ufbb2-\ufbc1\ufdfc\ufdfd\ufe62\ufe64-\ufe66\ufe69\uff04\uff0b\uff1c-\uff1e\uff3e\uff40\uff5c\uff5e\uffe0-\uffe6\uffe8-\uffee\ufffc\ufffd\u{10137}-\u{1013f}\u{10179}-\u{10189}\u{1018c}-\u{1018e}\u{10190}-\u{1019c}\u{101a0}\u{101d0}-\u{101fc}\u{10877}\u{10878}\u{10ac8}\u{1173f}\u{11fd5}-\u{1
1ff1}\u{16b3c}-\u{16b3f}\u{16b45}\u{1bc9c}\u{1d000}-\u{1d0f5}\u{1d100}-\u{1d126}\u{1d129}-\u{1d164}\u{1d16a}-\u{1d16c}\u{1d183}\u{1d184}\u{1d18c}-\u{1d1a9}\u{1d1ae}-\u{1d1e8}\u{1d200}-\u{1d241}\u{1d245}\u{1d300}-\u{1d356}\u{1d6c1}\u{1d6db}\u{1d6fb}\u{1d715}\u{1d735}\u{1d74f}\u{1d76f}\u{1d789}\u{1d7a9}\u{1d7c3}\u{1d800}-\u{1d9ff}\u{1da37}-\u{1da3a}\u{1da6d}-\u{1da74}\u{1da76}-\u{1da83}\u{1da85}\u{1da86}\u{1e14f}\u{1e2ff}\u{1ecac}\u{1ecb0}\u{1ed2e}\u{1eef0}\u{1eef1}\u{1f000}-\u{1f02b}\u{1f030}-\u{1f093}\u{1f0a0}-\u{1f0ae}\u{1f0b1}-\u{1f0bf}\u{1f0c1}-\u{1f0cf}\u{1f0d1}-\u{1f0f5}\u{1f10d}-\u{1f1ad}\u{1f1e6}-\u{1f202}\u{1f210}-\u{1f23b}\u{1f240}-\u{1f248}\u{1f250}\u{1f251}\u{1f260}-\u{1f265}\u{1f300}-\u{1f6d7}\u{1f6e0}-\u{1f6ec}\u{1f6f0}-\u{1f6fc}\u{1f700}-\u{1f773}\u{1f780}-\u{1f7d8}\u{1f7e0}-\u{1f7eb}\u{1f800}-\u{1f80b}\u{1f810}-\u{1f847}\u{1f850}-\u{1f859}\u{1f860}-\u{1f887}\u{1f890}-\u{1f8ad}\u{1f8b0}\u{1f8b1}\u{1f900}-\u{1f978}\u{1f97a}-\u{1f9cb}\u{1f9cd}-\u{1fa53}\u{1fa60}-\u{1fa6d}\u{1fa70}-\u{1fa74}\u{1fa78}-\u{1fa7a}\u{1fa80}-\u{1fa86}\u{1fa90}-\u{1faa8}\u{1fab0}-\u{1fab6}\u{1fac0}-\u{1fac2}\u{1fad0}-\u{1fad6}\u{1fb00}-\u{1fb92}\u{1fb94}-\u{1fbca}^]/iu And getting rid of the That's too long. Prism language definitions are supposed to be lightweight. Could we somehow limit the symbol used? Are there symbols that are commonly used?
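One possible middle ground between the two regexes above, sketched here under stated assumptions rather than as Prism's approach: feature-detect Unicode property escapes at load time and fall back to a short, hand-picked range list when they are unavailable. The names `supportsPropertyEscapes`, `buildSymbolCharPattern`, and `FALLBACK_RANGES` are hypothetical, and restricting the fallback to the Arrows and Mathematical Operators blocks is an illustrative guess at "commonly used" symbols, not a measured one.

```javascript
// ASCII operator characters, copied from the first character class quoted above.
const ASCII_SYMBOLS = String.raw`:!#$%&*+./<=>?@\\^|\-~`;

// ASSUMPTION: illustrative fallback only. Arrows (U+2190-U+21FF) and
// Mathematical Operators (U+2200-U+22FF); everything else is ignored.
const FALLBACK_RANGES = String.raw`\u2190-\u21ff\u2200-\u22ff`;

// Returns true when the engine accepts \p{...} together with the `u` flag.
function supportsPropertyEscapes() {
  try {
    new RegExp("\\p{S}", "u");
    return true;
  } catch (e) {
    return false;
  }
}

// Builds a pattern that matches a single operator character.
function buildSymbolCharPattern(usePropertyEscapes) {
  if (usePropertyEscapes) {
    // \p{S} is the union of the Math, Currency, Modifier and Other symbol categories.
    return new RegExp("[" + ASCII_SYMBOLS + "]|(?![\\0-\\x7F])\\p{S}", "u");
  }
  // No `u` flag needed: every fallback code point is in the Basic Multilingual Plane.
  return new RegExp("[" + ASCII_SYMBOLS + FALLBACK_RANGES + "]");
}

const full = buildSymbolCharPattern(supportsPropertyEscapes());
const limited = buildSymbolCharPattern(false);

for (const ch of ["∀", "∷", "→", "←", "⇒", "⇐", "≡", "€"]) {
  console.log(ch, "full:", full.test(ch), "limited:", limited.test(ch));
}
```

The operators named in this issue (∀ ∷ → ← ⇒ ⇐ ≡) all sit inside the two fallback blocks, while a currency symbol such as € does not. The trade-off is that highlighting would differ between browsers, which may or may not be acceptable.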
Could you please give an example of this problem?
Literally just quoting:
{
// ...
//
// Most of this is needed because of the meaning of a single '.'.
// If it stands alone freely, it is the function composition.
// It may also be a separator between a module name and an identifier => no
// operator. If it comes together with other special characters it is an
// operator too.
'operator': /\s\.\s|[-!#$%*+=?&@|~:<>^\\\/]*\.[-!#$%*+=?&@|~.:<>^\\\/]+|[-!#$%*+=?&@|~.:<>^\\\/]+\.[-!#$%*+=?&@|~:<>^\\\/]*|[-!#$%*+=?&@|~:<>^\\\/]+|`(?:[A-Z][\w']*\.)*[_a-z][\w']*`/,
//
// ...
}
I get what this means, but I can't immediately follow the RegExp itself.
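To make the quoted pattern easier to follow, here is a small sketch that rebuilds it from its five alternatives and reports which one fires for a given input. The helper `explain` and the labels are hypothetical names introduced for illustration; the pattern sources themselves are copied from the quote.

```javascript
// Character classes used by the quoted pattern.
const OPS = String.raw`[-!#$%*+=?&@|~:<>^\\\/]`;      // operator characters, no dot
const OPS_DOT = String.raw`[-!#$%*+=?&@|~.:<>^\\\/]`; // the same set plus the dot

// The five alternatives, in the order they appear in the quoted regex.
const alternatives = [
  ["standalone dot (composition)", String.raw`\s\.\s`],
  ["dot followed by operator characters", OPS + "*" + String.raw`\.` + OPS_DOT + "+"],
  ["operator characters followed by a dot", OPS_DOT + "+" + String.raw`\.` + OPS + "*"],
  ["operator characters without a dot", OPS + "+"],
  ["backtick infix function", "`(?:[A-Z][\\w']*\\.)*[_a-z][\\w']*`"],
];

// Wrap each alternative in a capture group so the matching one can be identified.
const labelled = new RegExp(alternatives.map(([, src]) => "(" + src + ")").join("|"));

// Returns the first operator match and the alternative that produced it, or null.
function explain(code) {
  const m = labelled.exec(code);
  if (!m) return null;
  const i = m.slice(1).findIndex((group) => group !== undefined);
  return { text: m[0], rule: alternatives[i][0] };
}

const samples = [
  "f . g",               // composition
  "xs <.> ys",           // operator containing a dot
  "a <> b",              // plain ASCII operator
  "x `Map.member` m",    // backtick infix
  "Data.Map.lookup k m", // qualified name: the dots are separators
  "a ≡ b",               // Unicode operator
];

for (const code of samples) {
  console.log(JSON.stringify(code), "=>", explain(code));
}
```

The last two samples return `null`: the qualified name because a dot between identifiers is deliberately not an operator, and `≡` because every character class here is ASCII-only, which is the gap this issue describes.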
I suppose the symbol expression could be limited to something everyone using the library could use? Perhaps there are some common Unicode ranges that could suffice. I don't know what a long-term solution would be, though, short of fully supporting all possibilities. I put some realistic examples from code I'm using into my shared example, and they definitely look unsupported. Looking for actionable solutions with limitations:
Personally, the Unicode support is one of the reasons that has kept me attracted to PureScript after many years.
lgtm 👍
Information
Description
While I did open a merge request for ∀ as a keyword, since it was the lowest-hanging fruit, there are still issues with PureScript and Unicode operators. ∷ → ← ⇒ ⇐ are natively supported as operators in the compiler, but users can also define their own operators (such as ≡ for ==) (see: Lexer.hs). Anything that the lexer considers a "symbol" is 100% valid PureScript. Because PureScript extends the Haskell syntax, and the Haskell definition's comment mentions . being an issue for function composition (with . also being used for records in PureScript), how best to modify the regex was not immediately apparent to me.
Code snippet
Test page
The code being highlighted incorrectly.