-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Elm (elm-lang.org) support #1174
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! Thanks for this great PR! Please take a look at my review.
You might also want to add an example file: see https://github.com/PrismJS/prism/tree/gh-pages/examples
components/prism-elm.js
Outdated
// The imported or hidden names are not included in this import | ||
// statement. This is because we want to highlight those exactly like | ||
// we do for the names in the program. | ||
pattern: /(\r?\n|\r|^)\s*import\s+([A-Z][_a-zA-Z0-9]*)(\.[A-Z][_a-zA-Z0-9]*)*(\s+as\s+([A-Z][_a-zA-Z0-9]*)(\.[A-Z][_a-zA-Z0-9]*)*)?(\s+exposing\s+)?/m, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Do you really need to capture the line-feed at the beginning? Isn't the
^
anchor enough with them
flag? - Occurrences of
[_a-zA-Z0-9]
should be simplified as\w
. - There is no need for the parentheses around
([A-Z][_a-zA-Z0-9]*)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌
components/prism-elm.js
Outdated
pattern: /(^|[^-!#$%*+=?&@|~.:<>^\\\/])(--[^-!#$%*+=?&@|~.:<>^\\\/].*|{-[\s\S]*?-})/m, | ||
lookbehind: true | ||
}, | ||
char: /'([^\\']|\\([abfnrtv\\"'&]|\^[A-Z@[\]\^_]|NUL|SOH|STX|ETX|EOT|ENQ|ACK|BEL|BS|HT|LF|VT|FF|CR|SO|SI|DLE|DC1|DC2|DC3|DC4|NAK|SYN|ETB|CAN|EM|SUB|ESC|FS|GS|RS|US|SP|DEL|\d+|o[0-7]+|x[0-9a-fA-F]+))'/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks quite overkill to me. Remember that Prism is not a linter.
Is there any case where a single-quoted string of characters should not be highlighted as char
? If there isn't, you should be able to drastically simplify this regexp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, you might want to add the greedy
flag, here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, upon closer inspection, I think /'([^\\']|\\([abfnrtv']|\d+|x[0-9a-fA-F]+))'/
will do - any single character (except '
itself) or an escape sequence which comes in a few forms. Thanks!
components/prism-elm.js
Outdated
}, | ||
char: /'([^\\']|\\([abfnrtv\\"'&]|\^[A-Z@[\]\^_]|NUL|SOH|STX|ETX|EOT|ENQ|ACK|BEL|BS|HT|LF|VT|FF|CR|SO|SI|DLE|DC1|DC2|DC3|DC4|NAK|SYN|ETB|CAN|EM|SUB|ESC|FS|GS|RS|US|SP|DEL|\d+|o[0-7]+|x[0-9a-fA-F]+))'/, | ||
string: { | ||
pattern: /"([^\\"]|\\([abfnrtv\\"'&]|\^[A-Z@[\]\^_]|NUL|SOH|STX|ETX|EOT|ENQ|ACK|BEL|BS|HT|LF|VT|FF|CR|SO|SI|DLE|DC1|DC2|DC3|DC4|NAK|SYN|ETB|CAN|EM|SUB|ESC|FS|GS|RS|US|SP|DEL|\d+|o[0-7]+|x[0-9a-fA-F]+)|\\\s+\\)*"/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here: this looks overcomplicated. Are there string-like patterns in Elm that are not to be highlighted as string
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this is way overcomplicated. I adapted it from the haskell one; and I'm now realizing that - other than being overcomplicated - it's also wrong. Elm doesn't do multiline strings that way, but rather with """
. Trying to figure out a good way to match either simple strings or multiline strings, which is slightly complicated by the latter being able to include single "
.
components/prism-elm.js
Outdated
// operator too. | ||
operator: /\s\.\s|[-!#$%*+=?&@|~.:<>^\\\/]*\.[-!#$%*+=?&@|~.:<>^\\\/]+|[-!#$%*+=?&@|~.:<>^\\\/]+\.[-!#$%*+=?&@|~.:<>^\\\/]*|[-!#$%*+=?&@|~:<>^\\\/]+|`([A-Z][_a-zA-Z0-9']*\.)*[_a-z][_a-zA-Z0-9']*`/, | ||
// In Elm, nearly everything is a variable, do not highlight these. | ||
hvariable: /\b([A-Z][_a-zA-Z0-9]*\.)*[_a-z][_a-zA-Z0-9]*\b/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Occurrences of [_a-zA-Z0-9]
should be replaced with \w
.
components/prism-elm.js
Outdated
operator: /\s\.\s|[-!#$%*+=?&@|~.:<>^\\\/]*\.[-!#$%*+=?&@|~.:<>^\\\/]+|[-!#$%*+=?&@|~.:<>^\\\/]+\.[-!#$%*+=?&@|~.:<>^\\\/]*|[-!#$%*+=?&@|~:<>^\\\/]+|`([A-Z][_a-zA-Z0-9']*\.)*[_a-z][_a-zA-Z0-9']*`/, | ||
// In Elm, nearly everything is a variable, do not highlight these. | ||
hvariable: /\b([A-Z][_a-zA-Z0-9]*\.)*[_a-z][_a-zA-Z0-9]*\b/, | ||
constant: /\b([A-Z][_a-zA-Z0-9]*\.)*[A-Z][_a-zA-Z0-9]*\b/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Occurrences of [_a-zA-Z0-9]
should be replaced with \w
.
components/prism-elm.js
Outdated
// It may also be a separator between a module name and an identifier => no | ||
// operator. If it comes together with other special characters it is an | ||
// operator too. | ||
operator: /\s\.\s|[-!#$%*+=?&@|~.:<>^\\\/]*\.[-!#$%*+=?&@|~.:<>^\\\/]+|[-!#$%*+=?&@|~.:<>^\\\/]+\.[-!#$%*+=?&@|~.:<>^\\\/]*|[-!#$%*+=?&@|~:<>^\\\/]+|`([A-Z][_a-zA-Z0-9']*\.)*[_a-z][_a-zA-Z0-9']*`/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Occurrences of
[_a-zA-Z0-9']
should be replaced with[\w']
. - I might be wrong, but couldn't the whole regexp be rewritten as this one?
(single dot | special char (including dot) appearing two or more times | special char alone (not including dot) | backtick stuff)
/\s\.\s|[-!#$%*+=?&@|~:<>^\\\/.]{2,}|[-!#$%*+=?&@|~:<>^\\\/]|`([A-Z][\w']*\.)*[_a-z][\w']*`/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I'm gonna have a look at the characters the actual Elm parser allows, which should allow me to scrap a few. I included the backtick stuff for compatibility with Elm 0.17 but I think there's no real point to doing that, especially with 0.19 coming close to release.
I'll simplify, thanks :)
Thanks a lot for this PR @zwilias! Just pinging here as it'd be great if you could respond to @Golmote's review. Looking forward to see elm highlighting in markdown-preview-enhanced atom & vscode plugins, which use PrismJS under the hood! Perhaps, someone else from the elm community could help here? I'm very new to this language :–) |
@zwilias can I help with anything? |
Finally got around to cleaning up the patterns and fixing a few more bugs 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates! I've added a few more comments.
Also, I'd prefer you only use capturing groups when they are needed ; that is, for the lookbehind
feature and back references. All other groups should be made non-capturing using ?:
(e.g. (?:foo)
instead of (foo)
).
components/prism-elm.js
Outdated
string: [ | ||
{ | ||
// Multiline strings are wrapped in triple ". Quotes may appear unescaped. | ||
pattern: /"""((.|\s)*?(?="""))"""/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an actual need for the positive look-ahead here?
Also, can you rewrite (.|\s)
as [\s\S]
? It's what we use everywhere else as a match-everything token.
@@ -7,6 +8,7 @@ | |||
|
|||
[ | |||
["char", "'a'"], | |||
["char", "'\\''"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use a tab to fix the indentation of this line please?
|
||
---------------------------------------------------- | ||
|
||
[ | ||
["string", "\"\""], | ||
["string", "\"regular string\""], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, please use a tab.
ping @zwilias 🙌 |
Nice! Thank you both for sticking around. Expect a new version release (1.10) during the week. |
* gh-pages: (326 commits) Add C++ platform-independent types (PrismJS#1271) Release 1.10.0 Unescaped markup plugin: Make it work with any language (PrismJS#1265) Previewers: New plugin combining previous plugins Previewer: Base, Previewer: Angle, Previewer: Color, Previewer: Easing, Previewer: Gradient and Previewer: Time. Fix PrismJS#913 (PrismJS#1244) Add Elm (elm-lang.org) support (PrismJS#1174) IchigoJam: Remove unneeded escape Run gulp add Io syntax (PrismJS#1251) package.json: add attribute `style` (PrismJS#1256) Add the C++11 raw string feature to the cpp language IchigoJam: Make strings greedy BASIC: Make strings greedy Run gulp Add support for IchigoJam BASIC (PrismJS#1246) Add support for 6502 assembly (PrismJS#1245) fix for autoloader plugin Run gulp and reorder components alphabetically Xeora Language implementation (PrismJS#1239) upgrade autoloader Release 1.9.0 ...
No description provided.