Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Elm (elm-lang.org) support #1174

Merged
merged 5 commits into from
Jan 16, 2018
Merged

Add Elm (elm-lang.org) support #1174

merged 5 commits into from
Jan 16, 2018

Conversation

zwilias
Copy link
Contributor

@zwilias zwilias commented Aug 17, 2017

No description provided.

Copy link
Contributor

@Golmote Golmote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Thanks for this great PR! Please take a look at my review.

You might also want to add an example file: see https://github.com/PrismJS/prism/tree/gh-pages/examples

// The imported or hidden names are not included in this import
// statement. This is because we want to highlight those exactly like
// we do for the names in the program.
pattern: /(\r?\n|\r|^)\s*import\s+([A-Z][_a-zA-Z0-9]*)(\.[A-Z][_a-zA-Z0-9]*)*(\s+as\s+([A-Z][_a-zA-Z0-9]*)(\.[A-Z][_a-zA-Z0-9]*)*)?(\s+exposing\s+)?/m,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Do you really need to capture the line-feed at the beginning? Isn't the ^ anchor enough with the m flag?
  • Occurrences of [_a-zA-Z0-9] should be simplified as \w.
  • There is no need for the parentheses around ([A-Z][_a-zA-Z0-9]*).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

pattern: /(^|[^-!#$%*+=?&@|~.:<>^\\\/])(--[^-!#$%*+=?&@|~.:<>^\\\/].*|{-[\s\S]*?-})/m,
lookbehind: true
},
char: /'([^\\']|\\([abfnrtv\\"'&]|\^[A-Z@[\]\^_]|NUL|SOH|STX|ETX|EOT|ENQ|ACK|BEL|BS|HT|LF|VT|FF|CR|SO|SI|DLE|DC1|DC2|DC3|DC4|NAK|SYN|ETB|CAN|EM|SUB|ESC|FS|GS|RS|US|SP|DEL|\d+|o[0-7]+|x[0-9a-fA-F]+))'/,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite overkill to me. Remember that Prism is not a linter.
Is there any case where a single-quoted string of characters should not be highlighted as char? If there isn't, you should be able to drastically simplify this regexp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you might want to add the greedy flag, here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, upon closer inspection, I think /'([^\\']|\\([abfnrtv']|\d+|x[0-9a-fA-F]+))'/ will do - any single character (except ' itself) or an escape sequence which comes in a few forms. Thanks!

},
char: /'([^\\']|\\([abfnrtv\\"'&]|\^[A-Z@[\]\^_]|NUL|SOH|STX|ETX|EOT|ENQ|ACK|BEL|BS|HT|LF|VT|FF|CR|SO|SI|DLE|DC1|DC2|DC3|DC4|NAK|SYN|ETB|CAN|EM|SUB|ESC|FS|GS|RS|US|SP|DEL|\d+|o[0-7]+|x[0-9a-fA-F]+))'/,
string: {
pattern: /"([^\\"]|\\([abfnrtv\\"'&]|\^[A-Z@[\]\^_]|NUL|SOH|STX|ETX|EOT|ENQ|ACK|BEL|BS|HT|LF|VT|FF|CR|SO|SI|DLE|DC1|DC2|DC3|DC4|NAK|SYN|ETB|CAN|EM|SUB|ESC|FS|GS|RS|US|SP|DEL|\d+|o[0-7]+|x[0-9a-fA-F]+)|\\\s+\\)*"/,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here: this looks overcomplicated. Are there string-like patterns in Elm that are not to be highlighted as string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this is way overcomplicated. I adapted it from the haskell one; and I'm now realizing that - other than being overcomplicated - it's also wrong. Elm doesn't do multiline strings that way, but rather with """. Trying to figure out a good way to match either simple strings or multiline strings, which is slightly complicated by the latter being able to include single ".

// operator too.
operator: /\s\.\s|[-!#$%*+=?&@|~.:<>^\\\/]*\.[-!#$%*+=?&@|~.:<>^\\\/]+|[-!#$%*+=?&@|~.:<>^\\\/]+\.[-!#$%*+=?&@|~.:<>^\\\/]*|[-!#$%*+=?&@|~:<>^\\\/]+|`([A-Z][_a-zA-Z0-9']*\.)*[_a-z][_a-zA-Z0-9']*`/,
// In Elm, nearly everything is a variable, do not highlight these.
hvariable: /\b([A-Z][_a-zA-Z0-9]*\.)*[_a-z][_a-zA-Z0-9]*\b/,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Occurrences of [_a-zA-Z0-9] should be replaced with \w.

operator: /\s\.\s|[-!#$%*+=?&@|~.:<>^\\\/]*\.[-!#$%*+=?&@|~.:<>^\\\/]+|[-!#$%*+=?&@|~.:<>^\\\/]+\.[-!#$%*+=?&@|~.:<>^\\\/]*|[-!#$%*+=?&@|~:<>^\\\/]+|`([A-Z][_a-zA-Z0-9']*\.)*[_a-z][_a-zA-Z0-9']*`/,
// In Elm, nearly everything is a variable, do not highlight these.
hvariable: /\b([A-Z][_a-zA-Z0-9]*\.)*[_a-z][_a-zA-Z0-9]*\b/,
constant: /\b([A-Z][_a-zA-Z0-9]*\.)*[A-Z][_a-zA-Z0-9]*\b/,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Occurrences of [_a-zA-Z0-9] should be replaced with \w.

// It may also be a separator between a module name and an identifier => no
// operator. If it comes together with other special characters it is an
// operator too.
operator: /\s\.\s|[-!#$%*+=?&@|~.:<>^\\\/]*\.[-!#$%*+=?&@|~.:<>^\\\/]+|[-!#$%*+=?&@|~.:<>^\\\/]+\.[-!#$%*+=?&@|~.:<>^\\\/]*|[-!#$%*+=?&@|~:<>^\\\/]+|`([A-Z][_a-zA-Z0-9']*\.)*[_a-z][_a-zA-Z0-9']*`/,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Occurrences of [_a-zA-Z0-9'] should be replaced with [\w'].
  • I might be wrong, but couldn't the whole regexp be rewritten as this one?
    (single dot | special char (including dot) appearing two or more times | special char alone (not including dot) | backtick stuff)
    /\s\.\s|[-!#$%*+=?&@|~:<>^\\\/.]{2,}|[-!#$%*+=?&@|~:<>^\\\/]|`([A-Z][\w']*\.)*[_a-z][\w']*`/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I'm gonna have a look at the characters the actual Elm parser allows, which should allow me to scrap a few. I included the backtick stuff for compatibility with Elm 0.17 but I think there's no real point to doing that, especially with 0.19 coming close to release.

I'll simplify, thanks :)

@kachkaev
Copy link

kachkaev commented Dec 14, 2017

Thanks a lot for this PR @zwilias! Just pinging here as it'd be great if you could respond to @Golmote's review. Looking forward to see elm highlighting in markdown-preview-enhanced atom & vscode plugins, which use PrismJS under the hood!

Perhaps, someone else from the elm community could help here? I'm very new to this language :–)

@zwilias
Copy link
Contributor Author

zwilias commented Dec 14, 2017

Put it on my todo list; I'll try to get back to this asap. Thanks for the reminder, @kachkaev, and so sorry for dropping the ball here, @Golmote. <3

@kachkaev
Copy link

kachkaev commented Dec 24, 2017

@zwilias can I help with anything?

@zwilias
Copy link
Contributor Author

zwilias commented Dec 24, 2017

Finally got around to cleaning up the patterns and fixing a few more bugs 😅

Copy link
Contributor

@Golmote Golmote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! I've added a few more comments.
Also, I'd prefer you only use capturing groups when they are needed ; that is, for the lookbehind feature and back references. All other groups should be made non-capturing using ?: (e.g. (?:foo) instead of (foo)).

string: [
{
// Multiline strings are wrapped in triple ". Quotes may appear unescaped.
pattern: /"""((.|\s)*?(?="""))"""/,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an actual need for the positive look-ahead here?
Also, can you rewrite (.|\s) as [\s\S]? It's what we use everywhere else as a match-everything token.

@@ -7,6 +8,7 @@

[
["char", "'a'"],
["char", "'\\''"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use a tab to fix the indentation of this line please?


----------------------------------------------------

[
["string", "\"\""],
["string", "\"regular string\""],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, please use a tab.

@kachkaev
Copy link

kachkaev commented Jan 8, 2018

@zwilias would you be happy to address @Golmote's comments? So few bits are left to amend! 🚀

@kachkaev
Copy link

ping @zwilias 🙌

@kachkaev
Copy link

Many thanks @zwilias! 👏 @Golmote do you think the PR can be now merged and produce a new PrismJS version?

@Golmote Golmote merged commit d6da70e into PrismJS:gh-pages Jan 16, 2018
@Golmote
Copy link
Contributor

Golmote commented Jan 16, 2018

Nice! Thank you both for sticking around. Expect a new version release (1.10) during the week.

kachkaev added a commit to kachkaev/mume that referenced this pull request Jan 17, 2018
@Golmote
Copy link
Contributor

Golmote commented Jan 17, 2018

@zwilias @kachkaev And 1.10.0 is out and published on NPM! 🎉

papandreou added a commit to papandreou/prism that referenced this pull request Jan 29, 2018
* gh-pages: (326 commits)
  Add C++ platform-independent types (PrismJS#1271)
  Release 1.10.0
  Unescaped markup plugin: Make it work with any language (PrismJS#1265)
  Previewers: New plugin combining previous plugins Previewer: Base, Previewer: Angle, Previewer: Color, Previewer: Easing, Previewer: Gradient and Previewer: Time. Fix PrismJS#913 (PrismJS#1244)
  Add Elm (elm-lang.org) support (PrismJS#1174)
  IchigoJam: Remove unneeded escape
  Run gulp
  add Io syntax (PrismJS#1251)
  package.json: add attribute `style` (PrismJS#1256)
  Add the C++11 raw string feature to the cpp language
  IchigoJam: Make strings greedy
  BASIC: Make strings greedy
  Run gulp
  Add support for IchigoJam BASIC (PrismJS#1246)
  Add support for 6502 assembly (PrismJS#1245)
  fix for autoloader plugin
  Run gulp and reorder components alphabetically
  Xeora Language implementation (PrismJS#1239)
  upgrade autoloader
  Release 1.9.0
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants