-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(swift) Revamped keywords and added support for operators #2908
Conversation
@joshgoebel I also have a new grammar for attributes ready (so we can get rid of the hardcoded list), but that depends on this PR. |
Hopefully, this will also allow me to fix #2895 eventually. |
I did detect an issue here: some keywords, such as The text So I have more work to do here. @joshgoebel How do you recommend I handle keywords with parentheses? The old grammar didn't match them and just happened to highlight keywords such as |
It would only do that if () are part of You probably do NOT want |
src/languages/swift.js
Outdated
const operatorHead = either( | ||
/[/=\-+!*%<>&|^~?]/, | ||
/[\u00A1–\u00A7]/, | ||
/[\u00A9\u00AB]/, | ||
/[\u00AC\u00AE]/, | ||
/[\u00B0\u00B1]/, | ||
/[\u00B6\u00BB\u00BF\u00D7\u00F7]/, | ||
/[\u2016–\u2017]/, | ||
/[\u2020–\u2027]/, | ||
/[\u2030–\u203E]/, | ||
/[\u2041–\u2053]/, | ||
/[\u2055–\u205E]/, | ||
/[\u2190–\u23FF]/, | ||
/[\u2500–\u2775]/, | ||
/[\u2794–\u2BFF]/, | ||
/[\u2E00–\u2E7F]/, | ||
/[\u3001–\u3003]/, | ||
/[\u3008–\u3020]/, | ||
/[\u3030]/ | ||
); | ||
const operatorCharacter = either( | ||
operatorHead, | ||
/[\u0300–\u036F]/, | ||
/[\u1DC0–\u1DFF]/, | ||
/[\u20D0–\u20FF]/, | ||
/[\uFE00–\uFE0F]/, | ||
/[\uFE20–\uFE2F]/, | ||
// TODO: The following characters are also allowed, but the regex doesn't work as intended. | ||
// For example, it also matches -10 as an operator. | ||
// /[\u{E0100}–\u{E01EF}]/u |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right. From what I'm reading this is the full range of what MIGHT be used for defining a custom operator... just because something could potentially be an operator doesn't mean it most assuredly IS an operator and should be highlighted everywhere as an operator... correct?
This "wideness" might make sense in the context of something like func [x] (a,b)
where [x] is an operator being defined (and we know that by context), but I don't think it makes sense in general. Pretty sure in the standard case we should only highlight well-known built-in operators that Swift supports, like >
, <
, etc...
Or do those UTF-8 ranges have NO other uses in Swift at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these characters are used elsewhere in the grammar, such as <GenericParam>
, or OptionalType?
, but other than that, they're always operators.
I assume the set of "head" character for operators and other identifiers is disjunct. For example, if you look at the range A1-FF, the ones that are missing from the list here are listed explicitly as valid head characters for regular (non-operator) identifiers:
\u00A8\u00AA\u00AD\u00AF\u00B2–\u00B5\u00B7–\u00BA
\u00BC–\u00BE\u00C0–\u00D6\u00D8–\u00F6\u00F8–\u00FF
So I think this should work just fine, as long as the cases where operators are used as punctuation are handled by specific modes. In the case of generics and optional types, that's already the case.
// /[\u{E0100}–\u{E01EF}]/u | ||
); | ||
const OPERATOR = { | ||
className: 'operator', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these operators or keywords?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this question about as? as! try? try!
?
If so, the grammar specifies them as operators and keywords at the same time, so I'm highlighting them as keywords, which is what other grammars do too, although most don't bother highlighting the question or exlamation mark. I preferred to include that in the match, otherwise it would be matched as a postfix operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they are keywords then you should remove "operator", it's not serving any purpose. We can't have them be both (not desirable).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll add them to list of "special" keywords (like the ones that have parentheses) instead, so I don't have to change the keywords $pattern to allow question marks or exclamation points.
My goal here was to group these keywords under the "operator" umbrella as well, in case a future grammar says "any operator is allowed here", which would then also include these keywords.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My goal here was to group these keywords under the "operator" umbrella as well, in case a future grammar says "any operator is allowed here", which would then also include these keywords.
I think you may be over thinking that. :)
I was thinking the same thing. And also to match the
The first is a keyword, whereas the second is an attribute 🤷♂️ I can just assign them a different class ( Or should I use |
You should assign them the most semantically correct class. You can never 100% assume anything about color - that is 100% up to themes and sometimes themes defy all reasoning. |
# Conflicts: # CHANGES.md # src/languages/swift.js # test/markup/swift/multiline-string.expect.txt
@joshgoebel I'm running into a lot of issues with how keywords are handled in HLJS. For example:
I'm starting to think I should just stop using the |
@joshgoebel I've made a lot of changes here:
I've also changed the author to myself, as there's hardly any original code left, so I should be the one the contact/blame in case there's an issue with the grammar 😅 Is that OK? |
Yes, the "typical" solution here is to add a guard/rule for dispatch (to cover these cases), as I believe I've already mentioned several places. That is the direction the library is headed as well since we'd like to highlight dispatch in the future as a thing.
I have less advice here. This is sort of unusual as most grammars do not match just any identifier like this. What I'd do though is add a special rule for the exceptions (Any, Self)...
Generally this should be done as a last resort. All three of the things you mention here should not be difficult to work-around and is already common in other grammars. |
No objection. I'll need some time to review the current state of things though. |
Well, I did go for that last resort, as that lets me match all keywords without too many exception modes. However, I think removing the |
src/languages/lib/swift.js
Outdated
/frozen/, | ||
/GKInspectable/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they (in the context of the list) are literal strings, then prefer normal quoting not //
. This would apply to almost all these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I mix strings and regexes in the same array, will that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mixing isn't a problem. The underlying utility stuff and compiler usually handles strings and regex interchangeably without any issue. It's only an issue when doing something like a+b
or string interpolation, which mostly we're trying to get away from.
src/languages/swift.js
Outdated
// https://docs.swift.org/swift-book/ReferenceManual/zzSummaryOfTheGrammar.html | ||
const DOT_KEYWORD = { | ||
begin: concat(/\./, either(...Swift.dotKeywords, ...Swift.optionalDotKeywords)), | ||
returnBegin: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer regex.lookahead
to returnBegin whenever you can, we'd like to deprecate returnBegin in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this even possible with a lookahead? I'm doing this workaround because lookbehind isn't supported. I want to highlight only the keyword, if it's preceded by a dot (which I don't want to highlight). The rule should just be:
begin: concat(/(?<=\.)/, either(...Swift.dotKeywords, ...Swift.optionalDotKeywords))
But that doesn't work yet.
Lookbehind will let me remove a lot of workarounds from the code, so I'm hoping it'll be supported soon. I also have some experience writing TextMate grammars, and lookbehind was quite essential there too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lookbehind will let me remove a lot of workarounds from the code, so I'm hoping it'll be supported soon.
Don't hold your breath. Even after Safari supports it we have to decide when we want to adopt it... right now we green field browsers going back 3-4 years... so if we waited that long it'll be a LONG time coming.
For a simple "pair" [".", something]
you don't even need this muck at all just use begin/end:
const DOT_KEYWORD = {
begin: concat(/\./, lookahead(either(...Swift.dotKeywords, ...Swift.optionalDotKeywords))),
end: either(...Swift.dotKeywords, ...Swift.optionalDotKeywords),
skipBegin: true
If you want you could rebase onto #2834 and just use beforeMatch
sugar (I think that's what I named it?). It's implemented differently still, but it does the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (in the next commit). But what about this one:
const AVAILABLE = {
begin: /(@|#)available/,
returnBegin: true,
contains: [
{
className: 'keyword',
begin: /(@|#)available/
},
{
begin: /\(/,
end: /\)/,
endsParent: true,
keywords: Swift.availabilityKeywords.join(' '),
contains: [
OPERATOR,
NUMBER,
STRING
]
}
]
};
Here, I only want to highlight the keyword itself, but the match should also consume the argument list.
The rule still seems to work when I replace begin/returnBegin with begin: lookahead(/(@|#)available/)
, but I wonder if that's kosher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rule still seems to work when I replace
Cause it's pretty much just two ways of doing the same thing. Just one rewinds the cursor and one avoids moving it forward.
You could collapse this to a single rule I think, couldn't you?
const AVAILABLE = {
begin: /(@|#)available\(/,
keywords: "@available #available"
// ...
After setting some relevances to 0, the language detection issues seem resolved. All tests are back to green now :) |
src/languages/lib/swift.js
Outdated
@@ -0,0 +1,301 @@ | |||
/* eslint-disable no-misleading-character-class */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't add these, if there are issues remaining we need to resolve them.
src/languages/lib/swift.js
Outdated
export const identifierCharacter = either( | ||
identifierHead, | ||
/\d/, | ||
/[\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm showing \uFE20-\uFE2F
(the whole range) as problematic because it's a combining character. Unless you want to add a specific test that proves this works as it should we should remove it/comment it out. (I'm trusting that the linter is right on this and this is broken)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this works and is valid Swift code:
infix operator ~︦ // ~ 0xFE26
func ~︦ (lhs: Int, rhs: Int) -> Int {
lhs + rhs
}
1 ~︦ 2
Should I add this as a unit test?
I assume this isn't a problem here because there are different rules for the first character of an operator. I'm not familiar with these ranges, but I assume the rules for operatorHead
don't include combining characters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wouldn't hurt. Do you have any idea what that warning is trying to tell us? Can the character also be used to make a compound character and then that doesn't work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I assume the rules for operatorHead don't include combining characters?
This one regex is the only one still being flagged and just that portion of it. So it's unique somehow.
src/languages/swift.js
Outdated
begin: Swift.operator | ||
}, | ||
{ | ||
begin: `\\.(\\.|${Swift.operatorCharacter})+` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are both these cases accomplishing? I'm not sure I understand the second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first case is a regular operator: a head
followed by zero or more valid characters.
The second is a dot-operator: only operators that start with a dot are allowed to use dots as characters (...
, ...<
, .*
, etc). So there rule here is: a dot followed by one or more characters that may also include dots.
Indeed, looking pretty good overall I think. I plan to spend a little time on this myself and see if the keywords can be simplified any. |
I cleaned this up a little:
All tests are still green. Any questions before we merge? |
I just wonder whether I'll run into issues again with the regular keyword system, as keywords (such as |
That's why Any and Self are still separate rules (to prevent TYPE from grabbing them). The keyword system should be preferred when it works, and then use modes to fill in the gaps. We'd like to extend the keyword system long-term where it makes sense not have every grammar build their own custom system. So forcing you to get used to working with it is part of the long-term plan. :-) There are a few issues open on keywords if you wanted to have a look and perhaps comment. That said, cases like TYPE are probably not going away anytime soon because keywords are processed "in the gaps" and probably always will be (that's their distinction from modes). So the solution is just to handle Any and Self with their own rule that fires before type. Long-term multi-patterns and look-behind would resolve a lot of your other keyword issues I think. Though you can really already do multi-patterns using |
@svanimpe Thanks for all the effort on this! |
@joshgoebel I ran into a problem with the new keyword regex: I'm trying to match tuples like I still feel like matching keywords last is only going to keep producing problems like this, which is why I moved them all into modes 😕 To solve my current problem, I'd have to revert some of your changes and change keywords with parentheses back to regexes, so they're included in the KEYWORD_MODES. Is that OK for you? |
Please push the code that isn't working with a failing test and let me look at it. Often there are easy work-arounds for many of these kinds of problems. Off the top I wouldn't expect this to be hard to do. Isn't a rule to match |
I'm working on some significant improvements to the grammar, and this felt like something I could submit already.
\uffff
.