Grammar fixes identified during attempt to convert it to tree-sitter #3019
Geod24 merged 32 commits into dlang:master
Thanks for your pull request, @CyberShadow!
Bugzilla references
Your PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
f97b664 to 2473628
OK, I think this is a good spot for a review/merge of everything so far. The work above is what was needed to produce a working (albeit lightly tested) tree-sitter grammar. Most of these are fixes to DDoc syntax and grammar syntax, plus improvements that make the grammar more machine-readable. More fixes will follow in the future, but they will likely be of a different kind than most of the above (i.e. fixes in the actual grammar by clarifying ambiguous definitions or adding missing syntax variants). CC @WalterBright @RazvanN7 @mdparker (looking at the git log, I believe you have been the most involved in maintaining the spec lately).
I don't consider myself qualified to review the actual grammar changes, but the rest of it (syntax, English), LGTM. If @WalterBright doesn't check in, maybe @MoonlightSentinel or @Geod24 can?
spec/function.dd
Outdated
|
|
||
| $(GNAME FunctionLiteralBody): | ||
| $(GLINK SpecifiedFunctionBody) | ||
| $(GLINK BlockStatement) |
This yields a dead link on the preview because BlockStatement is defined in statement.dd.
On a side note, this rule should probably be moved to expression.dd because it's only used in the function literal section.
spec/lex.dd
Outdated
|
|
||
| $(GRAMMAR | ||
| $(GNAME SourceFile): | ||
| $(GLINK ByteOrderMark) $(GLINK Module)$(OPT) |
$(GLINK Module) yields a dead link.
Does a leading byte-order mark really preclude a following shebang?
Otherwise this rule could be simplified to
ByteOrderMark (opt) Shebang (opt) Module (opt)
Yes, both the byte order mark and shebang want to be the very first bytes in the physical file. So, you can't have both.
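As a rough illustration of why the two prefixes exclude each other, here is a minimal Python sketch. The function name `classify_file_start` is hypothetical, and checking only the UTF-8 byte-order mark is an assumption made for brevity; it is not how any D tool is known to implement this.

```python
import codecs

# Assumption: only the UTF-8 byte-order mark is checked in this sketch.
UTF8_BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"


def classify_file_start(data: bytes) -> str:
    """Return which optional prefix, if any, sits at byte 0 of a source file."""
    if data.startswith(UTF8_BOM):
        return "bom"
    if data.startswith(b"#!"):
        return "shebang"
    return "none"
```

Because both checks look at byte 0, a `#!` that follows a byte-order mark is never classified as a shebang here, which mirrors the point made in the reply above.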
spec/lex.dd
Outdated
| $(GLINK HexString) | ||
| $(GLINK DelimitedString) | ||
| $(GLINK TokenString) | ||
| $(GLINK TokenString)) |
The closing parenthesis should appear on the next line so that it's consistent with the other rules.
spec/lex.dd
Outdated
| $(B q") $(GLINK Delimiter) $(GLINK WysiwygCharacters)$(OPT) $(GLINK MatchingDelimiter) $(B ") | ||
|
|
||
| $(GNAME Delimiter): | ||
| $(B $(LPAREN)) | ||
| $(B {) | ||
| $(B [) | ||
| $(B <) | ||
| $(GLINK Identifier) | ||
|
|
||
| $(GNAME MatchingDelimiter): | ||
| $(B $(RPAREN)) | ||
| $(B }) | ||
| $(B ]) | ||
| $(B >) | ||
| $(GLINK Identifier)) |
This rule could be modified to enforce the correct delimiters for almost all combinations except Identifier.
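The pairing this comment has in mind can be sketched in Python as follows. The table `BRACKET_PAIRS` and the function `matching_delimiter` are hypothetical names used only for illustration; the sketch shows why the four bracket delimiters can each be tied to a fixed closer while an Identifier delimiter cannot.

```python
# Hypothetical table pairing each bracket-style opener from the Delimiter
# rule with its closer from the MatchingDelimiter rule.
BRACKET_PAIRS = {"(": ")", "{": "}", "[": "]", "<": ">"}


def matching_delimiter(opening: str) -> str:
    """Return the closing delimiter expected for `opening`."""
    # An Identifier delimiter is closed by the same identifier, so there is
    # no fixed closer to look up; the opener itself is returned instead.
    return BRACKET_PAIRS.get(opening, opening)
```

For example, `matching_delimiter("[")` gives `"]"`, while an identifier such as `"EOS"` maps to itself.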
spec/class.dd
Outdated
| ) | ||
|
|
||
| $(P $(I ClassInvariant) specify the relationships among the members of a class instance. | ||
| $(P Class $(I Invariant) specify the relationships among the members of a class instance. |
| $(P Class $(I Invariant) specify the relationships among the members of a class instance. | |
| $(P Class $(I Invariant)s specify the relationships among the members of a class instance. |
| $(B Note): Class allocators are deprecated in D2. | ||
| $(GRAMMAR | ||
| $(GNAME Allocator): | ||
| $(D new) $(GLINK2 function, Parameters) $(D ;) |
Even though class allocators are obsolete it seems that the parser does not complain about the following:
class A
{
new(size_t size);
}
This deletion comes from this commit:
spec: Remove definitions redundant with MissingFunctionBody
$(D ;) is already a possible FunctionBody (as MissingFunctionBody).
| $(GRAMMAR | ||
| $(GNAME Deallocator): | ||
| $(D delete) $(GLINK2 function, Parameters) $(D ;) | ||
| $(D delete) $(GLINK2 function, Parameters) $(GLINK2 function, FunctionBody) |
Hm, it seems that delete declarations are no longer accepted by the parser.
class A
{
delete(void *p) {}
}
This fails with a parser error since 2.091.1.
Since class allocators/deallocators both end up with compilation errors, maybe it's better to remove them altogether from the spec?
For this specific line deletion, please see above.
For the general topic of describing old language syntax, we can use GDEPRECATED.
There are use cases for letting the grammar also specify previous versions of the language. When the grammar is used for anything other than developing a conforming implementation, such as code analysis tooling or (as in my case) syntax highlighting / editor support, such tools have reasons to support previous versions of the language too.
We can use CSS or such to make removed / deprecated parts of the grammar clearly display as such (e.g. using 50% transparency or strikethrough).
|
On Thu, Jun 24, 2021 at 03:40:16AM -0700, Razvan Nitu wrote:
> @@ -1009,7 +1006,6 @@ $(H2 $(LNAME2 allocators, Class Allocators))
$(B Note): Class allocators are deprecated in D2.
$(GRAMMAR
$(GNAME Allocator):
- $(D new) $(GLINK2 function, Parameters) $(D ;)
Even though class allocators are obsolete it seems that the parser does not complain about the following:
note that it is specifically still allowed to do ***@***.*** new(...);`
omg that message got wtf'd but it is the "allocator relic" thing, with at disable new.
|
@MoonlightSentinel @RazvanN7 Thank you for the review. I updated the PR with these changes:
|
Rebase branch of the dlang.org repository which we're tracking in the generator/dlang.org submodule to address pull request review comments and for other cleanup. Summary of changes: dlang/dlang.org#3019 (comment) Old submodule refs referenced by previous commits in this repository will remain reachable via this branch: https://github.com/CyberShadow/d-programming-language.org/commits/grammar-v1
In all respects synonymous to GRAMMAR. To be used for grammar blocks which are not part of the formal definition of D's grammar.
Allow denoting grammar elements as such without ad-lib text annotations.
Rebased to fix merge conflict with #3025.
Geod24 left a comment
Very nice! A few things:
- In commit "spec/lex.dd: Don't use $(D ...) for Unicode characters", there is the following message:
$(I ...) makes the backslash look weird, so use $(B ...) (freed up by the previous commit) instead.
I guess it's slightly outdated?
- "spec: Use $(D ...) instead of $(COMMA)": Why? Simple curiosity here, as I'm the author of the code.
- There are a lot of consistency fixes. Do you think you could amend/create a README or CONTRIBUTING so that all this knowledge is more prominently displayed / accessible? Can be a new PR.
- A previous version of this had parenthesis fixes. Were there bug reports opened for DMD's accept-invalid?
Feel free to self-merge when ready (I assume you might want to fix the commit message).
Well spotted, yes, it is outdated. I will remove it.
I think the less documentation contributors need to read, the better. We can check some things in CI, and I hope that by making everything consistent once, it becomes easier to remain consistent simply by following the existing conventions. Some things like
I'm guessing you mean #3017. I believe DMD does warn on unbalanced parentheses if you run it with both
Unlike almost all other uses of $(D ...) in grammar blocks, these are not verbatim D tokens, but *descriptions* of tokens (in the same way that "any Unicode character" is). $(I ...) makes the backslash look weird, so use $(B ...) for this purpose instead.
Disambiguate $(I ...) from other meanings.
We can use $(D ...) for tokens which may be surrounded by whitespace, and $(B ...) for character sequences which must be contiguous.
These "short-hands" complicate parsing, and probably shouldn't even exist at all.
The grammar definition is at the start of the section, so there is little change in functionality.
Be consistent about this.
As opposed to linking to grammar definitions, which is what GLINK/GLINK2 is mostly used for. These are not part of the grammar, so use the syntax/formatting used for verbatim D tokens instead. As LINK2 / RELATIVE_LINK2 is already used for this purpose in the spec, this further improves consistency (in addition to the primary goal of improving machine readability).
Consistently indicate that these form a single token, with no interleaved whitespace/comments.
Consistently indicate where whitespace/comments may appear between tokens.
To be used for blocks describing sequences of characters (which are contiguous), as opposed to sequences of tokens (which may have whitespace / comments between them).
A few blocks needed to be split up into lexical (characters) and regular (tokens) blocks.
…okens
As the q and { in q{ must be contiguous, the two characters here act
as a token.
These are all linking to the current page.
When the definition is not in the same file, we need to use GLINK2 and point at the correct file.
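The dead links flagged during review could in principle be caught mechanically. The following Python sketch is only an illustration: `dead_glinks` is a hypothetical helper, and it assumes that definitions always appear as `$(GNAME Name)` and same-file links as `$(GLINK Name)`, as in the hunks quoted in this thread. Other macros used by the spec are ignored.

```python
import re

# Assumed macro shapes, taken from the hunks quoted in this conversation.
GNAME_RE = re.compile(r"\$\(GNAME\s+(\w+)\)")
GLINK_RE = re.compile(r"\$\(GLINK\s+(\w+)\)")  # does not match GLINK2


def dead_glinks(source: str) -> list:
    """Return names linked via GLINK that have no GNAME in the same text."""
    defined = set(GNAME_RE.findall(source))
    dead = []
    for name in GLINK_RE.findall(source):
        # Report each missing definition once, in order of first appearance.
        if name not in defined and name not in dead:
            dead.append(name)
    return dead
```

Run over the text of a single .dd file, each reported name is a candidate for switching to GLINK2 with the correct file, or for moving the definition.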
Bad bot!
The conversion program: https://github.com/CyberShadow/tree-sitter-d/tree/generated/generator
TODO: