Skip to content

Comments

Grammar fixes identified during attempt to convert it to tree-sitter#3019

Merged
Geod24 merged 32 commits intodlang:masterfrom
CyberShadow:grammar
Jun 30, 2021
Merged

Grammar fixes identified during attempt to convert it to tree-sitter#3019
Geod24 merged 32 commits intodlang:masterfrom
CyberShadow:grammar

Conversation

@CyberShadow
Copy link
Member

@CyberShadow CyberShadow commented Jun 11, 2021

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @CyberShadow!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

@CyberShadow
Copy link
Member Author

OK, I think this is a good spot for a review/merge of everything so far.

The work above is what was needed to produce a working (albeit lightly tested) tree-sitter grammar. Most of these are fixes to DDoc syntax, grammar syntax, improvements to make the grammar more machine-readable and such. More fixes will follow in the future, but they will likely be of a different kind than most of the above (i.e. fixes in the actual grammar by clarifying ambiguous definitions or adding missing syntax variants).

CC @WalterBright @RazvanN7 @mdparker (looking at the git log I believe you are most involved in maintaining the spec lately).

@mdparker
Copy link
Member

I don't consider myself qualified to review the actual grammar changes, but the rest of it (syntax, English), LGTM. If @WalterBright doesn't check in, maybe @MoonlightSentinel or @Geod24 can?

spec/function.dd Outdated

$(GNAME FunctionLiteralBody):
$(GLINK SpecifiedFunctionBody)
$(GLINK BlockStatement)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This yields a dead link on the preview becauseBlockStatement is defined in statement.dd

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a side note, this rule should probably be moved to expression.d because it's only used in the function literal section

spec/lex.dd Outdated

$(GRAMMAR
$(GNAME SourceFile):
$(GLINK ByteOrderMark) $(GLINK Module)$(OPT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$(GLINK Module) yields a dead link.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does a leading byte-order mark really preclude a following shebang?
Otherwise this rule could be simplified to

ByteOrderMark (opt) Shebang (opt) Module (opt)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, both the byte order mark and shebang want to be the very first bytes in the physical file. So, you can't have both.

spec/lex.dd Outdated
$(GLINK HexString)
$(GLINK DelimitedString)
$(GLINK TokenString)
$(GLINK TokenString))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closing brace should appear on the next line s.t. it's consistent with the other rules.

spec/lex.dd Outdated
Comment on lines 361 to 375
$(B q") $(GLINK Delimiter) $(GLINK WysiwygCharacters)$(OPT) $(GLINK MatchingDelimiter) $(B ")

$(GNAME Delimiter):
$(B $(LPAREN))
$(B {)
$(B [)
$(B <)
$(GLINK Identifier)

$(GNAME MatchingDelimiter):
$(B $(RPAREN))
$(B })
$(B ])
$(B >)
$(GLINK Identifier))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule could be modified to enforce the correct delimiters for almost all combinations except Identifier.

spec/class.dd Outdated
)

$(P $(I ClassInvariant) specify the relationships among the members of a class instance.
$(P Class $(I Invariant) specify the relationships among the members of a class instance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$(P Class $(I Invariant) specify the relationships among the members of a class instance.
$(P Class $(I Invariant)s specify the relationships among the members of a class instance.

$(B Note): Class allocators are deprecated in D2.
$(GRAMMAR
$(GNAME Allocator):
$(D new) $(GLINK2 function, Parameters) $(D ;)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though class allocators are obsolete it seems that the parser does not complain about the following:

class A
{
    new(size_t size);    
    
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deletion comes from this commit:

spec: Remove definitions redundant with MissingFunctionBody

$(D ;) is already a possible FunctionBody (as MissingFunctionBody).

$(GRAMMAR
$(GNAME Deallocator):
$(D delete) $(GLINK2 function, Parameters) $(D ;)
$(D delete) $(GLINK2 function, Parameters) $(GLINK2 function, FunctionBody)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it seems that delete declarations are no longer accepted by the parser.

class A
{
    delete(void *p) {}  
}

This fails with a parser error since 2.091.1.
Since class allocators/deallocators both end up with compilation errors, maybe it's just better if we
remove them altoghether from the spec?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this specific line deletion, please see above.

For the general topic of describing old language syntax, we can use GDEPRECATED.

There are some use cases to allow the grammar specify also previous versions of the language. If the grammar is used for anything other than develop a conforming implementation, such as code analysis tooling or (as is my case) syntax highlighting / editor support. Such tools have reasons to support previous versions of the language too.

We can use CSS or such to make removed / deprecated parts of the grammar clearly display as such (e.g. using 50% transparency or strikethrough).

@adamdruppe
Copy link
Contributor

adamdruppe commented Jun 24, 2021 via email

@adamdruppe
Copy link
Contributor

omg that message got wtf'd

but it is the "allocator relic" thing, with at disable new.

@CyberShadow
Copy link
Member Author

@MoonlightSentinel @RazvanN7 Thank you for the review. I updated the PR with these changes:

CyberShadow added a commit to CyberShadow/tree-sitter-d that referenced this pull request Jun 25, 2021
Rebase branch of the dlang.org repository which we're tracking in the
generator/dlang.org submodule to address pull request review comments
and for other cleanup.

Summary of changes:
dlang/dlang.org#3019 (comment)

Old submodule refs referenced by previous commits in this repository
will remain reachable via this branch:
https://github.com/CyberShadow/d-programming-language.org/commits/grammar-v1
CyberShadow added a commit to CyberShadow/tree-sitter-d that referenced this pull request Jun 25, 2021
@CyberShadow
Copy link
Member Author

Rebased to fix merge conflict with #3025.

Copy link
Member

@Geod24 Geod24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! A few things:

  • In commit "spec/lex.dd: Don't use $(D ...) for Unicode characters", there is the following message:

$(I ...) makes the backslash look weird, so use $(B ...) (freed up by the previous commit) instead.
I guess it's slightly outdate ?

  • "spec: Use $(D ...) instead of $(COMMA)": Why ? Simple curiosity here, as I'm the author of the code.
  • There's a lot of consistency fixes. Do you think you could amend/create a README or CONTRIBUTING so that all this knowledge is more prominently displayed / accessible ? Can be a new PR.
  • A previous version of this had parenthesis fixes. Was there bug reports opened for DMD's accept-invalid ?

Feel free to self merge when ready (I assume you might want to fix the commit message).

@CyberShadow
Copy link
Member Author

CyberShadow commented Jun 30, 2021

In commit "spec/lex.dd: Don't use $(D ...) for Unicode characters", there is the following message:

Well spotted, yes, it is outdated. I will remove it.

"spec: Use $(D ...) instead of $(COMMA)": Why ? Simple curiosity here, as I'm the author of the code.

$(COMMA) is necessary to insert a literal comma in cases where it would otherwise be interpreted as an argument separator for multi-argument macros. As neither GRAMMAR nor D are multi-argument macros (i.e. they use $0 and not $1, $2 etc.), it would be synonymous with just using a comma. Furthermore, we would still need to wrap it in $(D ...) to indicate that it is a token.

There's a lot of consistency fixes. Do you think you could amend/create a README or CONTRIBUTING so that all this knowledge is more prominently displayed / accessible ? Can be a new PR.

I think the less documentation contributors need to read, the better. We can check some things in CI, and I hope that by making everything consistent once, it becomes easier to remain consistent simply by following the existing conventions. Some things like $(B/D/I ...) distinctions are more subtle though, maybe separate macros could be used instead of those.

A previous version of this had parenthesis fixes. Was there bug reports opened for DMD's accept-invalid ?

I'm guessing you mean #3017 . I believe DMD does warn on unbalanced parentheses if you run it with both -D and -w, however, nobody looks at the warnings.

Unlike almost all other uses of $(D ...) in grammar blocks, there are
not verbatim D tokens, but *descriptions* of tokens (in the same way
that "any Unicode character" is).

$(I ...) makes the backslash look weird, so use $(B ...) for this
purpose instead.
Disambiguate $(I ...) from other meanings.
We can use $(D ...) for tokens which may be surrounded by whitespace,
and $(B ...) for character sequences which must be contiguous.
These "short-hands" complicate parsing, and probably shouldn't even
exist at all.
The grammar definition is at the start of the section, so there is
little change in functionality.
As opposed to linking to grammar definitions, which is what
GLINK/GLINK2 is mostly used for.

These are not part of the grammar, so use the syntax/formatting used
for verbatim D tokens instead.

As LINK2 / RELATIVE_LINK2 is already used for this purpose in the
spec, this further improves consistency (in addition to the primary
goal of improving machine readability).
Consistently indicate that these form a single token, with no
interleaved whitespace/comments.
Consistently indicate where whitespace/comments may appear between
tokens.
To be used for blocks describing sequences of characters (which are
contiguous), as opposed to sequences of tokens (which may have
whitespace / comments between them).
A few blocks needed to be split up into lexical (characters) and
regular (tokens) blocks.
…okens

As the q and { in q{ must be contiguous, the two characters here act
as a token.
These are all linking to the current page.
When the definition is not in the same file, we need to use GLINK2 and
point at the correct file.
@Geod24
Copy link
Member

Geod24 commented Jun 30, 2021

Bad bot!

@Geod24 Geod24 merged commit 6b03aa3 into dlang:master Jun 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants