Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add -fdollars-in-identifiers and -fno-dollars-in-identifiers option #152

Merged
merged 3 commits into from
Dec 4, 2021

Conversation

iddev5
Copy link
Contributor

@iddev5 iddev5 commented Dec 1, 2021

Also adds the related warnings -W(no-)dollar-in-identifier-extension

Closes #132

@iddev5
Copy link
Contributor Author

iddev5 commented Dec 1, 2021

This is actually complete but with -fno-dollars-in-identifier, the errors are quite different from clang. This is due to differences in parsing.

Taking the added test:
arocc:

./test.c:12:6: error: expected identifier or '('
./test.c:12:6: error: expected ';', found invalid bytes
./test.c:15:7: error: expected identifier or '('
./test.c:15:7: error: expected ';', found invalid bytes
./test.c:18:6: error: expected identifier or '('
./test.c:18:6: error: expected ';', found invalid bytes
./test.c:18:10: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
./test.c:18:18: warning: non-void function 'ther' does not return a value [-Wreturn-type]

clang:

test.c:12:6: error: variable has incomplete type 'void'
test.c:12:9: error: expected ';' after top level declarator

Here, it seems arocc keeps on parsing even after the $ char is found.

@iddev5 iddev5 marked this pull request as ready for review December 1, 2021 18:29
@Vexu
Copy link
Owner

Vexu commented Dec 1, 2021

Here, it seems arocc keeps on parsing even after the $ char is found.

You can stop that by returning error.ParsingFailed.

src/Parser.zig Outdated Show resolved Hide resolved
@ehaas
Copy link
Collaborator

ehaas commented Dec 1, 2021

If we wanted to have better error messaging than clang - an alternative approach could be to treat identifiers with $'s as extended identifiers. Then the dollar check could be isolated to checkIdentifierCodepoint, which means we wouldn't have to run it for every identifier, which is nice since they're probably pretty uncommon in real code. Finally we could have a separate diagnostic like illegal_dollar_in_identifier that prints exactly what is wrong - clang it treats it like any other random illegal character which might be confusing.

@iddev5
Copy link
Contributor Author

iddev5 commented Dec 2, 2021

Combining some parts from here and there, this is the new change:

  • any identifier containing '$' is now exclusively an extended_identifier
  • no matter whether dollars_in_identifiers is true or false, it is always tokenized as regular
  • eatIdentifier() extends the check to take into account the previous case and then emit an error ``illegal character '$' in identifier

Overall, i think the error messages are much better now and there are no false warnings

Also, the dollar check was not isolated to checkIdentifierCodepoint because the tokenizer cannot throw errors, something which we might need in future (I think #143 might be easier with it)

Copy link
Owner

@Vexu Vexu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently ignores the preprocessor.

#define foo$ bar
foo$
// clang
// a.c:1:12: warning: ISO C99 requires whitespace after the macro name [-Wc99-extensions]
$ bar$
// Aro
bar

Splitting the token at the dollar sign fixes that but you'll need to add some extra logic in Parser to make the errors nice again.

diff --git a/src/Tokenizer.zig b/src/Tokenizer.zig
index e3502bc..eaae120 100644
--- a/src/Tokenizer.zig
+++ b/src/Tokenizer.zig
@@ -771,8 +771,7 @@ pub fn next(self: *Tokenizer) Token {
                 'u' => state = .u,
                 'U' => state = .U,
                 'L' => state = .L,
-                'a'...'t', 'v'...'z', 'A'...'K', 'M'...'T', 'V'...'Z', '_' => state = .identifier,
-                '$' => state = .extended_identifier,
+                'a'...'t', 'v'...'z', 'A'...'K', 'M'...'T', 'V'...'Z', '_', '$' => state = .identifier,
                 '=' => state = .equal,
                 '!' => state = .bang,
                 '|' => state = .pipe,
@@ -1055,7 +1054,10 @@ pub fn next(self: *Tokenizer) Token {
             },
             .identifier, .extended_identifier => switch (c) {
                 'a'...'z', 'A'...'Z', '_', '0'...'9' => {},
-                '$' => state = .extended_identifier,
+                '$' => {
+                    id = if (state == .identifier) Token.getTokenId(self.comp, self.buf[start..self.index]) else .extended_identifier;
+                    break;
+                },
                 else => {
                     if (c <= 0x7F or !Token.mayAppearInIdent(self.comp, c, .inside)) {
                         id = if (state == .identifier) Token.getTokenId(self.comp, self.buf[start..self.index]) else .extended_identifier;

const dollar_in_identifier_extension = struct {
const msg = "'$' in identifier";
const opt = "dollar-in-identifier-extension";
const kind = .warning;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const kind = .warning;
const kind = .off;

This is activated by -pedantic in clang.

@iddev5 iddev5 marked this pull request as draft December 3, 2021 12:46
@iddev5
Copy link
Contributor Author

iddev5 commented Dec 3, 2021

There are some more problems now, so marked as draft

@iddev5 iddev5 marked this pull request as ready for review December 4, 2021 13:14
@iddev5
Copy link
Contributor Author

iddev5 commented Dec 4, 2021

I think this is ready now, or at least the error messages can't get any better without any huge breaking changes

Explaining the recent commits:
As suggested by ethan, $ are exclusively part of extended identifiers again, and checked in Token.mayAppearInIdent once.
Moreover, in -fno-dollars-in-identifiers mode, the token immediately after the identifier is check if its a $ symbol, if so, then it is illegal and error must be shown (because in xyz$ for ex, xyz is an identifier followed immediately by a $ symbol)

Moreover, -Wdollar-in-identifier-extension has been set to off, preprocessor behavior matches gcc and clang.
I will squash it at the end, after review.

Copy link
Owner

@Vexu Vexu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some small nitpicks and this is done.

I duplicated some of the eatIdentifier logic in attribute to handle const and its variants so you'll have to modify it, sorry about that.

src/main.zig Outdated Show resolved Hide resolved
int $test;
}

void ano$ther() {}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this case dollars in identifiers and add at least one test for dollar_in_identifier_extension by using #pragma GCC diagnostic warning "-Wdollar-in-identifier-extension".

In addition you could add a no dollars in identifiers case to test dollars_in_identifiers, for how to set the language option see how the //std= comment works in runner.zig.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't added the no dollars in identifiers case because I think sooner or later, we would need more such options and thus a proper solution is needed, like parsing the whole (or atleast a larger) set of cli args from the first comment?

src/Tokenizer.zig Outdated Show resolved Hide resolved
src/Parser.zig Outdated Show resolved Hide resolved
src/Parser.zig Outdated Show resolved Hide resolved
src/Parser.zig Outdated Show resolved Hide resolved
- add -f(no-)dollars-in-identifiers option
- add warnings related to -W(no-)dollar-in-identifier-extension
- dollar $ symbol is now exclusively part of extended_identifier
- emit warning when identifier is followed by dollar symbol in
-fno-dollars-in-identifiers mode
@iddev5 iddev5 force-pushed the dollars-in-identifiers branch from a2a15b7 to 154500d Compare December 4, 2021 18:06
@iddev5
Copy link
Contributor Author

iddev5 commented Dec 4, 2021

Made all the necessary changes!

@Vexu Vexu merged commit 086d4bb into Vexu:master Dec 4, 2021
@Vexu
Copy link
Owner

Vexu commented Dec 4, 2021

Thanks, I made some final small tweaks and extracted that suggestion into an issue so that it's not forgotten.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add -fdollars-in-identifiers / -fno-dollars-in-identifiers option
3 participants