Close #356 - Add whitespace token for preprocessor #437

hdamron17 · 2020-05-15T18:21:04Z

When complete, this will close #356 and maybe #395.

So far, the whitespace token has been added and properly outputs when using -E, but I still need to consume the whitespace before it's used by the parser.

jyn514 · 2020-05-15T18:25:50Z

I'd be interested if it helped with #361 as well.

hdamron17 · 2020-05-15T20:05:46Z

Observation: the preprocessor needs to account for whitespace again (e.g. `#ifndef x` fails because of the whitespace token in the middle), while ensuring that whitespace tokens contain no newlines. Many unit tests also fail because their comparisons do not take whitespace into account.
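One way to picture the problem: the lexer now emits whitespace as a token, so anything that expects `ifndef` to be followed directly by an identifier has to skip or filter those tokens first. The sketch below is only an illustration. `Token::Whitespace(String)` matches the variant used in this PR's diff, but `Id`, `Hash`, and `strip_whitespace` are hypothetical stand-ins, and `is_not_whitespace` only borrows its name from a later commit in this PR.

```rust
// Hypothetical, simplified token type; only `Whitespace(String)` is taken from the PR.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Id(String),
    Hash,
    Whitespace(String),
}

/// True for every token the parser should see.
fn is_not_whitespace(token: &Token) -> bool {
    !matches!(token, Token::Whitespace(..))
}

/// Drop whitespace tokens before handing the stream to something
/// that does not know what whitespace is (e.g. the parser).
fn strip_whitespace(tokens: Vec<Token>) -> Vec<Token> {
    tokens.into_iter().filter(is_not_whitespace).collect()
}
```

With a filter like this, `#ifndef x` is seen as `Hash, Id("ifndef"), Id("x")` regardless of how many spaces separate the pieces, while `-E` output can still print the unfiltered stream.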

jyn514 · 2020-05-20T02:18:57Z

Note this will not fix #395. The \n in the issue title refers to the actual characters \ and then n, not a newline.

hdamron17 · 2020-05-23T07:02:47Z

It's working now, but I should probably add some whitespace-dependent test cases for -E.

One possible issue that came up is that the parser iteration does not work if there is leading whitespace. However, it works in the final product so I don't think it matters. I changed the test case to have no leading whitespace in 8e0509e.

hdamron17 · 2020-05-23T07:15:23Z

Yeah it doesn't keep whitespace properly, mostly with preprocessor stuff since I was just trying to get existing tests to pass...

jyn514 · 2020-05-23T13:46:55Z

> One possible issue that came up is that the parser iteration does not work if there is leading whitespace. However, it works in the final product so I don't think it matters. I changed the test case to have no leading whitespace in 8e0509e.

If I understand right, this means that it works correctly using check_semantics but that passing a whitespace token to Parser as first will give a spurious error? That seems fine as long as it's documented.

jyn514 · 2020-05-24T01:44:57Z

src/lex/cpp.rs

+                        vec![x, Ok(Token::Whitespace(String::from(" ")))]
+                    }
+                })
+                .flatten()


Why do you add these spaces here in the define? Does tokens_until_newline not return whitespace tokens?

hdamron17 · 2020-05-24

tokens_until_newline (at the moment) does not include whitespace tokens. I did not want to change it since it is somewhat separate from the preprocessor. I can try adding whitespace tokens to it and see what happens now that it is in a stable state. I did notice that clang only puts one space when replacing preprocessor defines, regardless of the original spacing.

jyn514 · 2020-05-24

Hmm interesting ... I suppose since the behavior is correct we can try to go back and improve the spacing later.

Also, please add a comment to this effect either here or at tokens_until_newline.

hdamron17 · 2020-05-24

I actually went back and did it the proper way by changing tokens_until_newline. Unfortunately I had to rework some stuff for boolean_expr because whitespace shows up in the replacement stage. I'll push as soon as I double check the tests. Also, I think the rework will be much less of an eyesore.
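The single-space observation in this thread can be illustrated with a small sketch that works on the raw text of a macro body rather than on rcc's token stream. `collapse_spacing` is a hypothetical helper, not a function from this PR, and it makes no claim about how clang itself is implemented.

```rust
/// Collapse every run of whitespace in a macro body to one space.
/// Leading and trailing whitespace is dropped as a side effect.
fn collapse_spacing(body: &str) -> String {
    body.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

A token-based version would do the same thing by merging consecutive whitespace tokens into a single `Token::Whitespace(String::from(" "))`.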

jyn514

Very minor nits, overall this looks great :) I would like to see tests for a/* */b and maybe a few other things though.

hdamron17 · 2020-05-24T04:45:59Z

Some more tests with preprocessor stuff would be nice, but other than that, I think it's complete.

hdamron17 · 2020-05-24T05:01:01Z

Also, newlines are not preserved for preprocessor macros at the moment, so I'll need to fix that.

jyn514 · 2020-05-24T13:03:17Z

> Also, newlines are not preserved for preprocessor macros at the moment, so I'll need to fix that.

Do you mean that \\\n isn't preserved?

#define f(a) { \
	int b = a; \
	return b; \
}
f(1)
 { int b = 1; return b; }

If so, that's fine, clang does the same. It will also be hard to do without major changes since deleting \\\n happens very early in the lexer: https://github.com/jyn514/rcc/blob/93b5e06/src/lex/mod.rs#L98

jyn514 · 2020-05-24T13:05:53Z

src/lex/cpp.rs

+    fn preprocess_only() {
+        assert_same_exact("int \t\n\r     main() {}", "int \t\n\r     main() {}");
+        assert_same_exact("int/* */main() {}", "int main() {}");
+        assert_same_exact("int/*\n\n\n*/main() {}", "int\n\n\nmain() {}");


Hmm, this behavior looks a little confusing. Is there a reason you kept newlines inside of block comments?

hdamron17 · 2020-05-25

This seems to be the behavior of clang. For example,

int main() { /* */ }

preprocesses to

# 1 "test.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 363 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.c" 2
int main() { }

I may need to look at the documentation though since

int main() {/* */}

mysteriously preprocesses to

# 1 "test.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 363 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.c" 2
int main() { }

jyn514 · 2020-05-25

Relevant part of the standard (5.1.1.2 Translation phases):

> The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.

This seems to me to say that clang's behavior in your first example is correct, and the behavior in the second is a bug. For context, gcc always behaves like clang in the second example, i.e. it always deletes newlines in comments:

$ gcc -x c -E -P -
int main() { /* */ }

preprocesses to

int main() { }

and

int main() {/* */}

preprocesses to int main() { }.

tcc behaves the same as gcc.
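For reference, the behaviour exercised by the `preprocess_only` assertions quoted at the top of this thread can be sketched as follows. This is a minimal illustration under simplifying assumptions, not rcc's lexer: `strip_block_comments` is a hypothetical name, and the function ignores string literals, character literals, and `//` comments.

```rust
/// Replace each `/* ... */` comment with the newlines it contained,
/// or with a single space if it contained none.
fn strip_block_comments(src: &str) -> String {
    let mut out = String::with_capacity(src.len());
    let mut rest = src;
    while let Some(start) = rest.find("/*") {
        // Copy everything before the comment unchanged.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        // In this sketch an unterminated comment swallows the rest of the input.
        let (body, tail) = match after_open.find("*/") {
            Some(end) => (&after_open[..end], &after_open[end + 2..]),
            None => (after_open, ""),
        };
        let newlines = body.matches('\n').count();
        if newlines == 0 {
            out.push(' ');
        } else {
            out.extend(std::iter::repeat('\n').take(newlines));
        }
        rest = tail;
    }
    out.push_str(rest);
    out
}
```

Following gcc and tcc as described above would mean always taking the single-space branch, whether or not the comment contains newlines.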

hdamron17 · 2020-05-25T01:13:17Z

> One possible issue that came up is that the parser iteration does not work if there is leading whitespace. However, it works in the final product so I don't think it matters. I changed the test case to have no leading whitespace in 8e0509e.

> If I understand right, this means that it works correctly using check_semantics but that passing a whitespace token to Parser as first will give a spurious error? That seems fine as long as it's documented.

Turns out it was an easy fix. I just changed the parser function to use next_non_whitespace.

hdamron17 · 2020-05-25T01:25:01Z

> Also, newlines are not preserved for preprocessor macros at the moment, so I'll need to fix that.

> Do you mean that \\\n isn't preserved?

No, I mean #defines do not keep newlines after.

int main() {
#define a
#define b
#define c
}

preprocesses to

int main() {
}

but should preprocess to

# 1 "test.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 363 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.c" 2
int main() {



}
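The expected output above keeps one empty line per directive, so line numbers in the preprocessed text still match the source. A minimal sketch of that line-preserving behaviour, operating on raw text rather than on rcc's token stream; `blank_directive_lines` is a hypothetical helper, and it neither expands macros nor emits the `# 1 "test.c"` linemarkers:

```rust
/// Replace every line that starts with `#` (after optional leading
/// whitespace) with an empty line, keeping the newline itself.
fn blank_directive_lines(src: &str) -> String {
    src.split('\n')
        .map(|line| {
            if line.trim_start().starts_with('#') {
                ""
            } else {
                line
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

The important property is that the number of `\n` characters in the output equals the number in the input.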

hdamron17 · 2020-05-25T01:33:39Z

The newlines not being preserved after some macros is because tokens_until_newline does not actually stop at the newline but just consumes arbitrary whitespace until the line number increases. For example,

int main() {
#define a 1
           return a;
}

preprocesses to

int main() {
return 1;
}

jyn514 · 2020-05-25T01:55:37Z

> tokens_until_newline does not actually stop at the newline but just consumes arbitrary whitespace until the line number increases

I don't see a way to fix this without fully separating the preprocessor from the lexer ... I guess I could make that function part of the lexer instead and stop at \n? That would require a fair bit of refactoring but not an unreasonable amount.

jyn514 · 2020-05-25T01:56:30Z

In any case I think it can be fixed later, the majority of things are working now.

jyn514 · 2020-05-25T02:41:42Z

r=me once @hdamron17 is happy with it

hdamron17 · 2020-05-25T02:59:01Z

I went ahead and changed tokens_until_newline, and it helps with #defines. I think all that's missing now are #else and #endif, so I'll try to get those working and add enough tests to be worthwhile.

jyn514

Summarizing the changes to make sure I understand:

Code changes

- Add a whitespace token
- Fix up pretty printing, etc.
- Return whitespace tokens from consume_whitespace, etc.
- Add consume_whitespace_oneline, which only consumes whitespace on the current line. Previously, the lexer would follow whitespace as long as it saw it, even if it went across many lines. This fixes #394 ([ICE] the lexer and the preprocessor have trouble getting along).
  - consume_whitespace_oneline works by returning an error if the whitespace had a newline. This is an error since consume_whitespace_oneline is only called for preprocessor directives, which must always be on the same line. This is really clever, I don't know if I would have thought of it :)
- Change tests to filter out whitespace by default; add some tests that don't ignore whitespace and make sure that bit works properly
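The error-on-newline idea described for consume_whitespace_oneline can be sketched on a plain string slice. This is an assumption-laden illustration, not the function from the PR: the name `consume_whitespace_oneline_sketch`, the `UnexpectedNewline` error type, and the `(whitespace, rest)` return shape are all hypothetical.

```rust
/// Hypothetical error: a newline appeared where a directive expected
/// more tokens on the same line.
#[derive(Debug, PartialEq)]
struct UnexpectedNewline;

/// Split `input` into its leading whitespace and the remainder.
/// Returns an error if that leading whitespace contains a newline.
fn consume_whitespace_oneline_sketch(input: &str) -> Result<(&str, &str), UnexpectedNewline> {
    let end = input
        .find(|c: char| !c.is_whitespace())
        .unwrap_or(input.len());
    let (ws, rest) = input.split_at(end);
    if ws.contains('\n') {
        Err(UnexpectedNewline)
    } else {
        Ok((ws, rest))
    }
}
```

A caller handling `#ifdef` could treat the error as "directive ended without an identifier" instead of silently reading a token from the next line.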

Behavior

- Keep all newlines within comments
- Replace comments with a single space
- Print whitespace when passed -E

Let me know if I missed anything :)

hdamron17 · 2020-05-25T04:21:36Z

r @jyn514

hdamron17 · 2020-05-25T04:30:48Z

Your summary sounds about right. Also the behaviour of consume_whitespace_oneline was just copied from your code so I guess you're the clever one. :)

Added whitespace token and made -E keep spacing, TODO consume whitespace

295337d

Filtered whitespace before parser, TODO debug a lot of test cases

1514108

Fix some bugs in whitespace handling

33f9f12

- Don't set `seen_line_token` for whitespace
- Ignore whitespace in `#if` expressions (since the parser doesn't know what whitespace is)
- Run `cargo fmt`

jyn514 reviewed May 15, 2020

hdamron17 added 7 commits May 22, 2020 23:46

Merge branch 'master' into preprocessor-whitespace

dd628bf

Add a oneline whitespace consumtion after #ifdef, #ifndef, #undef

30be8d9

and update tests to reflect non-whitespace

Consume whitespace between function macro args

60db031

Fixed most of lex::tests::*

30eac7f

Handle lex::tests::test_no_newline

f0c146e

Changed analyze::test::lol to not have leading newline

8e0509e

TODO should parse_all remove leading newlines?

Added whitespace between hash and directive

01f040b

hdamron17 changed the title ~~[WIP] Close #356 - Add whitespace token for preprocessor~~ Close #356 - Add whitespace token for preprocessor May 23, 2020

Remove trailing newline for -E

d51af7e

jyn514 mentioned this pull request May 23, 2020

Cannot use function macros with no arguments #450

Closed

hdamron17 added 2 commits May 23, 2020 20:01

Handle spaces in defines and whitespace in comments

f671f65

Handle lex::tests::test_no_newline (again)

36b7fc3

jyn514 reviewed May 24, 2020

hdamron17 added 5 commits May 24, 2020 00:03

Fix error messages for macros and #ifdef

675989b

Rework whitespace in tokens_until_newline

1d5578f

De Morgan

c1aa84a

cargo fmt

81d47d9

Add a few tests for preprocess only with exact matching

bbbb3ab

TODO add tests involving preprocessor stuff

jyn514 reviewed May 24, 2020

hdamron17 added 2 commits May 24, 2020 21:04

Make Whitespace matches consistently use ..

46b4943

Fixed issue with whitespace at beginning of analyze::test::lol in `parse_all`

78ad8fc

Clean up filter_map thanks to @jyn514

8c4f568

Get tokens_until_newline to do what it's name suggests and other whitespace improvements for preprocessor macros

0e055b9

jyn514 reviewed May 25, 2020

jyn514 reviewed May 25, 2020

jyn514 linked an issue May 25, 2020 that may be closed by this pull request

[ICE] the lexer and the preprocessor have trouble getting along #394

Closed

hdamron17 added 3 commits May 24, 2020 23:35

More preprocess_only tests

3a9cd4a

Merge two versions of is_not_whitespace

6d084cd

Do not assume there is whitespace between define id and body

0d03378

This was referenced May 25, 2020

[ICE] unput doesn't play well with consume_whitespace #361

Open

Preprocessor stringify # #456

Merged

jyn514 merged commit ae5ac81 into jyn514:master May 25, 2020

hdamron17 deleted the preprocessor-whitespace branch May 25, 2020 22:04

jyn514 mentioned this pull request May 26, 2020

Space between function macro name and args #457

Closed
