-
Notifications
You must be signed in to change notification settings - Fork 26
Close #356 - Add whitespace token for preprocessor #437
Conversation
I'd be interested if it helped with #361 as well. |
Observation: Need to add whitespace consideration back to preprocessor (e.g. |
- Don't set `seen_line_token` for whitespace - Ignore whitespace in `#if` expressions (since the parser doesn't know what whitespace is) - Run `cargo fmt`
Note this will not fix #395. The |
and update tests to reflect non-whitespace
TODO should parse_all remove leading newlines?
It's working now, but I probably should put some whitespace-dependent test cases for One possible issue that came up is that the parser iteration does not work if there is leading whitespace. However, it works in the final product so I don't think it matters. I changed the test case to have no leading whitespace in 8e0509e. |
Yeah it doesn't keep whitespace properly, mostly with preprocessor stuff since I was just trying to get existing tests to pass... |
If I understand right, this means that it works correctly using |
src/lex/cpp.rs
Outdated
vec![x, Ok(Token::Whitespace(String::from(" ")))] | ||
} | ||
}) | ||
.flatten() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you add these spaces here in the define? Does tokens_until_newline
not return whitespace tokens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tokens_until_newline
(at the moment) does not include whitespace tokens. I did not want to change it since it is somewhat separate from the preprocessor. I can try adding whitespace tokens to it and see what happens now that it is in a stable state. I did notice that clang
only puts one space when replacing preprocessor defines, regardless of the original spacing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm interesting ... I suppose since the behavior is correct we can try to go back and improve the spacing later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please add a comment to this effect either here or at tokens_until_newline
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually went back and did it the proper way by changing tokens_until_newline. Unfortunately I had to rework some stuff for boolean_expr because whitespace show up in the replacement stage. I'll push as soon as I double check the tests. Also, I think the rework will be much less of an eyesore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor nits, overall this looks great :) I would like to see tests for a/* */b
and maybe a few other things though.
TODO add tests involving preprocessor stuff
Some more tests with preprocessor stuff would be nice, but other than that, I think it's complete. |
Also, newlines are not preserved for preprocessor macros at the moment, so I'll need to fix that. |
Do you mean that #define f(a) { \
int b = a; \
return b; \
}
f(1)
{ int b = 1; return b; } If so, that's fine, clang does the same. It will also be hard to do without major changes since deleting |
fn preprocess_only() { | ||
assert_same_exact("int \t\n\r main() {}", "int \t\n\r main() {}"); | ||
assert_same_exact("int/* */main() {}", "int main() {}"); | ||
assert_same_exact("int/*\n\n\n*/main() {}", "int\n\n\nmain() {}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this behavior looks a little confusing. Is there a reason you kept newlines inside of block comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be the behavior of clang. For example,
int main() {
/*
*/
}
preprocesses to
# 1 "test.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 363 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.c" 2
int main() {
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may need to look at the documentation though since
int main() {/*
*/}
mysteriously preprocesses to
# 1 "test.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 363 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.c" 2
int main() { }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relevant part of the standard (5.1.1.2 Translation phases):
The source file is decomposed into preprocessing tokens7) and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
This seems to me to say that clang's behavior in your first example is correct, and the behavior in the second is a bug. For context, gcc always behaves like clang in the second example, i.e. it always deletes newlines in comments:
$ gcc -x c -E -P -
int main() {
/*
*/
}
preprocesses to
int main() {
}
and
int main() {/*
*/}
preprocesses to int main() { }
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tcc behaves the same as gcc.
Turns out it was an easy fix. I just changed the |
No, I mean
preprocesses to
but should preprocess to
|
The newlines not being preserved after some macros is because
preprocesses to
|
I don't see a way to fix this without fully separating the preprocessor from the lexer ... I guess I could make that function part of the lexer instead and stop at |
In any case I think it can be fixed later, the majority of things are working now. |
r=me once @hdamron17 is happy with it |
…espace improvements for preprocessor macros
I went ahead and changed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summarizing the changes to make sure I understand:
Code changes
- Add a whitespace token
- Fix up pretty printing, etc.
- Return whitespace tokens from
consume_whitespace
, etc. - Add
consume_whitespace_oneline
, which only consumes whitespace on the current line. Previously, the lexer would follow whitespace as long as it saw it, even if it went across many lines. This fixes [ICE] the lexer and the preprocessor have trouble getting along #394. consume_whitespace_oneline
works by returning an error if the whitespace had a newline. This is an error sinceconsume_whitespace_oneline
is only called for preprocessor directives, which must always be on the same line. This is really clever, I don't know if I would have thought of it :)- Change tests to filter out whitespace by default; add some tests that don't ignore whitespace and make sure that bit works properly
Behavior
- Keep all newlines within comments
- Replace comments with a single space
- Print whitespace when passed
-E
Let me know if I missed anything :)
r @jyn514 |
Your summary sounds about right. Also the behaviour of |
When complete, this will close #356 and maybe #395.
So far, the whitespace token has been added and properly outputs when using
-E
, but I still need to consume the whitespace before it's used by the parser.