This repository has been archived by the owner on Jun 3, 2021. It is now read-only.

Support preprocessing #5

Open
10 of 16 tasks
jyn514 opened this issue Jun 20, 2019 · 12 comments
Labels
enhancement: New feature or request
preprocessor: Issue in the preprocessor (probably cycle detection)
Milestone

Comments

@jyn514
Owner

jyn514 commented Jun 20, 2019

Section 6.10:

  • 6.10.1: #if / #ifdef / #ifndef / #elif / #else / defined conditional compilation (added in First pass at preprocessor #184)
  • 6.10.2: #include headers
  • 6.10.3: #define macros and substitutions
    • object-like macros
    • 6.10.3.1 function-like macros
      • including __VA_ARGS__
    • 6.10.3.2, 6.10.3.3: # and ## operators
    • 6.10.3.5: #undef
  • 6.10.4: #line control (can be ignored for now, waiting on Emit debug info #152, Allow setting custom line numbers brendanzab/codespan#157)
  • 6.10.5: #error directives
  • 6.10.6, 6.10.9: pragma directives (can be ignored altogether)
    • #pragma
    • _Pragma ()
  • 6.10.7: # on its own (ignored)
  • 6.10.8: predefined macros
    • conditional features (6.10.8.3):
      • __STDC_NO_ATOMICS__
      • __STDC_NO_COMPLEX__
      • __STDC_NO_THREADS__
      • __STDC_NO_VLA__
@jyn514 jyn514 added lexer Issue dealing with parsing the lexical tokens of a program and removed lexer Issue dealing with parsing the lexical tokens of a program labels Jun 20, 2019
@jyn514
Owner Author

jyn514 commented Jun 20, 2019

This probably deserves its own pass.

@jyn514 jyn514 added enhancement New feature or request MVP and removed MVP labels Aug 22, 2019
@jyn514 jyn514 added this to the MVP milestone Aug 28, 2019
@jyn514
Owner Author

jyn514 commented Dec 13, 2019

Ok I'm actually thinking about this seriously now. The first step is to figure out how to preserve locations for error messages. I think I can start with fn preprocess(&str) -> Vec<Locatable<Token>> and go from there. It would be nice to also support a single String as output but that shouldn't be too terribly hard to reconstruct from the tokens later.
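A minimal sketch of that starting point, to make the shape concrete. Everything here is an assumption for illustration: `Location` as a byte range, the three-variant `Token`, and the `to_source` helper are hypothetical and are not rcc's actual types. It handles no directives yet; it only shows locations surviving the pass and a string being rebuilt from the tokens.

```rust
/// Hypothetical source span: byte offsets into the original input.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Location {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical wrapper pairing a value with where it came from.
#[derive(Debug, Clone, PartialEq)]
pub struct Locatable<T> {
    pub data: T,
    pub location: Location,
}

/// Deliberately tiny token type; a real lexer has many more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Id(String),
    Int(i64),
    Punct(char),
}

/// Splits the input into located tokens, skipping whitespace.
pub fn preprocess(src: &str) -> Vec<Locatable<Token>> {
    let mut out = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some(&(start, c)) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
            continue;
        }
        let is_word = |ch: char| ch.is_ascii_alphanumeric() || ch == '_';
        let mut end = start + c.len_utf8();
        chars.next();
        if is_word(c) {
            // Extend the span over the rest of the identifier or number.
            while let Some(&(i, ch)) = chars.peek() {
                if !is_word(ch) {
                    break;
                }
                end = i + ch.len_utf8();
                chars.next();
            }
        }
        let text = &src[start..end];
        let data = if c.is_ascii_digit() {
            // Simplification: malformed numbers such as `12ab` become 0.
            Token::Int(text.parse().unwrap_or(0))
        } else if is_word(c) {
            Token::Id(text.to_string())
        } else {
            Token::Punct(c)
        };
        out.push(Locatable { data, location: Location { start, end } });
    }
    out
}

/// Rebuilds a single string from the tokens, separated by single spaces.
/// The original whitespace is not preserved.
pub fn to_source(tokens: &[Locatable<Token>]) -> String {
    tokens
        .iter()
        .map(|t| match &t.data {
            Token::Id(s) => s.clone(),
            Token::Int(n) => n.to_string(),
            Token::Punct(c) => c.to_string(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Because every token carries its own span, error messages can still point into the original input even after the token stream has been rewritten.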

@jyn514
Owner Author

jyn514 commented Dec 13, 2019

There are a lot of things shared between the lexer and the preprocessor; it's a weird mix. I'd like to reuse the code, but I'd also like to be able to preprocess things on their own. I guess I don't actually need to make them separate to reuse the functions, as long as I can go from Vec<Token> to String.

@jyn514
Owner Author

jyn514 commented Dec 13, 2019

Working on this in https://github.com/jyn514/rcc/tree/cpp

@jyn514
Owner Author

jyn514 commented Dec 13, 2019

Ok how about this - the preprocessor runs after the lexer and the lexer just collects the tokens and puts them in a mini AST. That leaves a clean break between syntax and semantics.

The only issues I see are #line directives (can hack around this but I'd have to change every location, it'd be ugly) and I'd have to rework the lexer to return an enum { Token(...), CppToken(...) }.

@jyn514
Owner Author

jyn514 commented Dec 13, 2019

I could special-case #line in the lexer; that shouldn't be too hacky. Something like this:

if let CppToken::Line(n) = token {
   self.location.line = n;
}

and then everything else still gets done in the preprocessor. I like that idea, I think I'll do it.
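A rough sketch of how that split could look, assuming a line-oriented lexer. The `Lexed` enum, the `CppToken::Other` variant, and the `lex` function are hypothetical names made up for this example; only `CppToken::Line` comes from the snippet above. The lexer consumes `#line` itself and hands every other directive through untouched for a later pass.

```rust
/// Hypothetical directive token; only `Line` is interpreted by the lexer.
#[derive(Debug, Clone, PartialEq)]
pub enum CppToken {
    Line(u32),
    /// Any other directive, kept as raw text for the preprocessor.
    Other(String),
}

/// Hypothetical lexer output: either a normal token or a directive.
#[derive(Debug, Clone, PartialEq)]
pub enum Lexed {
    Token(String),
    Cpp(CppToken),
}

/// Returns each item together with the line number it is reported at.
pub fn lex(src: &str) -> Vec<(u32, Lexed)> {
    let mut out = Vec::new();
    let mut line = 1u32;
    for text in src.lines() {
        let trimmed = text.trim_start();
        if let Some(rest) = trimmed.strip_prefix('#') {
            let mut words = rest.split_whitespace();
            let name = words.next();
            let number = words.next().and_then(|n| n.parse::<u32>().ok());
            let token = match (name, number) {
                (Some("line"), Some(n)) => CppToken::Line(n),
                _ => CppToken::Other(rest.trim().to_string()),
            };
            if let CppToken::Line(n) = token {
                // Special case: the line after the directive is numbered `n`,
                // so the usual increment is skipped.
                line = n;
                continue;
            }
            out.push((line, Lexed::Cpp(token)));
        } else {
            for word in trimmed.split_whitespace() {
                out.push((line, Lexed::Token(word.to_string())));
            }
        }
        line += 1;
    }
    out
}
```

With this shape the location bookkeeping stays in one place, and the preprocessor never has to rewrite locations after the fact.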

@jyn514
Owner Author

jyn514 commented Dec 13, 2019

Actually this can't go after the lexer because it's affected by whitespace :( These have different meanings:

#define f(a) a
f(a) // emits a
#define f (a) a
f(a) // emits (a) a(a)
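To illustrate why the whitespace matters, here is a sketch of classifying a definition from the raw text after `#define`: the macro is function-like only if `(` follows the name with nothing in between. `Macro` and `parse_define` are hypothetical names, and the parsing is deliberately naive (bodies are kept as plain strings, and the name is not checked for a leading digit).

```rust
/// Hypothetical representation of a macro definition.
#[derive(Debug, PartialEq)]
pub enum Macro {
    Object { name: String, body: String },
    Function { name: String, params: Vec<String>, body: String },
}

/// `rest` is the text following `#define` on the same line.
pub fn parse_define(rest: &str) -> Option<Macro> {
    let rest = rest.trim_start();
    // The name ends at the first character that cannot be part of an identifier.
    let name_len = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(rest.len());
    if name_len == 0 {
        return None;
    }
    let (name, after) = rest.split_at(name_len);
    if let Some(after_paren) = after.strip_prefix('(') {
        // `(` directly after the name: function-like macro.
        let close = after_paren.find(')')?;
        let params: Vec<String> = after_paren[..close]
            .split(',')
            .map(|p| p.trim().to_string())
            .filter(|p| !p.is_empty())
            .collect();
        let body = after_paren[close + 1..].trim().to_string();
        Some(Macro::Function { name: name.to_string(), params, body })
    } else {
        // Anything else, including whitespace before `(`: object-like macro
        // whose body happens to start with a parenthesis.
        Some(Macro::Object { name: name.to_string(), body: after.trim().to_string() })
    }
}
```

A token stream that has already discarded whitespace cannot make this distinction, which is the problem described above.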

@jyn514 jyn514 pinned this issue Dec 17, 2019
@jyn514
Owner Author

jyn514 commented Dec 22, 2019

Crazy idea: substitute self.lexer with the contents of #if or #include sections, which allows doing basically everything in place without changing existing code.
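For the #include half of that idea, a sketch using a stack of line iterators in place of a single lexer. The `Preprocessor` type and the in-memory `files` map are assumptions for the example, not the real rcc design; real code would read from disk, handle `<...>` includes, and guard against include cycles, none of which is done here.

```rust
use std::collections::HashMap;

/// Hypothetical preprocessor that swaps its input source on `#include`.
pub struct Preprocessor<'a> {
    /// Stand-in for the file system: file name to contents.
    files: &'a HashMap<&'a str, &'a str>,
    /// Innermost file is last; an empty stack means end of input.
    stack: Vec<std::str::Lines<'a>>,
}

impl<'a> Preprocessor<'a> {
    pub fn new(files: &'a HashMap<&'a str, &'a str>, main: &str) -> Self {
        let stack = files.get(main).map(|src| src.lines()).into_iter().collect();
        Preprocessor { files, stack }
    }
}

impl<'a> Iterator for Preprocessor<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        loop {
            let next_line = self.stack.last_mut()?.next();
            let line = match next_line {
                Some(line) => line,
                None => {
                    // Current file is exhausted: resume the including file.
                    self.stack.pop();
                    continue;
                }
            };
            if let Some(name) = line.trim().strip_prefix("#include") {
                let name = name.trim().trim_matches('"');
                // Unknown files are silently skipped in this sketch.
                if let Some(&src) = self.files.get(name) {
                    self.stack.push(src.lines());
                }
                continue;
            }
            return Some(line);
        }
    }
}
```

Callers keep pulling lines from one iterator and never see the switch between files.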

@jyn514
Owner Author

jyn514 commented Dec 22, 2019

That works well for #includes, but I don't think it's a good idea for #if directives. I think a state machine would be a good idea instead so it can keep being an external iterator instead of an internal one.
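A sketch of what such a state machine might look like, kept as an external iterator over lines. `IfState` and `Conditionals` are hypothetical names. To stay short it assumes the condition is a plain integer literal (`#if 0`, `#if 1`) and ignores `#ifdef`, `#ifndef`, and `#elif`.

```rust
/// Hypothetical per-`#if` state, one entry per nesting level.
#[derive(Debug, Clone, Copy, PartialEq)]
enum IfState {
    /// This branch is being emitted.
    Active,
    /// No branch taken yet; a later `#else` may still be taken.
    Waiting,
    /// A branch was already taken, or the enclosing block is skipped.
    Done,
}

pub struct Conditionals<I> {
    lines: I,
    stack: Vec<IfState>,
}

impl<I> Conditionals<I> {
    pub fn new(lines: I) -> Self {
        Conditionals { lines, stack: Vec::new() }
    }
}

impl<'a, I: Iterator<Item = &'a str>> Iterator for Conditionals<I> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        loop {
            let line = self.lines.next()?;
            let directive = line.trim();
            // Lines are emitted only if every enclosing level is active.
            let emitting = self.stack.iter().all(|s| *s == IfState::Active);
            if let Some(cond) = directive.strip_prefix("#if ") {
                // Assumption: the condition is an integer literal.
                let truthy = cond.trim().parse::<i64>().map_or(false, |n| n != 0);
                self.stack.push(match (emitting, truthy) {
                    (false, _) => IfState::Done,
                    (true, true) => IfState::Active,
                    (true, false) => IfState::Waiting,
                });
            } else if directive == "#else" {
                if let Some(top) = self.stack.last_mut() {
                    *top = match *top {
                        IfState::Waiting => IfState::Active,
                        _ => IfState::Done,
                    };
                }
            } else if directive == "#endif" {
                self.stack.pop();
            } else if emitting {
                return Some(line);
            }
        }
    }
}
```

All of the state lives in `stack`, so the caller can stop and resume between any two lines.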

@jyn514
Owner Author

jyn514 commented Jan 4, 2020

Update on #if:

It has to support arbitrary C expressions in an #if directive, as well as preprocessing those tokens. I could make a new Parser and call parser.expr(), but there could be #defined ids inside the expression, so it also has to be preprocessed first. I think I could preprocess all the tokens between #if and the end of the line, change all the Id(...) tokens to Literal(Int(0)), and then pass that into a new parser instance.
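The two steps in that plan (expand macros, then zero out whatever identifiers are left) could be sketched like this. The token types and `prepare_if_tokens` are hypothetical, only object-like macros are expanded, and the `defined` operator is ignored; a real implementation has to evaluate `defined` before expanding anything.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Literal {
    Int(i64),
}

/// Hypothetical, heavily reduced token type.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Id(String),
    Literal(Literal),
    Plus,
}

/// Expands one token. `active` holds the macros currently being expanded,
/// so a self-referential macro cannot recurse forever.
fn expand(
    tok: Token,
    defines: &HashMap<String, Vec<Token>>,
    active: &mut Vec<String>,
    out: &mut Vec<Token>,
) {
    match tok {
        Token::Id(name) if !active.contains(&name) && defines.contains_key(&name) => {
            active.push(name.clone());
            for t in defines[&name].clone() {
                expand(t, defines, active, out);
            }
            active.pop();
        }
        // Any identifier that survives expansion evaluates to 0.
        Token::Id(_) => out.push(Token::Literal(Literal::Int(0))),
        other => out.push(other),
    }
}

/// Turns the tokens after `#if` into something an expression parser can take.
pub fn prepare_if_tokens(
    tokens: Vec<Token>,
    defines: &HashMap<String, Vec<Token>>,
) -> Vec<Token> {
    let mut out = Vec::new();
    let mut active = Vec::new();
    for tok in tokens {
        expand(tok, defines, &mut active, &mut out);
    }
    out
}
```

The output contains no `Id` tokens at all, so the expression parser never needs to know about macros.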

@jyn514
Owner Author

jyn514 commented Jan 16, 2020

This gets even worse: not all valid C expressions are valid preprocessor expressions. For example:

$ run_clang 
#if (int)1
#endif
<stdin>:1:10: error: token is not a valid binary operator in a preprocessor subexpression
$ run_clang
#if 1 = 1
#endif
<stdin>:2:7: error: token is not a valid binary operator in a preprocessor subexpression
$ run_clang
#if 1.31 + 1 
#endif
<stdin>:1:5: error: floating point literal in preprocessor expression
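One possible way to handle this is a validation pass over the tokens before they reach the expression parser. The sketch below is an assumption about how that could look, with a made-up `Token` type and `check_if_expr` function; the error strings are its own and do not try to match clang's. It also simplifies casts: the C standard treats keywords in an #if as ordinary identifiers (which then become 0), whereas this check just rejects them outright.

```rust
/// Hypothetical, reduced token type for `#if` expressions.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Int(i64),
    Float(f64),
    Keyword(&'static str),
    Assign,
    EqualEqual,
    Plus,
    LeftParen,
    RightParen,
}

/// Rejects tokens that are fine in a C expression but not in `#if`.
/// On failure, returns the index of the offending token in the message.
pub fn check_if_expr(tokens: &[Token]) -> Result<(), String> {
    for (i, tok) in tokens.iter().enumerate() {
        match tok {
            Token::Float(_) => {
                return Err(format!("token {}: floating point literal in #if", i));
            }
            Token::Assign => {
                return Err(format!("token {}: assignment is not allowed in #if", i));
            }
            Token::Keyword(k) => {
                // Simplification: covers casts such as `(int)1`.
                return Err(format!("token {}: keyword `{}` is not allowed in #if", i, k));
            }
            _ => {}
        }
    }
    Ok(())
}
```

Running the check first means the regular expression parser can be reused unchanged for whatever passes.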

@jyn514
Owner Author

jyn514 commented Jan 24, 2020

@jyn514 jyn514 unpinned this issue Mar 9, 2020
@jyn514 jyn514 added the preprocessor Issue in the preprocessor (probably cycle detection) label Mar 26, 2020