This repository has been archived by the owner on Jun 3, 2021. It is now read-only.
It would be really cool to have a separate library which serializes and deserializes C source code. Most of the serialization is already done in src/data in various impl Displays. The hard part will be factoring apart the lexer/parser from the rest of the compiler. The syntax part of this is already hard as noted in #151, but separating the preprocessor would also be somewhat difficult: I'd have to duplicate a fair bit of code (lexing tokens mostly) and have a way to pass locations around.
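The round-trip idea can be sketched with a toy token type; the names here are hypothetical stand-ins, not rcc's actual types, but they show how serialization back to C source can stay a plain Display impl like the ones in src/data:

```rust
use std::fmt;

// Hypothetical token type; variants are illustrative only.
#[derive(Clone, Debug, PartialEq)]
enum Token {
    Keyword(&'static str), // e.g. "int", "return"
    Identifier(String),
    IntLiteral(i64),
    Punct(char), // e.g. '{', ';'
}

// Serializing back to source is just Display, mirroring the
// `impl Display`s already living in src/data.
impl fmt::Display for Token {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Token::Keyword(k) => write!(f, "{}", k),
            Token::Identifier(name) => write!(f, "{}", name),
            Token::IntLiteral(n) => write!(f, "{}", n),
            Token::Punct(c) => write!(f, "{}", c),
        }
    }
}
```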
Related facts: the preprocessor accepts arbitrary text after #include as long as it is valid UTF-8 (including things that would normally be lexer errors), and it accepts #if expressions as long as they contain only integer constants (floating point constants are not allowed).
My proposed plan is this:
Have a lexer/parser combo. Possibly rewrite these from scratch using logos and LALRPOP. This will do absolutely no semantic checking, only parsing. This will be the library.
Before the parser runs, have a preprocessor. Serialize tokens to strings before passing to the lexer (or possibly don't have tokens at all? Is that feasible?). To allow #if, add an expr() API to the serde parser. To allow keeping track of multiple files, add a metadata field to Location:
This allows people who don't care to leave it blank (()) and people who do care to pass a FileId or similar.
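A minimal sketch of that metadata field, assuming a Location shaped roughly like rcc's (field names here are illustrative, only the generic parameter is the point):

```rust
// Hypothetical Location with a generic metadata slot; `T = ()` keeps the
// common single-file case zero-cost and zero-ceremony.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Location<T = ()> {
    line: u32,
    column: u32,
    metadata: T,
}

// A caller tracking multiple files might plug in a newtype index.
#[derive(Clone, Copy, Debug, PartialEq)]
struct FileId(u32);
```

A caller who doesn't care writes `Location { line, column, metadata: () }`; a multi-file caller uses `Location<FileId>` with no change to the library.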
After the preprocessor runs, the compiler proceeds as normal: analysis -> constant folding -> codegen -> linking.
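The constant-folding stage can be illustrated with a toy expression tree (this is a generic sketch of the technique, not rcc's real IR or pass):

```rust
// A minimal constant-folding pass over a toy expression tree.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Recursively fold children, then collapse any all-constant node.
fn fold(e: Expr) -> Expr {
    match e {
        Expr::Add(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Int(a), Expr::Int(b)) => Expr::Int(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Int(a), Expr::Int(b)) => Expr::Int(a * b),
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
        leaf => leaf,
    }
}
```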
Open questions:
Should the preprocessor be part of the library? If so, how should we deal with #includes? (devsnek on #lang-dev recommended calling a user-defined function - that would need to be aware of local vs. global includes as well as search paths).
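The callback idea could look something like this: the preprocessor asks a user-supplied resolver for each #include, passing along whether the include was local ("...") or system (<...>). All names here are hypothetical:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug)]
enum IncludeKind {
    Local,  // #include "file.h" - search relative to the including file first
    System, // #include <file.h> - search only the configured include paths
}

// The preprocessor calls this for every #include it encounters.
trait IncludeResolver {
    fn resolve(&mut self, path: &str, kind: IncludeKind) -> Result<String, String>;
}

// An in-memory resolver, handy for tests or sandboxed embedders
// that don't want the library touching the filesystem.
struct MapResolver(HashMap<String, String>);

impl IncludeResolver for MapResolver {
    fn resolve(&mut self, path: &str, _kind: IncludeKind) -> Result<String, String> {
        self.0
            .get(path)
            .cloned()
            .ok_or_else(|| format!("{}: file not found", path))
    }
}
```

A default filesystem-backed resolver could still ship with the crate, so only embedders with unusual needs implement the trait themselves.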
Should this be a separate crate, or the same rcc crate with codegen behind a feature flag?
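The single-crate option might look like the following Cargo.toml fragment (feature and dependency names are illustrative, assuming the codegen backend is the heavy optional dependency):

```toml
# Hypothetical feature layout for a single rcc crate.
[features]
default = ["codegen"]
codegen = ["dep:cranelift"]

[dependencies]
cranelift = { version = "*", optional = true }
```

Library users who only want parsing would then depend on rcc with `default-features = false`.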
jyn514 changed the title from "Separate parsing into a serde library" to "Separate parsing into a library" on Feb 10, 2020.
If #356 is implemented, the preprocessor could instead run between the lexer and the parser. This would clean up the current somewhat hacky way the preprocessor consumes the lexer's characters for it. It would also allow people to opt-in/out of the preprocessor.
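In that design the preprocessor becomes an ordinary adaptor over the token stream, and opting out is just wiring the lexer straight to the parser. A rough sketch, with String standing in for a real token type:

```rust
// Sketch: the preprocessor as an iterator adaptor between lexer and parser.
// `String` stands in for a real Token type; the body is a placeholder that
// only drops directive markers, not a real preprocessor.
fn preprocess<I: Iterator<Item = String>>(tokens: I) -> impl Iterator<Item = String> {
    tokens.filter(|t| !t.starts_with('#'))
}
```

The parser only sees `impl Iterator<Item = Token>`, so it cannot tell whether a preprocessor ran at all.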
Suggested by @pythondude325. Related to #151.