Switch Parsing Algorithm From Pest(PEG) to LALRPOP(LALR(1)) #303
Just ran and pushed a commit that runs rustfmt over the repo. All of its formatting suggestions are now in one commit so it should be nice and easy to find any bad suggestions! |
Looks like Cannot implicitly convert type 'System.Collections.Generic.Dictionary<int, int>' to 'IceRpc.Tests.Slice.CustomDictionary<int, int>'
Cannot implicitly convert type 'IceRpc.Tests.Slice.CustomDictionary<int, int>' to 'System.Collections.Generic.Dictionary<int, int>'
Cannot implicitly convert type 'int[]' to 'IceRpc.Tests.Slice.CustomSequence<int>'
Cannot implicitly convert type 'IceRpc.Tests.Slice.CustomSequence<int>' to 'int[]'
Cannot convert from 'IceRpc.Tests.Slice.CustomSequence<int>' to 'System.ReadOnlyMemory<int>' This occurs in |
The new parser unintentionally fixed some bugs in the repo. I had a local patch that fixed them, and opened a PR for them here: icerpc/icerpc-csharp#1923 |
Co-authored-by: Reece Humphreys <reecewh@icloud.com>
First batch of comments.
Co-authored-by: Joe George <joe@externl.com>
#303 (comment) > I would just fix them to expect a syntax error. Alright! Sounds good to me! Keep 'em coming though! The more comments the more chances of catching things : v) |
Everything's looking good. I'll approve once the format action passes 😉 |
The tests were changed to expect a syntax error. I included the location information even though I'm under the impression we don't actually check it. But I hope that's fine. |
I kept a majority of the changes that
It removes the space between
It shifts these comments way to the right to try and align them with the line above,
I don't think this syntax is correct. Either the function goes on one line, or each parameter gets its own line. |
Yes this is a bug, we should open an issue on the rustfmt repo.
There is an option to configure this behavior for rustfmt. We can change that rule if we think this is bad code. |
I've narrowed down the problem to their macro-argument-parser, when it recursively sub-parses the match arms:
I'll open a bug in the morning. You guys should copyedit my words for professionalism. |
|
I opened a bug for the whitespace issue: rust-lang/rustfmt#5573 |
Yeah, I think this is pretty weird looking for Rust. What do you have in mind? |
I think the comment alignment might be buggy too, but that's less clear to me.
But here it insists they must be visually aligned:
 let i = Identifier {
     value: "something".to_owned(), // Identifier
-    span: span.clone(), // Cool span
+    span: span.clone(),            // Cool span
 };
The only reason I think it's a bug is because usually
Diff in \\?\C:\Users\austin\Desktop\lalrpop-real\icerpc\src\parsers\slice\grammar.rs at line 388:
     data_type,
     tag,
     is_streamed,
-    is_returned: false, // Patched by its operation.
+    is_returned: false,                      // Patched by its operation.
     parent: WeakPtr::create_uninitialized(), // Patched by its container.
These comments are unrelated, so it feels weird to align them. They're only next to each other by coincidence here.
|
I think this isn't a big deal for this PR. We can just add the newline and everything is fine.
self.advance_buffer(); // Consume the '&' character.

// Ensure the next character is also an '&' (since the whole token should be "&&").
if matches!(self.buffer.peek(), Some('&')) {
...
The above works and rustfmt is fine with it. For issues like the one above, let's just make those minor changes (in this case a line) and open an issue with rustfmt.
Let's let rustfmt do its thing for now. We can open an issue to change the setting once this PR is merged, since I'm guessing that, depending on what decision we come to, it could change a lot of code. |
Seems the visual alignment is a known issue: rust-lang/rustfmt#4108 |
I am of the opinion that we shouldn't intentionally mis-format our code.
I'm not convinced the setting exists. |
@externl, @ReeceHumphreys is this okay to merge? I understand that If you both feel that strongly about this, I'm fine with |
Sure, this time we can run the formatter in a separate commit. |
This is a companion PR for icerpc/slicec#303
Issues Fixed
Fixes #277
Pest eagerly consumed whitespace after a doc comment, giving bogus spans.
LALRPOP ignores the whitespace and only includes the comment's text in its spans.
Fixes #233
The new parser no longer has a grammar rule for block doc comments.
All places where we used/tested them have been removed.
Implements & Closes #131
Attributes can be placed on both members and their types:
op([cs::identifier("foo")] myParam: [cs::generic("bar")] sequence<MyType>);
Type attributes must be placed on the type, and entity attributes must be placed on the identifier.
IMPORTANT: This PR doesn't add validation/testing for this. We should add validation/testing for this.
Closes #53
This issue was Pest-specific and is now pointless.
Explanation
This PR switches the parsing library we use from Pest to LALRPOP.
We still use Pest specifically for parsing doc comments, but this will be removed in the future. One step at a time!
This adds 2 'folder' modules: one for preprocessing and one for slice. Each folder contains:
grammar.lalrpop: Defines the grammar rules used by LALRPOP.
grammar.rs: Pulls in the generated LALRPOP code and defines helper functions.
lexer.rs: Converts a String of text into a stream (iterator) of tokens. This stream is used by the parser.
parser.rs: Parses a token stream into something
tokens.rs: Defines/lists the tokens/errors that the lexer and parser can return/work with.
Control Flow:
String.
The lexer is just an iterator, so calling 'lexer.next()' returns the next token/source block.
Preprocessor and call parse_slice_file.
This evaluates the preprocessor directives and returns an iterator of source blocks.
Any blocks that are conditionally compiled out are not in this iterator.
Parser and call parse_slice_file.
This parses the slice and returns a SliceFile.
This is the source code that does this:
https://github.com/InsertCreativityHere/icerpc/blob/9b188ce56567a10d15eacbb6f13b608db2a329aa/src/parser/slice.rs#L38-L46
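To make the "lexer is just an iterator" idea concrete, here is a minimal sketch of what such a lexer could look like. It is not the code from this PR: the `Token` and `LexError` types, the `Lexer` struct, and its `buffer` field are hypothetical stand-ins, and the real `tokens.rs` and `lexer.rs` presumably define far more than this. It only recognizes identifiers and the `&&` token.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical token type (an assumption for this sketch).
#[derive(Debug, PartialEq)]
enum Token {
    Identifier(String),
    And, // The "&&" token.
}

/// Hypothetical lexing error (an assumption for this sketch).
#[derive(Debug, PartialEq)]
enum LexError {
    UnexpectedChar(char),
}

/// Hypothetical lexer: wraps a peekable character buffer.
struct Lexer<'a> {
    buffer: Peekable<Chars<'a>>,
}

impl<'a> Lexer<'a> {
    fn new(input: &'a str) -> Self {
        Lexer { buffer: input.chars().peekable() }
    }
}

impl<'a> Iterator for Lexer<'a> {
    type Item = Result<Token, LexError>;

    fn next(&mut self) -> Option<Self::Item> {
        // Skip any whitespace between tokens.
        while matches!(self.buffer.peek(), Some(c) if c.is_whitespace()) {
            self.buffer.next();
        }

        // Returns `None` (end of the stream) once the input is exhausted.
        let c = *self.buffer.peek()?;

        if c == '&' {
            self.buffer.next(); // Consume the first '&' character.

            // Ensure the next character is also an '&' (the whole token should be "&&").
            if matches!(self.buffer.peek(), Some('&')) {
                self.buffer.next();
                Some(Ok(Token::And))
            } else {
                Some(Err(LexError::UnexpectedChar('&')))
            }
        } else if c.is_alphabetic() || c == '_' {
            // Collect the characters of the identifier.
            let mut text = String::new();
            while let Some(&ch) = self.buffer.peek() {
                if ch.is_alphanumeric() || ch == '_' {
                    text.push(ch);
                    self.buffer.next();
                } else {
                    break;
                }
            }
            Some(Ok(Token::Identifier(text)))
        } else {
            self.buffer.next(); // Consume the offending character so lexing can continue.
            Some(Err(LexError::UnexpectedChar(c)))
        }
    }
}

fn main() {
    // A consumer (such as a parser) can pull tokens one at a time with `next()`,
    // or collect the whole stream.
    let tokens: Vec<_> = Lexer::new("FOO && BAR").collect();
    println!("{:?}", tokens);
}
```

Because the lexer implements `Iterator`, whatever consumes it only needs something that yields tokens and never has to see the underlying string, which is roughly the separation the file layout above describes.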
Other Changes
new and add_x methods from grammar elements. Also no longer necessary.
raw_value field from Identifier. Literally wasn't used anywhere.