Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] parsing fixes, better error handling, fuzzing #35

Closed

Conversation

oberblastmeister
Copy link
Contributor

@oberblastmeister oberblastmeister commented Sep 28, 2021

Currently the parser doesn't parse operator associations correctly, everything is left associative. This is wrong for operators like ++, ->, and //. I fixed it for those operators. I also incorporated #34. But this error handling is more general and not just for semicolons. It uses the concept of recovery which is taken from rust-analyzer. I added fuzzing for the lexer and the parser which addresses #32. I have fixed the parser to pass all the fuzz tests. I removed cbitset which seems like an unnecessary dependency because we don't need all the different bitset variants, only the largest. It is easy just to write one by hand and it is very short. I have put the bitset into use everywhere to check if a token is contained inside rather than using a slice which has to be looped over. My code is very messy right now and I don't have tests, but I will add them as I go.

@oberblastmeister oberblastmeister changed the title [WIP] parsing fixes and better error handling [WIP] parsing fixes, better error handling, fuzzing Sep 28, 2021
@aaronjanse
Copy link
Member

Whoa, this is very cool. Thank you @oberblastmeister!! I'll start trying this out locally while developing rnix-lsp :-)

@Ma27 Ma27 requested review from Ma27 and aaronjanse October 8, 2021 11:38
@Ma27 Ma27 added this to the 0.10.0 milestone Oct 8, 2021
@Ma27 Ma27 added bug Something isn't working enhancement New feature or request labels Oct 8, 2021
name = "lexer"
path = "fuzz_targets/lexer.rs"
test = false
doc = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing final new line

writeln!(handle, "Fuzzing {:?}\n\n", data).unwrap();
let _ = rnix::parse(text);
}
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing final new line

@@ -411,6 +500,7 @@ where
return self.builder.checkpoint();
}
};
// println!("peek is: {:?}", peek);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind removing these debugging lines?

@Ma27
Copy link
Member

Ma27 commented Nov 12, 2021

So this is actually way too much to merge as-is IMHO.

I'm strongly in favor of applying the fuzzer-part, but I'm not sure about the rest currently.

@oberblastmeister
Copy link
Contributor Author

Well I think that our parser actually being able to parse stuff correctly is pretty important :].

@aaronjanse
Copy link
Member

Yep I agree, correct parsing is important. And improved error handling is necessary for providing autocomplete for rnix-lsp.

@oberblastmeister Would you mind splitting this into smaller PRs, such as maybe:

  • fuzzing
  • error handling
  • correct operation handling
  • dependency changes

@oberblastmeister
Copy link
Contributor Author

Yeah I can split them up

@aaronjanse
Copy link
Member

Thank you!!

@Ma27
Copy link
Member

Ma27 commented Nov 16, 2021

Well I think that our parser actually being able to parse stuff correctly is pretty important :].

I have to apologize actually, my wording was rather poorly! What I meant was exactly what @aaronjanse meant (we briefly talked about it previously), so thanks a lot for your work on this, but splitting things up would actually help us to get things in! :)

@mohe2015
Copy link

It uses the concept of recovery which is taken from rust-analyzer.

Does this mean the parser (I don't know the current state) will not be parsing in linear time then? Is the Nix Language not LL(k) parseable?

@Ma27
Copy link
Member

Ma27 commented Nov 30, 2021

Re-assigning to 0.11.0. @oberblastmeister do you think you'll be able to get to splitting things up during the next weeks? I'm not sure when I'll get to it, but otherwise I'd resume here :)

Does this mean the parser (I don't know the current state) will not be parsing in linear time then? Is the Nix Language not LL(k) parseable?

Perhaps it is, but it doesn't seem desirable. Further context is in https://edolstra.github.io/pubs/phd-thesis.pdf (Section 4.2, page 64):

Most context-free parsers (e.g., LL(k)
and LR(k)parsers, for fixed k) suffer from bounded look-ahead: they must choose
between different productions on the basis of a fixed number of tokens. This is
inconvenient for language implementors, since they must deal with the resulting
parse conflicts. For instance, the input fragment "{x" could be start of an attribute set
(e.g., {x=123;}), or a function definition (e.g., {x}: x)1. This requires the programmer
to introduce additional production rules to resolve the conflict

@oberblastmeister
Copy link
Contributor Author

This pr should not change the speed of the parser. It does not change what the parser does, only the error handling. The parser still does the same amount of lookaheads as before.

oppiliappan added a commit to oppiliappan/rnix-parser that referenced this pull request Feb 13, 2022
darichey pushed a commit to darichey/rnix-parser that referenced this pull request Jun 15, 2022
@oberblastmeister
Copy link
Contributor Author

I'm closing this since this pr has been split up. The only thing left to do is error handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants