-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Frontend][Relay] Add Parser 2.0 #5932
Conversation
A few thoughts: Another benefit of using a parser generator like ANTLR is that you have a specification of the language that serves as documentation and defines the parsing behavior, keeping the documentation always up to date. I see the value in error reporting integration and removing the external dependency, but it would be good to further motivate these changes and maybe find ways to further modularize version 2.0 to make it noob-friendly. |
@weberlo I think ANTLR only provides those benefits if you assume the people working on the project actually know ANTLR, which in so far as I can tell is not true. Josh and you were pretty much the only ones to work on the previous parser. Not to mention as we extend the parser to TIR and the rest of TVM it will become increasingly hard for anyone to make even small tweaks. The current parser was also incomplete and failed to handle many tricky cases which can often be solved with small amounts of constant lookahead tokens. Furthermore many of the grammar gymnastics required to parse in ANTLR are complex and easy for new users to break while this might still require some understanding the ordering is explicit in code for users to read and learn from. ANTLR is also a painful deployment dependency as we need Java, Python, and C++ to build the current parser. Furthermore the parser necessitated a rewrite given that it was in Python and needs to be in C++ or another statically linkable language. Finally error reporting the main reason to write it by hand, if you look at most production quality compilers they have hand written parsers mostly for error reporting and recovery reasons. Most generated parsers fail on the first invalid token, or parse issue such as an invalid identifier. The above parser can continue even after encountering a parse error enabling better error reporting. In my exp. compilers which use parser generators i.e OCaml or F* have a horrible user exp. when compared to languages with hand rolled parsers such as Rust or Lean. |
I just marked this as ready for review, my suggestion is that we review the existing code and land it in an experimental state. I will finish the metadata parsing and integration tests on real models in follow up work. My fear is that if I do it on this branch we are looking at 4kloc+ to review all at once. The PR is already pretty big and will take time to review. Thoughts? |
cc @antinucleon and @jwfromm |
I agree we should incrementally add the support of these language features to make review smoother. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, more doc would be helpful, especially in parser.cc talking about how the various classes fit together. Also the Lookahead fn in parser.cc seemed like it was a bit weird / broken -- I left some comments about it there
@jroesch : Thanks for the PR! Great work 👍 |
@ANSHUMAN87 I have been super busy and will post one soon. |
@ANSHUMAN87 here is some initial details https://discuss.tvm.ai/t/rfc-meta-rfc-3-pronged-plan-for-improving-error-messages-in-tvm/7214 |
Okay I addressed the vast majority of comments directly and hopefully got everything, CI is building if people can do another pass. |
/*! Conditionally consume a token when it matches, this will never trigger an error | ||
* as we guard against consuming the token before we do. | ||
* | ||
* Useful for matching optional tokens, effectively looksahead by one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Useful for matching optional tokens, effectively looksahead by one. | |
* Useful for matching optional tokens, effectively looks ahead by one. |
src/parser/parser.cc
Outdated
/*! \brief Convert a numeric token to an NDArray for embedding into the Relay program. */ | ||
NDArray NumberToNDArray(const Token& token) { | ||
if (token->token_type == TokenType::Integer) { | ||
DLContext ctx({.device_type = DLDeviceType::kDLCPU, .device_id = 0}); | ||
auto dtype = String2DLDataType("int32"); | ||
auto data = NDArray::Empty({}, dtype, ctx); | ||
auto array = reinterpret_cast<int32_t*>(data->data); | ||
// revisit this, literal node issue. | ||
int64_t value = Downcast<tvm::Integer>(token->data); | ||
array[0] = (int32_t)value; | ||
return data; | ||
} else if (token->token_type == TokenType::Float) { | ||
DLContext ctx({.device_type = DLDeviceType::kDLCPU, .device_id = 0}); | ||
auto dtype = String2DLDataType("float32"); | ||
auto data = NDArray::Empty({}, dtype, ctx); | ||
auto array = reinterpret_cast<float*>(data->data); | ||
// revisit this, literal node issue. | ||
float value = Downcast<tvm::FloatImm>(token->data)->value; | ||
array[0] = value; | ||
return data; | ||
} else { | ||
LOG(FATAL) << "internal error: should only call this function on numeric tokens"; | ||
return NDArray(); | ||
} | ||
} | ||
|
||
/*! \brief Convert a boolean value to an NDArray for embedding into the Relay program. */ | ||
NDArray BooleanToNDarray(bool value) { | ||
DLContext ctx({.device_type = DLDeviceType::kDLCPU, .device_id = 0}); | ||
auto dtype = String2DLDataType("bool"); | ||
auto data = NDArray::Empty({}, dtype, ctx); | ||
auto array = reinterpret_cast<bool*>(data->data); | ||
array[0] = value; | ||
return data; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to refactor this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the method above. seems like there's shared structure at a glance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't really any easy way to refactor because you really templatize the code cleanly due to the need to pass dtypes around and perform the correct casting based on dtype and container type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good. wasn't sure
src/parser/parser.cc
Outdated
SemVer ParseSemVer() { | ||
// TODO(@jroesch): convert semver to module level attribute. | ||
auto id = Peek(); | ||
if (id->token_type == TokenType::Identifier && id.ToString() == "v0") { | ||
auto id = Match(TokenType::Identifier); | ||
Consume(TokenType::Period); | ||
// CHECK_EQ(minor_and_patch) | ||
Consume(TokenType::Float); | ||
} | ||
// For now we only support current version. | ||
return SemVer{.major = 0, .minor = 0, .patch = 4}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even if we only support the current version, we should still validate the given version matches that, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are annoying issues with how this is done right now. I would like to move away from some ugly lexing hacks but in order to do that I need to change the semver. I would like to introduce module level attributes and instead provide general parsing for those instead of continue to hack this in. I will make sure this works before we purge the old parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh. so the todo above means we remove the semver from the text format? we can discuss whether or not to do so later, but yeah, until we do so, we should at least have a hack that checks for "v0.0.4", rather than the current half measure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No the problem is that that isn't a valid token, and trying to hack in is going to be a huge hack because its incredibly contextual and overlaps with a lots of other lexing rules. I don't really want to do it given that I WILL rip it out soon, and the old parser is still in place for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay. if it's a big change, then we can leave it as is for now
src/parser/parser.cc
Outdated
case TokenType::Extern: { | ||
Consume(TokenType::Extern); | ||
// TODO(@jroesch): add some validation here? | ||
defs.types.push_back(ParseTypeDef()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure if it can be validated, since it's opaque. unless you mean something else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parser is the only place we can reject this if it has non-zero fields. I will come back to this in the final clean up. Trying to land the initial infra and then can do a polish pass or two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh my bad. you meant to ensure there aren't any constructors. should be a two-line fix, right? just store the parsed def, then CHECK_EQ(def->constructors.length, 0)
, before pushing it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we need to do errors (which is the goal of my next PR, so I figured I would do all in one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
Co-authored-by: Logan Weber <36520469+weberlo@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @tqchen could you take another look?
This PR implements a pure C++ parser for Relay's text format and starts to lay ground work for improved error reporting and full program parsing work that I will send an RFC for sometime next week. The goal is to remove the external dependency on ANTLR and make it easier for non-parsing experts to make simple modifications or tweaks to the parser.
I have implemented nearly all the expression and definition parsing, I have some remaining work to do on parsing types and ensuring end to end examples are working. I am opening the PR now in draft form to solicit some initial feedback.
Features