Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Parser] Parser 2.0 part 2 #6162

Merged
merged 48 commits into from
Aug 11, 2020
Merged

[Parser] Parser 2.0 part 2 #6162

merged 48 commits into from
Aug 11, 2020

Conversation

jroesch
Copy link
Member

@jroesch jroesch commented Jul 28, 2020

This PR implements both metadata parsing and converts everything to the new diagnostic machinery.

I initially believed that the right way to parse the metadata table is to generate delayed references to the metadata table and then expand them after parsing is complete. In order to do this we need a piece of syntax i.e Expr, Type, Attrs, and so on for each meta reference. Currently this only works for Expr as I tried to define MetaRef as op right now in order to minimize changes.

This brings up a bigger long term question of how much we should rely on the meta table outside of things such as the constants or other "unprintable" values. I realize with my tokenization approach we could eagerly expand the meta references but this requires that we have the full program on hand in order to successfully parse any program fragment.

Would love to get some more thoughts and feedback cc @MarisaKirisame @joshpoll @tqchen @zhiics @electriclilies @mbrookhart @junrushao1994.

FYI cc @tmoreau89

src/parser/meta_ref.cc Outdated Show resolved Hide resolved
src/ir/span.cc Outdated Show resolved Hide resolved
src/parser/meta_ref.cc Outdated Show resolved Hide resolved
tests/python/relay/test_ir_parser2.py Show resolved Hide resolved
Copy link
Contributor

@ANSHUMAN87 ANSHUMAN87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jroesch for the PR.
As i missed your part-1 PR, i think i have lot to catch up :)
Please find below some high level comments, hope it helps!

src/parser/meta_ref.h Outdated Show resolved Hide resolved
tests/python/relay/test_ir_parser.py Show resolved Hide resolved
@@ -85,16 +85,26 @@ class SpanNode : public Object {
int line;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe what you are trying to achieve here is the range for line[begin, end], column[begin, end].
nit: May be we can rename the variables to reflect the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think I agree with @ANSHUMAN87. Maybe its better to use "begin_line", "begin_column", "end_line", "end_column"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be begin_line, end_line, begin_column, end_column, as both values are related. Maybe even make a small range struct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of painful to refactor due some of the Python code. I still need to update the Python bindings how about I add to tracking issue and we can do in follow up work?

include/tvm/parser/source_map.h Outdated Show resolved Hide resolved
include/tvm/parser/source_map.h Show resolved Hide resolved
python/tvm/parser/__init__.py Outdated Show resolved Hide resolved
src/parser/diagnostic.h Outdated Show resolved Hide resolved
include/tvm/relay/adt.h Show resolved Hide resolved
src/ir/span.cc Show resolved Hide resolved
src/parser/diagnostic.h Show resolved Hide resolved
src/parser/meta_ref.cc Show resolved Hide resolved
return true;
}

RELAY_REGISTER_OP("parser.MetaRef")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: As "parser.MetaRef" carry special meaning in the context, i think we can define this key at one place, and reuse it in all places. This helps if we want to change it at later point in time.

src/parser/meta_ref.cc Show resolved Hide resolved
*/
Expr MetaRef(std::string type_key, uint64_t node_index);

relay::Function ExpandMetaRefs(const MetaTable& meta_table, const relay::Function& mod);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mod -> func ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I will do this.

src/parser/parser.cc Outdated Show resolved Hide resolved
src/parser/parser.cc Outdated Show resolved Hide resolved
@jroesch jroesch marked this pull request as ready for review July 31, 2020 05:10
@jroesch
Copy link
Member Author

jroesch commented Jul 31, 2020

This PR is now ready for full review, if possible can people please chime in about meta-table and related decisions. I know there are still some incomplete aspects but this is already more robust then the previous parser and I would like to incrementalize instead of sending many 4kloc PRs.

@@ -85,16 +85,26 @@ class SpanNode : public Object {
int line;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think I agree with @ANSHUMAN87. Maybe its better to use "begin_line", "begin_column", "end_line", "end_column"?

* Could represent the source from an ML framework or the internal
* source of a TVM program.
*/
struct Source {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is not like POD aggregation, I personally prefer to use class instead

Suggested change
struct Source {
class Source {
public:

*/
TVM_DLL Match(Expr data, tvm::Array<Clause> clauses, bool complete = true);
TVM_DLL Match(Expr data, tvm::Array<Clause> clauses, bool complete = true, Span span = Span());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should think a bit about Span(nullptr) vs Optional<Span>?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't want spans to be optional if possible, but currently we need to migrate to setting them everywhere.

include/tvm/parser/source_map.h Show resolved Hide resolved
/*! \brief Construct a source from a string. */
TVM_DLL explicit Source(const std::string& source);

TVM_DLL Source(const Source& source) : source(source.source), line_map(source.line_map) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we just use = default?

class SourceMapNode : public Object {
public:
/*! \brief The source mapping. */
Map<SourceName, tvm::String> source_map;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not much difference tho

Suggested change
Map<SourceName, tvm::String> source_map;
Map<SourceName, tvm::runtime::String> source_map;

out << std::endl;
}
};

/*! \brief The diagnostic level, controls the printing of the message. */
enum DiagnosticLevel {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use enum class if possible

src/parser/diagnostic.h Show resolved Hide resolved
#include "./token.h"

namespace tvm {
namespace parser {

using namespace runtime;

// trim from start (in place)
static inline void ltrim(std::string& s) { // NOLINT(*)
s.erase(s.begin(), std::find_if(s.begin(), s.end(), [](int ch) { return !std::isspace(ch); }));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you prefer to use IsWhitespace we defined below?

src/parser/token.h Outdated Show resolved Hide resolved
Copy link
Contributor

@mbrookhart mbrookhart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some general thoughts after a first pass through the PR, I'll do a more detailed review tonight or tomorrow:

  1. Yay for getting rid of Antlr! This looks much easier to maintain.
  2. You've added diagnostic information via the Span object, and attempted to propagate it through the ExprMutator and ExprVisitor classes. There are a lot of passes that extend those classes without keeping span information, or directly use ExprFunctopr, as soon as we hit one of those passes, the diagnostic information will disappear, correct? Do we need to enforce valid spans at compile time to catch all of the passes?
  3. Since the diagnostic machinery is all based on source, I guess we need a pass to convert python-constructed graphs into relay source code for diagnostic information? Does this already exist somewhere in the compilation flow?

Copy link
Member

@zhiics zhiics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. only a few nitpicks

src/parser/meta_ref.cc Outdated Show resolved Hide resolved
src/printer/text_printer.h Outdated Show resolved Hide resolved
src/parser/parser.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@mbrookhart mbrookhart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nitpicks. I'm happy to leave the completion of the diagnostic system discussed in my last comment for another PR, let's make sure those issues make it into the tracking issue.

python/tvm/error.py Show resolved Hide resolved
src/parser/parser.cc Outdated Show resolved Hide resolved
src/parser/parser.cc Show resolved Hide resolved
src/parser/token.h Outdated Show resolved Hide resolved
@tqchen
Copy link
Member

tqchen commented Aug 10, 2020

@jroesch please followup to address the CI error

@jroesch
Copy link
Member Author

jroesch commented Aug 11, 2020

@tqchen this should be gtg, looks like I got passed the last few CPU tests.

Copy link
Member

@zhiics zhiics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jroesch jroesch merged commit fa2213f into apache:master Aug 11, 2020
wjliu1998 pushed a commit to wjliu1998/incubator-tvm that referenced this pull request Aug 13, 2020
* Add code from livestream with JK

* Fix errors parsing ResNet

* Parse metadata section efficiently and do most of plumbing to resolve metadata section references.

* WIP

* Change meta reference to an operator

* Meta references now work

* MetaReference expansion now works

* Start working on source map and move diagnostic context

* Convert tokenizer and parser to use new machinery

* Kill to_json

* Fix comment in type_infer.cc

* Remove old parser

* Rename parser tests and remove old ones

* Record span end information

* Convert to using spans everywhere

* Add span fields back to all Relay constructors

* Start passing spans

* Pass spans around visitors

* Format

* Fix

* Fix

* disable reference lint from string helpers

* Fix tokenizer

* Fix issue with empty metadata section

* Document new span fields and small tweaks

* Formatting

* Add span doc fields

* Add format tweak

* Improve errors and fix the semantic version tags in Prelude

* Update gradient.rly

* Clean up broken spans

* Clean up parser tests and turn on previously skipped tests

* Update errors to handle skipped cases

* Tweak

* Tweak

* Format

* Fix some minor issues with ADT tests

* Format

* Fix path

* WIP

* WIP

* Fix ir_text_printer

* format

* Formatted

* More formatting

* Repair test cases

* Fix CI

* Retrigger CI
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
* Add code from livestream with JK

* Fix errors parsing ResNet

* Parse metadata section efficiently and do most of plumbing to resolve metadata section references.

* WIP

* Change meta reference to an operator

* Meta references now work

* MetaReference expansion now works

* Start working on source map and move diagnostic context

* Convert tokenizer and parser to use new machinery

* Kill to_json

* Fix comment in type_infer.cc

* Remove old parser

* Rename parser tests and remove old ones

* Record span end information

* Convert to using spans everywhere

* Add span fields back to all Relay constructors

* Start passing spans

* Pass spans around visitors

* Format

* Fix

* Fix

* disable reference lint from string helpers

* Fix tokenizer

* Fix issue with empty metadata section

* Document new span fields and small tweaks

* Formatting

* Add span doc fields

* Add format tweak

* Improve errors and fix the semantic version tags in Prelude

* Update gradient.rly

* Clean up broken spans

* Clean up parser tests and turn on previously skipped tests

* Update errors to handle skipped cases

* Tweak

* Tweak

* Format

* Fix some minor issues with ADT tests

* Format

* Fix path

* WIP

* WIP

* Fix ir_text_printer

* format

* Formatted

* More formatting

* Repair test cases

* Fix CI

* Retrigger CI
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
* Add code from livestream with JK

* Fix errors parsing ResNet

* Parse metadata section efficiently and do most of plumbing to resolve metadata section references.

* WIP

* Change meta reference to an operator

* Meta references now work

* MetaReference expansion now works

* Start working on source map and move diagnostic context

* Convert tokenizer and parser to use new machinery

* Kill to_json

* Fix comment in type_infer.cc

* Remove old parser

* Rename parser tests and remove old ones

* Record span end information

* Convert to using spans everywhere

* Add span fields back to all Relay constructors

* Start passing spans

* Pass spans around visitors

* Format

* Fix

* Fix

* disable reference lint from string helpers

* Fix tokenizer

* Fix issue with empty metadata section

* Document new span fields and small tweaks

* Formatting

* Add span doc fields

* Add format tweak

* Improve errors and fix the semantic version tags in Prelude

* Update gradient.rly

* Clean up broken spans

* Clean up parser tests and turn on previously skipped tests

* Update errors to handle skipped cases

* Tweak

* Tweak

* Format

* Fix some minor issues with ADT tests

* Format

* Fix path

* WIP

* WIP

* Fix ir_text_printer

* format

* Formatted

* More formatting

* Repair test cases

* Fix CI

* Retrigger CI
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
* Add code from livestream with JK

* Fix errors parsing ResNet

* Parse metadata section efficiently and do most of plumbing to resolve metadata section references.

* WIP

* Change meta reference to an operator

* Meta references now work

* MetaReference expansion now works

* Start working on source map and move diagnostic context

* Convert tokenizer and parser to use new machinery

* Kill to_json

* Fix comment in type_infer.cc

* Remove old parser

* Rename parser tests and remove old ones

* Record span end information

* Convert to using spans everywhere

* Add span fields back to all Relay constructors

* Start passing spans

* Pass spans around visitors

* Format

* Fix

* Fix

* disable reference lint from string helpers

* Fix tokenizer

* Fix issue with empty metadata section

* Document new span fields and small tweaks

* Formatting

* Add span doc fields

* Add format tweak

* Improve errors and fix the semantic version tags in Prelude

* Update gradient.rly

* Clean up broken spans

* Clean up parser tests and turn on previously skipped tests

* Update errors to handle skipped cases

* Tweak

* Tweak

* Format

* Fix some minor issues with ADT tests

* Format

* Fix path

* WIP

* WIP

* Fix ir_text_printer

* format

* Formatted

* More formatting

* Repair test cases

* Fix CI

* Retrigger CI
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Sep 2, 2020
* Add code from livestream with JK

* Fix errors parsing ResNet

* Parse metadata section efficiently and do most of plumbing to resolve metadata section references.

* WIP

* Change meta reference to an operator

* Meta references now work

* MetaReference expansion now works

* Start working on source map and move diagnostic context

* Convert tokenizer and parser to use new machinery

* Kill to_json

* Fix comment in type_infer.cc

* Remove old parser

* Rename parser tests and remove old ones

* Record span end information

* Convert to using spans everywhere

* Add span fields back to all Relay constructors

* Start passing spans

* Pass spans around visitors

* Format

* Fix

* Fix

* disable reference lint from string helpers

* Fix tokenizer

* Fix issue with empty metadata section

* Document new span fields and small tweaks

* Formatting

* Add span doc fields

* Add format tweak

* Improve errors and fix the semantic version tags in Prelude

* Update gradient.rly

* Clean up broken spans

* Clean up parser tests and turn on previously skipped tests

* Update errors to handle skipped cases

* Tweak

* Tweak

* Format

* Fix some minor issues with ADT tests

* Format

* Fix path

* WIP

* WIP

* Fix ir_text_printer

* format

* Formatted

* More formatting

* Repair test cases

* Fix CI

* Retrigger CI
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Sep 3, 2020
* Add code from livestream with JK

* Fix errors parsing ResNet

* Parse metadata section efficiently and do most of plumbing to resolve metadata section references.

* WIP

* Change meta reference to an operator

* Meta references now work

* MetaReference expansion now works

* Start working on source map and move diagnostic context

* Convert tokenizer and parser to use new machinery

* Kill to_json

* Fix comment in type_infer.cc

* Remove old parser

* Rename parser tests and remove old ones

* Record span end information

* Convert to using spans everywhere

* Add span fields back to all Relay constructors

* Start passing spans

* Pass spans around visitors

* Format

* Fix

* Fix

* disable reference lint from string helpers

* Fix tokenizer

* Fix issue with empty metadata section

* Document new span fields and small tweaks

* Formatting

* Add span doc fields

* Add format tweak

* Improve errors and fix the semantic version tags in Prelude

* Update gradient.rly

* Clean up broken spans

* Clean up parser tests and turn on previously skipped tests

* Update errors to handle skipped cases

* Tweak

* Tweak

* Format

* Fix some minor issues with ADT tests

* Format

* Fix path

* WIP

* WIP

* Fix ir_text_printer

* format

* Formatted

* More formatting

* Repair test cases

* Fix CI

* Retrigger CI
@jroesch jroesch deleted the parser-2.0-pt2 branch February 4, 2021 04:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants