Skip to content

[WIP] Move parser imports to another file#6625

Merged
dlang-bot merged 3 commits intodlang:masterfrom
RazvanN7:Parser_sep
Apr 4, 2017
Merged

[WIP] Move parser imports to another file#6625
dlang-bot merged 3 commits intodlang:masterfrom
RazvanN7:Parser_sep

Conversation

@RazvanN7
Copy link
Contributor

The parse.d file creates AST nodes which are passed to the semantic analysis. In the current form, the parser imports 95% of the total compiler code, including semantic analysis code which it doesn't use. In order to break the import cycle and to take a step further into transforming the compiler into a library, it is necessary to offer the possibility of separation between parsing and semantic analysis. This PR is a first step in that direction, by moving the creation of AST nodes (more explicitly the "new" expressions) to a different file and transforming the Parser class into a templated class which receives its AST nodes types through a template. This way anyone can define their own AST nodes and instantiate the parser with them.

This is a work in progress. I only moved the creation of Union and Struct declaration nodes to a file called codegenASTFamily which contains a struct that implements the creation methods.

When this PR will be finished, the CodegenASTFamily struct will contain the creation expressions for all AST nodes and parse.d will have a lot less imports.

@RazvanN7
Copy link
Contributor Author

CC @yebblies

Copy link
Member

@andralex andralex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start! A few course corrections within.

BTW thanks @WalterBright for the great idea with swapping entire AST families.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our convention would suggest codegen_ast_family

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scope p = ... to avoid stuttering

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, but no need to tie types with the allocation method. Just this should suffice:

alias StructDeclaration = ddmd.dstruct.StructDeclaration;
alias UnionDeclaration = ddmd.dstruct.UnionDeclaration;
alias BaseClass = ddmd.dclass.BaseClass;
...

Then the rest of the code could continue using new ASTFamily.StructDeclaration; etc.

If there's a need, we may factor out the allocation and construction mechanisms as well in a later pass.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above, no need to specify the type twice

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(throughout)

src/ddmd/parse.d Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not need this import here. This is important - with the import it still means the parser depends on some specific AST family.

src/ddmd/parse.d Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for a member - just use the static types. If methods are needed in some ASTFamily, they can be defined static.

In the future if we have an AST family keeping its own state (unlikely), we may go with a member.

src/ddmd/parse.d Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this would become a = new ASTFamily.InterfaceDeclaration(loc, id, baseclasses);.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you updated this as I was reviewing. Taking this one step further: move all imports inside this struct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These forwarding constructors quickly become a lot of boilerplate so we definitely should avoid them.

@UplinkCoder
Copy link
Member

This creates the illusion of independence.
In reality the interface for something to be considered an AST-Family is large and narrow, making it impractical to implement it differently from dmd's ast.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we settled on non-top-level imports implicitly being public, or is this yet another remnant of the 313/314 days?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. @RazvanN7 this is clever but let's create a unittest build with a separate AST family called e.g. ASTNull that has the ABSOLUTE MINIMAL types that make the parser build.

Then we need to parse a few files using Parser!ASTNull and place that in a unittest.

@dnadlinger
Copy link
Contributor

One thing I just want to make sure everybody is aware of is that LDC still needs a clean extern(C++) interface to the resulting AST. So far, there don't seem to be any issues in this regard (as the AST itself is switched out in one piece and still consists of C++ classes), but be aware of that if you were to templatize more parts to decouple things further in the future.

@andralex
Copy link
Member

This creates the illusion of independence.

It creates independence.

In reality the interface for something to be considered an AST-Family is large and narrow, making it impractical to implement it differently from dmd's ast.

The next step of the project is to create a family of simple AST nodes, devoid of any extra information and analysis. I predict that will be a rather easy undertaking.

@UplinkCoder
Copy link
Member

The next step of the project is to create a family of simple AST nodes, devoid of any extra information and analysis. I predict that will be a rather easy undertaking.

Fair enough.

Copy link
Member

@andralex andralex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WalterBright and I just had a pow-wow and agreed with the following:

  • acceptance of this hinges on the presence of the ASTNull family that defines minimal AST nodes. The idea being that we want to make sure we don't access fields/methods in AST types that we're not supposed to.
  • no underscores in file names :)

import ddmd.dversion;
import ddmd.errors;
import ddmd.expression;
import ddmd.func;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh, the red here is a good indicator

src/ddmd/parse.d Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did this stay?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/codegen_ast_family/astcodegen/g

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/CodegenASTFamily/ASTCodegen/g

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. @RazvanN7 this is clever but let's create a unittest build with a separate AST family called e.g. ASTNull that has the ABSOLUTE MINIMAL types that make the parser build.

Then we need to parse a few files using Parser!ASTNull and place that in a unittest.

src/ddmd/parse.d Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RazvanN7 don't forget to s/ASTFamily/AST/g at some point, thx!

@RazvanN7 RazvanN7 force-pushed the Parser_sep branch 6 times, most recently from edfe014 to eddf87e Compare March 22, 2017 11:09
@UplinkCoder
Copy link
Member

@RazvanN7 I am not too happy with the naming.
ASTCodegen suggests that this is part of cg.
what About DMDAST and NullAST

@RazvanN7
Copy link
Contributor Author

RazvanN7 commented Apr 3, 2017

@UplinkCoder Well, actually the it is part of the codegen since it holds useful information for the correct code generation. On the other hand, ASTNull is Walter's choice. I have nothing against the NullAST name

@UplinkCoder
Copy link
Member

UplinkCoder commented Apr 3, 2017

@RazvanN7 it is a dependency for cg, but does not do any cg on it's own.
hence it should not be in last part of the name.

The last part is always the more significat one in english.
A Baskelball is a ball.
and not a basket.
similarly the codegenAST is an AST.
not a code-generator.

@UplinkCoder
Copy link
Member

@RazvanN7 also astnull is missing in win32 makefile while it is in the posix one

@andralex
Copy link
Member

andralex commented Apr 3, 2017

@UplinkCoder @WalterBright wanted ASTCodegen which I'm also unhappy with but it's his prerogative.

@RazvanN7
Copy link
Contributor Author

RazvanN7 commented Apr 3, 2017

@UplinkCoder You are right, I updated now the win32.mak file

@RazvanN7 RazvanN7 force-pushed the Parser_sep branch 4 times, most recently from 7ae53f0 to 2c68802 Compare April 4, 2017 10:33
@dlang-bot dlang-bot merged commit 2159e82 into dlang:master Apr 4, 2017
@jacob-carlborg
Copy link
Contributor

jacob-carlborg commented Apr 6, 2017

@RazvanN7 the unit test in ddmd.parse segfaults on macOS:

0   dmd                                 0x000000010d713f54 _D4core7runtime18runModuleUnitTestsUZ19unittestSegvHandlerUNbNiiPS4core3sys5posix6signal9siginfo_tPvZv + 56
1   libsystem_platform.dylib            0x00007fffaf8acb3a _sigtramp + 26
2   dmd                                 0x000000010d97bd05 D2rt19sections_osx_x86_6412SectionGroup6__initZ + 742181
3   dmd                                 0x000000010d5bf30a D4ddmd5parse34__T6ParserTS4ddmd7astnull7ASTNullZ6Parser6__ctorMFC4ddmd7astnull7ASTNull6ModuleAxabZC4ddmd5parse34__T6ParserTS4ddmd7astnull7ASTNullZ6Parser + 90
4   dmd                                 0x000000010d5a94d8 D4ddmd5parse18__unittestL8426_10FZv + 56
5   dmd                                 0x000000010d49bf2d _D4ddmd5parse9__modtestFZv + 9
6   dmd                                 0x000000010d713fa1 D4core7runtime18runModuleUnitTestsUZ14__foreachbody2MFPS6object10ModuleInfoZi + 45
7   dmd                                 0x000000010d70c58f D6object10ModuleInfo7opApplyFMDFPS6object10ModuleInfoZiZ9__lambda2MFyPS6object10ModuleInfoZi + 35
8   dmd                                 0x000000010d72a0f2 D2rt5minfo17moduleinfos_applyFMDFyPS6object10ModuleInfoZiZ14__foreachbody2MFKS2rt19sections_osx_x86_6412SectionGroupZi + 86
9   dmd                                 0x000000010d72a07d D2rt5minfo17moduleinfos_applyFMDFyPS6object10ModuleInfoZiZi + 33
10  dmd                                 0x000000010d70c566 D6object10ModuleInfo7opApplyFMDFPS6object10ModuleInfoZiZi + 34
11  dmd                                 0x000000010d713e8b runModuleUnitTests + 127
12  dmd                                 0x000000010d724893 D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZv + 23
13  dmd                                 0x000000010d724830 D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ7tryExecMFMDFZvZv + 36
14  dmd                                 0x000000010d724796 _d_run_main + 498
15  dmd                                 0x000000010d49bc4e main + 34
16  dmd                                 0x000000010d49a0d4 start + 52
Segmentation fault: 11

The DMD unit tests are automatically run when building DMD with DEBUG=1.

extern (C++) class TemplateParameter
{
final extern (D) this(A, B)(A a, B b) {}
void foo() {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually a member of the real TemplateParameter class?

@yebblies
Copy link
Contributor

yebblies commented May 8, 2017

After reviewing this in more detail I've come to the conclusion that the ASTNull is not at all useful, and needs to be deleted.

acceptance of this hinges on the presence of the ASTNull family that defines minimal AST nodes. The idea being that we want to make sure we don't access fields/methods in AST types that we're not supposed to.

I do agree that such a family is useful, but the minimal AST is ASTBase, not this. For it to be a minimal AST it must by definite have some kind of use, which this does not. Making sure that no non-semantic fields are accessed is covered by ASTBase.

I suspect this may have come from a mis-communication of the idea of an AST factory interface that does not actually build an AST, which is the minimal interface required for some parser tasks (syntax verification) - but this is approach is still unacceptable for that as it will unavoidably allocate memory.

The minimal AST set that does involve a class hierarchy must be capable (at the very least) of representing the tree structure of parsed nodes. After all, the 'T' in AST stands for tree, and if you don't build a tree you have just allocated a bunch of useless classes. More realistically the minimal AST will contain all of the information necessary to run semantic analysis - ie everything that the parser currently generates.

@yebblies yebblies mentioned this pull request May 8, 2017
@andralex
Copy link
Member

andralex commented May 9, 2017

Is /dev/null/ unnecessary and should it be deleted? :)

@yebblies
Copy link
Contributor

yebblies commented May 9, 2017

If /dev/null/ wrote its contents to disk then yeah it would be hopelessly broken. This doesn't avoid building an ast, or build a minimally useful ast.

@yebblies
Copy link
Contributor

yebblies commented May 9, 2017

Compare the implementation of /dev/null vs this PR:

static ssize_t write_null(struct file *file, const char __user *buf,
			  size_t count, loff_t *ppos)
{
	return count;
}

Typically a null implementation doesn't take thousands of lines, allocate a bunch of memory, and add a maintenance burden in the form of massive duplication.

Please let me know if I've missed any uses of this that aren't covered better by the base ast or ast construction interface.

@jacob-carlborg
Copy link
Contributor

The AST that just sets the instance variables that was shown in the talk would be a nice start.

@andralex
Copy link
Member

andralex commented May 9, 2017

@yebblies "I want a parser that simply gives a yes/no on whether the code is syntactically correct, without any additional processing". Generally I'm fine with removing this if it has no use, but give it a little more time before we make that decision.

@yebblies
Copy link
Contributor

yebblies commented May 9, 2017

This this fails that use case because it allocates a huge amount of memory. This is an easily solved problem but allocating a broken ast is not the answer.

@andralex
Copy link
Member

andralex commented May 9, 2017

@yebblies OK. Let's give this a couple of months and then remove if not necessary. Thanks!

@yebblies
Copy link
Contributor

yebblies commented May 9, 2017

How about instead we remove it and if it turns out it's necessary we can add it back in. That we will fix it now instead of waiting two months to revert a pull you should never have merged in the first place.

Re-adding it would have very little cost, while maintaining it for two months would be a different story.

@RazvanN7
Copy link
Contributor Author

RazvanN7 commented May 9, 2017

I think that ASTNull serves as a good of example of what functions and variables you need to define to have a working parser. ASTBase has a lot of methods and fields which may be dropped if someone wants to implement its own AST family.

I do agree that there is no situation in which ASTNull can be used per se, but it's easier to read than the enriched ASTBase.

@yebblies
Copy link
Contributor

yebblies commented May 9, 2017

But why would anyone ever define an ast family from scratch instead of deriving from astbase? Astbase effectively is part of the api of the parser. Any fields in astbase should be required to get a valid ast out of the parser - other fields should not be in there.

I'm happy to talk about improving the readability but not at the cost of this much duplication.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants

Comments