Moved getAttributes from attrib.d to expressionsem.d by AsterMiha · Pull Request #11788 · dlang/dmd

AsterMiha · 2020-09-23T11:01:25Z

Taking semantic elements out of attrib.d

dlang-bot · 2020-09-23T11:01:27Z

Thanks for your pull request and interest in making D better, @AsterMiha! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
I have provided a detailed rationale explaining my changes
New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.

If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#11788"

ibuclaw

GDC needs access to getAttributes to implement @attribute. I guess LDC will break as a result of this too for other attributes (such as @fastmath). Am I right @kinke?

thewilsonator · 2020-09-23T11:06:19Z

This still needs to be accessible from C++ for LDC (and I presume also GDC).

thewilsonator · 2020-09-23T11:52:42Z

src/dmd/expressionsem.d

    return true;
 }

+extern (C++) Expressions* getAttributes(UserAttributeDeclaration uad)


it also needs to be in the headers somewhere too.

src/dmd/attrib.h

RazvanN7 · 2020-09-23T12:28:53Z

src/dmd/attrib.h

It seems that the style is to have the * (star) right before the name of the function.

Yes, and the param is a pointer too.

Please add a test to cxx-unittest so that it is covered.

@ibuclaw do you mean src/tests/cxxfrontend.c or should I add it to test/unit?

@ibuclaw do you mean src/tests/cxxfrontend.c or should I add it to test/unit?

cxxfrontend.c - just need to call the function from c++ to verify that mangling matches.

RazvanN7 · 2020-09-23T12:29:25Z

src/dmd/frontend.h

    Dsymbol* syntaxCopy(Dsymbol* s);
    Scope* newScope(Scope* sc);
    void setScope(Scope* sc);
-    Array<Expression*>* getAttributes();


You probably need to add it too here.

RazvanN7 · 2020-09-23T12:30:08Z

src/dmd/objc.d

Because of UFCS[1] you leave this line unmodified.

[1] https://tour.dlang.org/tour/en/gems/uniform-function-call-syntax-ufcs

RazvanN7 · 2020-09-23T12:30:56Z

src/dmd/traits.d

Again, this line can be left unmodified.

kinke · 2020-09-23T12:34:24Z

Before diving into all these little details, how about questioning the point of this little refactoring? Moving getAttributes() from 1.5k-lines attrib.d to 12+k-lines expressionsem.d, at the cost of having to add a uad in almost every line (and sometimes multiple occurrences per line), just because the first call performs a semantic analysis?

RazvanN7 · 2020-09-23T12:39:01Z

Splitting semantic analysis from the ast nodes will ultimately lead to getting rid of the duplications in ASTBase.

The fact that expressionsem.d is huge is a sign that we should split it in multiple files, not that we should stop pulling out semantic routines from AST nodes.

kinke · 2020-09-23T12:45:17Z

So the end goal is to move every little function invoking sema to dedicated sema files? A quick grep shows still loads of occurrences in something like 40 files...

RazvanN7 · 2020-09-23T12:47:10Z

Out of the AST nodes, yes.

Geod24 · 2020-09-23T14:05:10Z

How will that make DMD usable as a library ?

RazvanN7 · 2020-09-23T14:13:44Z

@Geod24 The initial phase is a clean-up one so that Mihaela can accommodate with the codebase and then we will tackle the interface.

I think that having the separation helps making the code base cleaner anyway.

Geod24 · 2020-09-24T02:01:23Z

@Geod24 The initial phase is a clean-up one so that Mihaela can accommodate with the codebase and then we will tackle the interface.

I think that having the separation helps making the code base cleaner anyway.

"Clean code" can be a very subjective criteria.
In my experience, if you are designing a library, you either have a lot of experience in the problem domain, or have use case to drive you. Since Mihaela doesn't have the former, I was expecting to see the latter: a couple of use cases being gathered and worked on. But so far I've only heard of questionable cleanups, no use case. Designing a library for the sake of designing a library will almost always miss the point without use cases.

I see the the milestones come from dlang/project-ideas#68 . I highly suggest you rethink it around use cases. One metric should be: "If we stop the project at step X, how usable will the result be?". We already have good experience with ASTBase: it's been around for 4 (5?) years and, despite leading to constant pain (and being broken for a while), it hasn't brought any tangible benefit. That's very low on the usable scale. On the other hand, some people with use cases (@rainers @jacob-carlborg ) have pushed for changes which are quantifiable enhancements, such as the diagnostic reporter.

There's plenty of use cases available out there. One such is VisualD. There are a few points which are very clearly needed in all projects, require a lot of work, and will bring net benefits, such as adding start and end locations, or making the AST non-destructive. The former is an easy problem, the latter is a hard one, but progress in any of those means progress that won't be lost if the project is not completed and someone else has to pick it up next year.

WalterBright · 2020-09-24T03:49:25Z

Designing a library for the sake of designing a library will almost always miss the point without use cases.

Having designed many libraries myself, I can say "almost always" is actually "certainly".

RazvanN7 · 2020-09-24T03:53:30Z

@Geod24 Thanks for the insightful comment. I think that starting with adding start and end locations for AST nodes could be a good start.

jacob-carlborg · 2020-09-24T08:04:48Z

@RazvanN7 Here's a list of problems I've encountered when trying to use DMD as a library (these are actual problems I ran into trying to use the library to write a useful tool) :

There's a lot of global state in the compiler. This might need to be reset when using the compiler in an IDE/editor LSP server
There's no end location for tokens or AST nodes. This will make source code refactorings difficult/impossible
Some AST nodes don't have location information at all
Location information does not contain a buffer position/offset
It's not possible to analyze source code that is only in memory. Imports and string imports will be looked up on disk
The error handling/reporting is difficult to customize
Information is lost between the various compilation phases
The semantic analyzer is destructive

I think that starting with adding start and end locations for AST nodes could be a good start.

I think it would be even simpler to start with the tokens. Here's a list of related tasks (in order of difficulty, with the easiest one first) that will improve the possibility for source code refactorings (this is needed for some LSP features and other tools). I think this list is a good start, because most work will confined to the lexer and parser:

Add buffer position/offset to all locations. For the tests, I think it should be enough to only verify that the tokens has the correct buffer position. For the AST nodes, the parse will get the locations from the tokens. I think this is a one-line change in the lexer. Most work for this task would be to write tests.
Add start locations to all AST nodes which are missing start locations. I think this is fairly easy since most AST nodes already have a start location. IIRC, UDA nodes do not have a start location. For example:
```
@("foo")
void bar();
```
The first location in the line containing the UDA, will point to the start position of the string, not the @ symbol. The difficult part would be to find the nodes without a start location.
Add token length. This would be how many bytes the token occupy in the source code. I think it's enough to know the length of a token rather than the end location. I think this task is more straightforward than the above task, but it might take longer. Again, most work here is probably writing the tests.
Analyzing source code only from memory. I think this should be quite easy as well
Add end offset to all locations. This will require quite a bit of work for writing tests, since there are many different kinds of AST nodes
A way to get the related tokens an AST node consists of. I'm not sure if there need to be any changes to the compiler for this, or if it's enough to just re-lex the source code range the AST node occupy. For this the buffer position and end location is necessary.

I think the rest of the above mentioned problems are more difficult to tackle.

When it comes to testing of these tasks, I think best way is to write unit test style tests, like those available here [1]. There's no point in running the full compiler, including semantic analysis and code generation to only test the lexer or parser. There are already directories for the lexer and parser.

[1] https://github.com/dlang/dmd/tree/master/test/unit

jacob-carlborg · 2020-09-24T08:08:19Z

@Geod24 Regarding dlang/project-ideas#68. I think the last two tasks:

Develop an interface that is capable of overwriting any part of the compiler
Integrate dmd as a lib in existing tools

Are useful and have use cases. I have a use case for the first one in the above list. But I don't think they are realistic to achieve in the available time. I don't know if the other tasks are prerequisite for the above two tasks.

ibuclaw · 2020-09-24T19:56:18Z

How will that make DMD usable as a library ?

This question still makes me wince. If gdc and ldc haven't been using dmd as a library for all these years now, how are we using it?

AsterMiha · 2020-09-30T10:57:43Z

@jacob-carlborg Thank you for the detailed suggestions! I will start following this plan from now on.

* Add buffer position/offset to all locations.

But first: could you please clarify if this position refers to the character count in the file or to how many tokens have been found? I would assume it's the second case since the first one sounds more like the start position.

jacob-carlborg · 2020-09-30T12:13:59Z

But first: could you please clarify if this position refers to the character count in the file or to how many tokens have been found?

@AsterMiha In DMD there's a type called Loc. It's used to store a location of an AST node or a token. Currently this type contains the filename, the line number and the column number. The goal would be to add the byte offset to this type. That means, the number of bytes from the start of the source file to the start of the first byte of the location.

This is required to be able to extract the exact code range from the source file a given AST node or token occupies. This is needed to be able to do the source code refactoring.

Here's an example of a source file:

foo "bar"

The above source file consists of two tokens. First an identifier, then a string literal. The byte offset of the second token, the string literal, will be 4 (first byte starts at offset 0).

first one sounds more like the start position

Yes, it is the start position. DMD already has the start position, but in other units (line number and column number). We need it in number of bytes.

I added an explicit item for end locations in my previous post: #11788 (comment).

benjones · 2020-09-30T17:51:11Z

But first: could you please clarify if this position refers to the character count in the file or to how many tokens have been found?

@AsterMiha In DMD there's a type called Loc. It's used to store a location of an AST node or a token. Currently this type contains the filename, the line number and the column number. The goal would be to add the byte offset to this type. That means, the number of bytes from the start of the source file to the start of the first byte of the location.

This is required to be able to extract the exact code range from the source file a given AST node or token occupies. This is needed to be able to do the source code refactoring.

Here's an example of a source file:
foo "bar"
The above source file consists of two tokens. First an identifier, then a string literal. The byte offset of the second token, the string literal, will be 4 (first byte starts at offset 0).

first one sounds more like the start position

Yes, it is the start position. DMD already has the start position, but in other units (line number and column number). We need it in number of bytes.

I added an explicit item for end locations in my previous post: #11788 (comment).

I suspect you'll get push back about adding this to Loc since it's used so frequently that increasing its size could have noticeable memory impact. I'm not saying don't do it, but be aware.

MoonlightSentinel · 2020-09-30T20:15:56Z

I suspect you'll get push back about adding this to Loc since it's used so frequently that increasing its size could have noticeable memory impact. I'm not saying don't do it, but be aware.

That's a valid concern which can be resolved by hiding the end location behind an appropriate version (as done e.g. for the callback API)

RazvanN7 · 2020-10-26T10:39:07Z

I will close this since the focus has changed to a different strategy.

AsterMiha requested review from Geod24, ibuclaw, jacob-carlborg and kinke as code owners September 23, 2020 11:01

ibuclaw requested changes Sep 23, 2020

View reviewed changes

AsterMiha force-pushed the attrib_sem branch from a65891a to d8e92a8 Compare September 23, 2020 11:24

thewilsonator reviewed Sep 23, 2020

View reviewed changes

AsterMiha force-pushed the attrib_sem branch from d8e92a8 to 509f61a Compare September 23, 2020 12:22

RazvanN7 reviewed Sep 23, 2020

View reviewed changes

src/dmd/attrib.h Outdated Show resolved Hide resolved

AsterMiha force-pushed the attrib_sem branch from 509f61a to 213a609 Compare September 23, 2020 12:27

RazvanN7 reviewed Sep 23, 2020

View reviewed changes

src/dmd/traits.d Outdated

Copy link

Contributor

RazvanN7 Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this line can be left unmodified.

Moved getAttributes from UserAttributeDeclaration to expressionsem.d

68980f4

AsterMiha force-pushed the attrib_sem branch from 213a609 to 68980f4 Compare September 23, 2020 13:29

Added getAttributes test

59f8f79

AsterMiha mentioned this pull request Oct 5, 2020

Added offset to the token location #11823

Merged

RazvanN7 closed this Oct 26, 2020

jacob-carlborg mentioned this pull request Oct 29, 2020

Added location to DVCondition #11907

Merged

RazvanN7 mentioned this pull request May 27, 2021

Added location to user defined attributes #12524

Closed

Geod24 mentioned this pull request Jul 2, 2021

Refactor duplicated code from ASTBase node AliasThis into a mixin template #12796

Closed

Uh oh!

Conversation

AsterMiha commented Sep 23, 2020

Uh oh!

dlang-bot commented Sep 23, 2020

Bugzilla references

Testing this PR locally

Uh oh!

ibuclaw left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thewilsonator commented Sep 23, 2020

Uh oh!

thewilsonator Sep 23, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

RazvanN7 Sep 23, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kinke Sep 23, 2020

Choose a reason for hiding this comment

Uh oh!

ibuclaw Sep 23, 2020

Choose a reason for hiding this comment

Uh oh!

AsterMiha Sep 23, 2020

Choose a reason for hiding this comment

Uh oh!

ibuclaw Sep 23, 2020

Choose a reason for hiding this comment

Uh oh!

RazvanN7 Sep 23, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RazvanN7 Sep 23, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RazvanN7 Sep 23, 2020

Choose a reason for hiding this comment

Uh oh!

kinke commented Sep 23, 2020

Uh oh!

RazvanN7 commented Sep 23, 2020

Uh oh!

kinke commented Sep 23, 2020

Uh oh!

RazvanN7 commented Sep 23, 2020

Uh oh!

Geod24 commented Sep 23, 2020

Uh oh!

RazvanN7 commented Sep 23, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Geod24 commented Sep 24, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

WalterBright commented Sep 24, 2020

Uh oh!

RazvanN7 commented Sep 24, 2020

Uh oh!

jacob-carlborg commented Sep 24, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jacob-carlborg commented Sep 24, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ibuclaw commented Sep 24, 2020

Uh oh!

AsterMiha commented Sep 30, 2020

Uh oh!

jacob-carlborg commented Sep 30, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

benjones commented Sep 30, 2020

Uh oh!

ibuclaw left a comment •

edited

Loading

RazvanN7 Sep 23, 2020 •

edited

Loading

RazvanN7 Sep 23, 2020 •

edited

Loading

RazvanN7 Sep 23, 2020 •

edited

Loading

RazvanN7 commented Sep 23, 2020 •

edited

Loading

Geod24 commented Sep 24, 2020 •

edited

Loading

jacob-carlborg commented Sep 24, 2020 •

edited

Loading

jacob-carlborg commented Sep 24, 2020 •

edited

Loading

jacob-carlborg commented Sep 30, 2020 •

edited

Loading