Skip to content

Moved getAttributes from attrib.d to expressionsem.d#11788

Closed
AsterMiha wants to merge 2 commits intodlang:masterfrom
AsterMiha:attrib_sem
Closed

Moved getAttributes from attrib.d to expressionsem.d#11788
AsterMiha wants to merge 2 commits intodlang:masterfrom
AsterMiha:attrib_sem

Conversation

@AsterMiha
Copy link
Contributor

Taking semantic elements out of attrib.d

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @AsterMiha! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#11788"

Copy link
Member

@ibuclaw ibuclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GDC needs access to getAttributes to implement @attribute. I guess LDC will break as a result of this too for other attributes (such as @fastmath). Am I right @kinke?

@thewilsonator
Copy link
Contributor

This still needs to be accessible from C++ for LDC (and I presume also GDC).

return true;
}

extern (C++) Expressions* getAttributes(UserAttributeDeclaration uad)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it also needs to be in the headers somewhere too.

src/dmd/attrib.h Outdated
Copy link
Contributor

@RazvanN7 RazvanN7 Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the style is to have the * (star) right before the name of the function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and the param is a pointer too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test to cxx-unittest so that it is covered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ibuclaw do you mean src/tests/cxxfrontend.c or should I add it to test/unit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ibuclaw do you mean src/tests/cxxfrontend.c or should I add it to test/unit?

cxxfrontend.c - just need to call the function from c++ to verify that mangling matches.

Dsymbol* syntaxCopy(Dsymbol* s);
Scope* newScope(Scope* sc);
void setScope(Scope* sc);
Array<Expression*>* getAttributes();
Copy link
Contributor

@RazvanN7 RazvanN7 Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably need to add it too here.

src/dmd/objc.d Outdated
Copy link
Contributor

@RazvanN7 RazvanN7 Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of UFCS[1] you leave this line unmodified.

[1] https://tour.dlang.org/tour/en/gems/uniform-function-call-syntax-ufcs

src/dmd/traits.d Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this line can be left unmodified.

@kinke
Copy link
Contributor

kinke commented Sep 23, 2020

Before diving into all these little details, how about questioning the point of this little refactoring? Moving getAttributes() from 1.5k-lines attrib.d to 12+k-lines expressionsem.d, at the cost of having to add a uad in almost every line (and sometimes multiple occurrences per line), just because the first call performs a semantic analysis?

@RazvanN7
Copy link
Contributor

Splitting semantic analysis from the ast nodes will ultimately lead to getting rid of the duplications in ASTBase.

The fact that expressionsem.d is huge is a sign that we should split it in multiple files, not that we should stop pulling out semantic routines from AST nodes.

@kinke
Copy link
Contributor

kinke commented Sep 23, 2020

So the end goal is to move every little function invoking sema to dedicated sema files? A quick grep shows still loads of occurrences in something like 40 files...

@RazvanN7
Copy link
Contributor

Out of the AST nodes, yes.

@Geod24
Copy link
Member

Geod24 commented Sep 23, 2020

How will that make DMD usable as a library ?

@RazvanN7
Copy link
Contributor

RazvanN7 commented Sep 23, 2020

@Geod24 The initial phase is a clean-up one so that Mihaela can accommodate with the codebase and then we will tackle the interface.

I think that having the separation helps making the code base cleaner anyway.

@Geod24
Copy link
Member

Geod24 commented Sep 24, 2020

@Geod24 The initial phase is a clean-up one so that Mihaela can accommodate with the codebase and then we will tackle the interface.

I think that having the separation helps making the code base cleaner anyway.

"Clean code" can be a very subjective criteria.
In my experience, if you are designing a library, you either have a lot of experience in the problem domain, or have use case to drive you. Since Mihaela doesn't have the former, I was expecting to see the latter: a couple of use cases being gathered and worked on. But so far I've only heard of questionable cleanups, no use case. Designing a library for the sake of designing a library will almost always miss the point without use cases.

I see the the milestones come from dlang/project-ideas#68 . I highly suggest you rethink it around use cases. One metric should be: "If we stop the project at step X, how usable will the result be?". We already have good experience with ASTBase: it's been around for 4 (5?) years and, despite leading to constant pain (and being broken for a while), it hasn't brought any tangible benefit. That's very low on the usable scale. On the other hand, some people with use cases (@rainers @jacob-carlborg ) have pushed for changes which are quantifiable enhancements, such as the diagnostic reporter.

There's plenty of use cases available out there. One such is VisualD. There are a few points which are very clearly needed in all projects, require a lot of work, and will bring net benefits, such as adding start and end locations, or making the AST non-destructive. The former is an easy problem, the latter is a hard one, but progress in any of those means progress that won't be lost if the project is not completed and someone else has to pick it up next year.

@WalterBright
Copy link
Member

Designing a library for the sake of designing a library will almost always miss the point without use cases.

Having designed many libraries myself, I can say "almost always" is actually "certainly".

@RazvanN7
Copy link
Contributor

@Geod24 Thanks for the insightful comment. I think that starting with adding start and end locations for AST nodes could be a good start.

@jacob-carlborg
Copy link
Contributor

jacob-carlborg commented Sep 24, 2020

@RazvanN7 Here's a list of problems I've encountered when trying to use DMD as a library (these are actual problems I ran into trying to use the library to write a useful tool) :

  • There's a lot of global state in the compiler. This might need to be reset when using the compiler in an IDE/editor LSP server
  • There's no end location for tokens or AST nodes. This will make source code refactorings difficult/impossible
  • Some AST nodes don't have location information at all
  • Location information does not contain a buffer position/offset
  • It's not possible to analyze source code that is only in memory. Imports and string imports will be looked up on disk
  • The error handling/reporting is difficult to customize
  • Information is lost between the various compilation phases
  • The semantic analyzer is destructive

I think that starting with adding start and end locations for AST nodes could be a good start.

I think it would be even simpler to start with the tokens. Here's a list of related tasks (in order of difficulty, with the easiest one first) that will improve the possibility for source code refactorings (this is needed for some LSP features and other tools). I think this list is a good start, because most work will confined to the lexer and parser:

  • Add buffer position/offset to all locations. For the tests, I think it should be enough to only verify that the tokens has the correct buffer position. For the AST nodes, the parse will get the locations from the tokens. I think this is a one-line change in the lexer. Most work for this task would be to write tests.

  • Add start locations to all AST nodes which are missing start locations. I think this is fairly easy since most AST nodes already have a start location. IIRC, UDA nodes do not have a start location. For example:

    @("foo")
    void bar();

    The first location in the line containing the UDA, will point to the start position of the string, not the @ symbol. The difficult part would be to find the nodes without a start location.

  • Add token length. This would be how many bytes the token occupy in the source code. I think it's enough to know the length of a token rather than the end location. I think this task is more straightforward than the above task, but it might take longer. Again, most work here is probably writing the tests.

  • Analyzing source code only from memory. I think this should be quite easy as well

  • Add end offset to all locations. This will require quite a bit of work for writing tests, since there are many different kinds of AST nodes

  • A way to get the related tokens an AST node consists of. I'm not sure if there need to be any changes to the compiler for this, or if it's enough to just re-lex the source code range the AST node occupy. For this the buffer position and end location is necessary.

I think the rest of the above mentioned problems are more difficult to tackle.

When it comes to testing of these tasks, I think best way is to write unit test style tests, like those available here [1]. There's no point in running the full compiler, including semantic analysis and code generation to only test the lexer or parser. There are already directories for the lexer and parser.

[1] https://github.com/dlang/dmd/tree/master/test/unit

@jacob-carlborg
Copy link
Contributor

jacob-carlborg commented Sep 24, 2020

@Geod24 Regarding dlang/project-ideas#68. I think the last two tasks:

  • Develop an interface that is capable of overwriting any part of the compiler
  • Integrate dmd as a lib in existing tools

Are useful and have use cases. I have a use case for the first one in the above list. But I don't think they are realistic to achieve in the available time. I don't know if the other tasks are prerequisite for the above two tasks.

@ibuclaw
Copy link
Member

ibuclaw commented Sep 24, 2020

How will that make DMD usable as a library ?

This question still makes me wince. If gdc and ldc haven't been using dmd as a library for all these years now, how are we using it?

@AsterMiha
Copy link
Contributor Author

@jacob-carlborg Thank you for the detailed suggestions! I will start following this plan from now on.

* Add buffer position/offset to all locations.

But first: could you please clarify if this position refers to the character count in the file or to how many tokens have been found? I would assume it's the second case since the first one sounds more like the start position.

@jacob-carlborg
Copy link
Contributor

jacob-carlborg commented Sep 30, 2020

But first: could you please clarify if this position refers to the character count in the file or to how many tokens have been found?

@AsterMiha In DMD there's a type called Loc. It's used to store a location of an AST node or a token. Currently this type contains the filename, the line number and the column number. The goal would be to add the byte offset to this type. That means, the number of bytes from the start of the source file to the start of the first byte of the location.

This is required to be able to extract the exact code range from the source file a given AST node or token occupies. This is needed to be able to do the source code refactoring.

Here's an example of a source file:

foo "bar"

The above source file consists of two tokens. First an identifier, then a string literal. The byte offset of the second token, the string literal, will be 4 (first byte starts at offset 0).

first one sounds more like the start position

Yes, it is the start position. DMD already has the start position, but in other units (line number and column number). We need it in number of bytes.

I added an explicit item for end locations in my previous post: #11788 (comment).

@benjones
Copy link
Contributor

But first: could you please clarify if this position refers to the character count in the file or to how many tokens have been found?

@AsterMiha In DMD there's a type called Loc. It's used to store a location of an AST node or a token. Currently this type contains the filename, the line number and the column number. The goal would be to add the byte offset to this type. That means, the number of bytes from the start of the source file to the start of the first byte of the location.

This is required to be able to extract the exact code range from the source file a given AST node or token occupies. This is needed to be able to do the source code refactoring.

Here's an example of a source file:

foo "bar"

The above source file consists of two tokens. First an identifier, then a string literal. The byte offset of the second token, the string literal, will be 4 (first byte starts at offset 0).

first one sounds more like the start position

Yes, it is the start position. DMD already has the start position, but in other units (line number and column number). We need it in number of bytes.

I added an explicit item for end locations in my previous post: #11788 (comment).

I suspect you'll get push back about adding this to Loc since it's used so frequently that increasing its size could have noticeable memory impact. I'm not saying don't do it, but be aware.

@MoonlightSentinel
Copy link
Contributor

I suspect you'll get push back about adding this to Loc since it's used so frequently that increasing its size could have noticeable memory impact. I'm not saying don't do it, but be aware.

That's a valid concern which can be resolved by hiding the end location behind an appropriate version (as done e.g. for the callback API)

@RazvanN7
Copy link
Contributor

I will close this since the focus has changed to a different strategy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Comments