Cache the result of FuncDeclaration.isTypeIsolated()... #12930

JohanEngelen · 2021-07-27T22:12:06Z

...because it is potentially a very expensive check.

This resulted in a large compile-time improvement on the large codebase at Weka.io (11min --> 3.5min).

Patch by @EyalIO !

dlang-bot · 2021-07-27T22:12:08Z

Thanks for your pull request and interest in making D better, @JohanEngelen! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
I have provided a detailed rationale explaining my changes
New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.

If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#12930"

dkorpel · 2021-07-27T22:24:29Z

I don't like the addition of global variables. Have you tried caching based on .deco (interned string of the type's mangle) instead? Also, isTypeIsolated has a terrible time complexity currently. Can you try out how #12885 affects compile time of the large codebase at Weka.io?

JohanEngelen · 2021-07-27T22:55:46Z

I don't like the addition of global variables.

This is not an objective critism that I can remedy.

Have you tried caching based on .deco (interned string of the type's mangle) instead?

This won't work without string copying in Type copy(). A Type's mangle can be very large, so it's not fast to index an AA based on these strings.

Also, isTypeIsolated has a terrible time complexity currently. Can you try out how #12885 affects compile time of the large codebase at Weka.io?

I will try it, give me a few days.

(Nice to see you are using --ftime-trace and that you are putting it to good use! :-) hope you like the improvements of ldc-developers/ldc#3797 !)

thewilsonator · 2021-07-28T01:45:02Z

dmd/mtype.d(254:31)[warn]: Variable declaration for `nextUniqueId` has redundant attributes (`static`, `__gshared`).

dkorpel · 2021-07-28T08:54:58Z

This won't work without string copying in Type copy(). A Type's mangle can be very large, so it's not fast to index an AA based on these strings.

Why do you need to copy it? It's interned, meaning it's immutable and two decos are the same if and only if their pointers are the same. The pointer value of .deco is already a unique id AFAIK.

(Nice to see you are using --ftime-trace and that you are putting it to good use! :-) hope you like the improvements of ldc-developers/ldc#3797 !)

It's a useful feature, thanks for adding it!

RazvanN7 · 2021-07-28T09:22:57Z

@JohanEngelen : @dkorpel is right, the entire point of deco is to provide fast comparison of types. This makes it the faster alternative and gets rid of the need to add a new field.

kinke · 2021-07-28T14:51:08Z

11min --> 3.5min

!!!

JohanEngelen · 2021-07-28T16:58:30Z

@JohanEngelen : @dkorpel is right, the entire point of deco is to provide fast comparison of types. This makes it the faster alternative and gets rid of the need to add a new field.

How true is this?
Are Type objects never deleted? If they are deleted, then deco memory is freed, and could be used by another type's deco somewhere down the line leading to spurious miscompilation. (This is the reason why we did not use Type object's pointer)
Due to Type's copy() returning a new type with the same deco, I'm worried that there are cases where it's not uniqued but still a subtly different type.

JohanEngelen · 2021-07-28T17:01:05Z

11min --> 3.5min

!!!

You forgot one more exclam! Yeah, the impact is huge.

RazvanN7 · 2021-07-30T11:05:35Z

Due to Type's copy() returning a new type with the same deco, I'm worried that there are cases where it's not uniqued but still a subtly different type.

I'm fairly certain they are unique. The copy() functions returns a new type that has not gone through semantic analysis. During semantic analysis of the type itself, the uniqness is assured.

JohanEngelen · 2021-08-01T22:11:52Z

isTypeIsolated has a terrible time complexity currently. Can you try out how #12885 affects compile time of the large codebase at Weka.io?

Looks like we are down to 1m49 currently (just sema, no codegen).
With this PR: 1m49
With this PR + #12885: 1m49 (no change)
With only #12885: 2m07
Without any of this: 2m28 (apparently a lot of other compiletime improvements have landed)

Due to Type's copy() returning a new type with the same deco, I'm worried that there are cases where it's not uniqued but still a subtly different type.

I'm fairly certain they are unique. The copy() functions returns a new type that has not gone through semantic analysis. During semantic analysis of the type itself, the uniqness is assured.

I don't understand why this is being debated. We need a unique identifier for Type, that's what this PR adds in a clear way. Using deco (an unintelligible variable name) as a unique identifier might work, but that's not the way to write source code of a robust compiler.

JohanEngelen · 2021-08-01T22:13:07Z

With this PR + #12885: 1m49 (no change)

That's not to say the #12885 does not have value. If it reduces complexity, so much the better!

dkorpel · 2021-08-01T22:45:24Z

I don't understand why this is being debated. We need a unique identifier for Type, that's what this PR adds in a clear way. Using deco (an unintelligible variable name) as a unique identifier might work, but that's not the way to write source code of a robust compiler.

It's debated because re-inventing an existing mechanism in the compiler with a global variable adds complexity to the code. I agree that deco is poorly named and it should be documented, but adding new redundant code because the existing code is poorly understood is not the way to write a robust compiler either.

dkorpel · 2021-08-01T22:53:54Z

Looks like we are down to 1m49 currently (just sema, no codegen).
With this PR: 1m49
With this PR + #12885: 1m49 (no change)
With only #12885: 2m07
Without any of this: 2m28 (apparently a lot of other compiletime improvements have landed)

Thank you! This is very interesting, I expected the codebase would contain a few complex base classes that would take >1s for each isTypeIsolated check similar to the 800ms of dmd's Dsymbol, but considering it doesn't even give a second of improvement with caching, it suggests that your codebase is doing a few relatively small (1 - 100ms?) checks very often instead.

JohanEngelen · 2021-08-02T20:54:59Z

Looks like we are down to 1m49 currently (just sema, no codegen).
With this PR: 1m49
With this PR + #12885: 1m49 (no change)
With only #12885: 2m07
Without any of this: 2m28 (apparently a lot of other compiletime improvements have landed)

Thank you! This is very interesting, I expected the codebase would contain a few complex base classes that would take >1s for each isTypeIsolated check similar to the 800ms of dmd's Dsymbol, but considering it doesn't even give a second of improvement with caching, it suggests that your codebase is doing a few relatively small (1 - 100ms?) checks very often instead.

My "without any of this" was actually including the change in dcast.d which puts the isTypeIsolated check later in a series of checks, that probably also helped.

JohanEngelen · 2021-08-02T21:04:41Z

I don't understand why this is being debated. We need a unique identifier for Type, that's what this PR adds in a clear way. Using deco (an unintelligible variable name) as a unique identifier might work, but that's not the way to write source code of a robust compiler.

It's debated because re-inventing an existing mechanism in the compiler with a global variable adds complexity to the code. I agree that deco is poorly named and it should be documented, but adding new redundant code because the existing code is poorly understood is not the way to write a robust compiler either.

deco is not meant to be a unique identifier, even if it could be used as such. You can even see in the comment by Razvan and the behavior of copy that it is going to change value during sema. Very bad for a unique identifier.
Adding a "global" is the only way I know of getting a unique id (I've now made it private btw); let me know if it should be made threadsafe. Show me how else to do it and I'll change the code to it. Memory allocation (which is done for the string of deco) is very much not a guaranteed way to get a unique value.
I strongly disagree here. DMD is already terribly unstable code, I'm not going to add more badness to it.

dkorpel · 2021-08-02T21:16:22Z

My "without any of this" was actually including the change in dcast.d which puts the isTypeIsolated check later in a series of checks, that probably also helped.

Is the codebase compiled with -preview=dip1000? In that case, the change to dcast.d will help a ton because few functions are strongly pure usually. Otherwise, it shouldn't matter.

dkorpel · 2021-08-02T22:14:36Z

Memory allocation (which is done for the string of deco) is very much not a guaranteed way to get a unique value.

I see your point now, you don't want to rely on the fact that dmd interns .deco strings in Type.stringtable and never frees them, so the code keeps working even when the memory allocation scheme is changed. That's fair. I don't think you need to make it thread safe, dmd really isn't. I would consider resetting the global state in Type._init() for dmd.frontend:initDMD, and adding an assert(!__ctfe) to isTypeIsolated with a comment that it doesn't work in CTFE because it relies on a global counter.

WalterBright · 2021-08-03T06:04:47Z

src/dmd/mtype.d

@@ -247,6 +247,12 @@ extern (C++) abstract class Type : ASTNode
    MOD mod; // modifiers MODxxxx
    char* deco;

+    // Each Type gets a unique identifier stored in uniqueId during compilation (automatically set by constructor).


Each Type already gets a unique identifier - it's the .deco field.

WalterBright · 2021-08-03T06:13:01Z

deco is not meant to be a unique identifier, even if it could be used as such.

It absolutely is meant to be a unique identifier, and is used as such everywhere. For example, because these identifiers are unique, types can (and are) compared for identity by comparing the pointers to deco, never the contents of deco. deco also forms the basis of the type name mangling, which utterly relies on its uniqueness and the 1:1 mapping of types <=> deco. The compiler would fall apart if it wasn't unique.

WalterBright · 2021-08-03T06:23:13Z

I was looking at isTypeIsolated(), as I was unaware that it was in the compiler. Some problems with it:

Why isn't it part of the Type hierarchy? It doesn't seem to rely at all on being a member function of FuncDeclaration. It could be static!
It doesn't deal with TypeTuple or TypeDelegate parameters.
The name suggests TypeIsolated is a Type. It is not.

WalterBright · 2021-08-03T06:33:41Z

Looking at the implementation if isTypeIsolated(), there is:

const tName = t.toChars().toDString;

which is going to be disastrously slow, and consume lots and lots and lots of memory. It's then inserted into an associated array.
@JohanEngelen wrote: "A Type's mangle can be very large, so it's not fast to index an AA based on these strings." but that's exactly what is happening in isTypeIsolated(), and it's wasting time generating the string, too.

WalterBright · 2021-08-04T10:43:18Z

I suggest isIsolatedType instead of isTypeIsolated.

RazvanN7 · 2021-08-19T08:36:08Z

@JohanEngelen Are you now convinced that deco can be used as a unique identifier? If so, could you update this PR so that we can move forward?

maxhaton · 2021-09-17T23:39:20Z

@RazvanN7 @JohanEngelen ping pong

RazvanN7 · 2021-09-21T07:35:13Z

@maxhaton I will happily merge this once the comments have been addressed and the conflicts resolved.

JohanEngelen · 2021-09-21T10:32:05Z

haven;t gotten around to testing with the changes yet

JohanEngelen · 2021-09-29T20:32:25Z

@RazvanN7 @maxhaton I have tested using deco as a unique identifier and it resulted in the same object code binary on a large testcase. Also, compilation times were identical. I've rewritten the code to use deco as unique identifier and fixed the merge conflict with master.

…tentially a very expensive check.. This resulted in a large compile-time improvement on the large codebase at Weka.io (11min --> 3.5min).

maxhaton · 2021-09-29T21:19:25Z

@RazvanN7 @maxhaton I have tested using deco as a unique identifier and it resulted in the same object code binary on a large testcase. Also, compilation times were identical. I've rewritten the code to use deco as unique identifier and fixed the merge conflict with master.

What results are you getting now? This seems to yield about 5% on some informal testing for me, does that correlate? I haven't tried it on any internal company things but I imagine it'll be similar

JohanEngelen · 2021-09-29T21:21:50Z

@RazvanN7 @maxhaton I have tested using deco as a unique identifier and it resulted in the same object code binary on a large testcase. Also, compilation times were identical. I've rewritten the code to use deco as unique identifier and fixed the merge conflict with master.

What results are you getting now? This seems to yield about 5% on some informal testing for me, does that correlate? I haven't tried it on any internal company things but I imagine it'll be similar

I cannot test using dmd master (will take months before we get there, no joke).

Edit: and 5% is huge.

nordlow · 2021-09-29T22:40:09Z

This resulted in a large compile-time improvement on the large codebase at Weka.io (11min --> 3.5min).

Did you use any profiling tools to figure this out? If so, which?

JohanEngelen · 2021-09-30T11:12:20Z

This resulted in a large compile-time improvement on the large codebase at Weka.io (11min --> 3.5min).

Did you use any profiling tools to figure this out? If so, which?

The improvement is not so large anymore (codebase has evolved, see a later post in this thread here), but initially it was huge. I believe Eyal found the offending code using -ftime-trace (LDC), I don't know about the discovery of which exact function in dmd code took most time.

JohanEngelen · 2021-10-27T15:14:02Z

ping

JohanEngelen requested a review from ibuclaw as a code owner July 27, 2021 22:12

WalterBright reviewed Aug 3, 2021

View reviewed changes

dkorpel mentioned this pull request Aug 19, 2021

Put isReturnIsolated check after cheaper checks #12993

Merged

dlang-bot added Review:Needs Rebase Review:Needs Work labels Aug 20, 2021

JohanEngelen force-pushed the cache_isReturnIsolated branch from 153cf72 to 871b285 Compare September 29, 2021 20:31

dlang-bot removed Review:Needs Work Review:Needs Rebase labels Sep 29, 2021

Cache the result of FuncDeclaration.isTypeIsolated() because it is po…

d38cc83

…tentially a very expensive check.. This resulted in a large compile-time improvement on the large codebase at Weka.io (11min --> 3.5min).

JohanEngelen force-pushed the cache_isReturnIsolated branch from 871b285 to d38cc83 Compare September 29, 2021 21:05

RazvanN7 approved these changes Oct 27, 2021

View reviewed changes

RazvanN7 added the Merge:auto-merge label Oct 27, 2021

thewilsonator merged commit dfed61d into dlang:master Oct 28, 2021

JohanEngelen deleted the cache_isReturnIsolated branch October 29, 2021 08:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache the result of FuncDeclaration.isTypeIsolated()... #12930

Cache the result of FuncDeclaration.isTypeIsolated()... #12930

JohanEngelen commented Jul 27, 2021

dlang-bot commented Jul 27, 2021

dkorpel commented Jul 27, 2021

JohanEngelen commented Jul 27, 2021

thewilsonator commented Jul 28, 2021

dkorpel commented Jul 28, 2021

RazvanN7 commented Jul 28, 2021

kinke commented Jul 28, 2021

JohanEngelen commented Jul 28, 2021

JohanEngelen commented Jul 28, 2021

RazvanN7 commented Jul 30, 2021

JohanEngelen commented Aug 1, 2021

JohanEngelen commented Aug 1, 2021

dkorpel commented Aug 1, 2021

dkorpel commented Aug 1, 2021 •

edited

Loading

JohanEngelen commented Aug 2, 2021

JohanEngelen commented Aug 2, 2021

dkorpel commented Aug 2, 2021

dkorpel commented Aug 2, 2021 •

edited

Loading

WalterBright Aug 3, 2021

WalterBright commented Aug 3, 2021

WalterBright commented Aug 3, 2021

WalterBright commented Aug 3, 2021

WalterBright commented Aug 4, 2021

RazvanN7 commented Aug 19, 2021

maxhaton commented Sep 17, 2021

RazvanN7 commented Sep 21, 2021

JohanEngelen commented Sep 21, 2021

JohanEngelen commented Sep 29, 2021

maxhaton commented Sep 29, 2021

JohanEngelen commented Sep 29, 2021 •

edited

Loading

nordlow commented Sep 29, 2021

JohanEngelen commented Sep 30, 2021

JohanEngelen commented Oct 27, 2021

Cache the result of FuncDeclaration.isTypeIsolated()... #12930

Cache the result of FuncDeclaration.isTypeIsolated()... #12930

Conversation

JohanEngelen commented Jul 27, 2021

dlang-bot commented Jul 27, 2021

Bugzilla references

Testing this PR locally

dkorpel commented Jul 27, 2021

JohanEngelen commented Jul 27, 2021

thewilsonator commented Jul 28, 2021

dkorpel commented Jul 28, 2021

RazvanN7 commented Jul 28, 2021

kinke commented Jul 28, 2021

JohanEngelen commented Jul 28, 2021

JohanEngelen commented Jul 28, 2021

RazvanN7 commented Jul 30, 2021

JohanEngelen commented Aug 1, 2021

JohanEngelen commented Aug 1, 2021

dkorpel commented Aug 1, 2021

dkorpel commented Aug 1, 2021 • edited Loading

JohanEngelen commented Aug 2, 2021

JohanEngelen commented Aug 2, 2021

dkorpel commented Aug 2, 2021

dkorpel commented Aug 2, 2021 • edited Loading

WalterBright Aug 3, 2021

Choose a reason for hiding this comment

WalterBright commented Aug 3, 2021

WalterBright commented Aug 3, 2021

WalterBright commented Aug 3, 2021

WalterBright commented Aug 4, 2021

RazvanN7 commented Aug 19, 2021

maxhaton commented Sep 17, 2021

RazvanN7 commented Sep 21, 2021

JohanEngelen commented Sep 21, 2021

JohanEngelen commented Sep 29, 2021

maxhaton commented Sep 29, 2021

JohanEngelen commented Sep 29, 2021 • edited Loading

nordlow commented Sep 29, 2021

JohanEngelen commented Sep 30, 2021

JohanEngelen commented Oct 27, 2021

dkorpel commented Aug 1, 2021 •

edited

Loading

dkorpel commented Aug 2, 2021 •

edited

Loading

JohanEngelen commented Sep 29, 2021 •

edited

Loading