Skip to content

Conversation

@WalterBright
Copy link
Member

I have given up complaining about importing std.stdio (and others) and getting a vast object file of bloat from templates that were instantiated simply by the import, and decided it was a dmd problem. This change addresses it by not generating code from functions that were instantiated by templates from imports, if the standalone import instantiates those same templates.

This should also speed up compiles.

The semantic analysis is still done. In the future I'd like to address this by going "full lazy" on the semantic analysis of imports (currently it is done "full eager"). Such will also produce a big compile time speedup.

This change has a risk of my having missed cases, the symptom of which will be undefined symbols at link time.

This only deals with functions. If it works out, we can extend it to not generating data, either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today, Module::root field is not used properly. Instead, you can check whether a module m is a root by using m->importedFrom == m.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I'll deal with this after it passes the test suite, maybe refactor things a bit.

@9rnsr
Copy link
Contributor

9rnsr commented Sep 12, 2013

In TypeAArray::getImpl(), to fix test/runnable/test8997.d` failure:

#endif
         // Instantiate on the root module of import dependency graph.
         Scope *scx = sc->push(sc->module->importedFrom);
+        scx->instantiatingModule = sc->module->importedFrom;
         ti->semantic(scx);
         ...

@9rnsr
Copy link
Contributor

9rnsr commented Sep 12, 2013

I think that this PR is a right direction.

@WalterBright
Copy link
Member Author

In TypeAArray::getImpl(), to fix test/runnable/test8997.d` failure:

Thanks, @9rnsr, I thought it would likely be something like that but had been procrastinating narrowing it down. I just uploaded your fix.

@9rnsr
Copy link
Contributor

9rnsr commented Sep 12, 2013

@WalterBright, during I've investigated around template semantics, I got one question about instantiation and codegen strategy.

When two modules exist like follows:

module a;
import b;
void main() { A!().foo(); }
module b;
private alias B!() ti_b;
template A() { void foo() { B!().bar(); } }
template B() { void bar() {} }

dmd a.d does not invoke codegen for B!() and it will cause link-failure, because it is explicitly instantiated in the non-root module b.
When I remove alias B!() ti_b;, the same command line will succeed to compile&link.

Obviously, libaray code could have such a private alias, e.g. for code readability. However currently it would change the compilation behavior on library user side.

I'd like to ask you, is this designed behavior, or a problem of unfinished dmd implementation?

@WalterBright
Copy link
Member Author

It's designed to work that way. The scheme relies on the idea that somewhere, module b is going to be compiled as a root module, which will instantiate B!(), and then b.obj will be linked in.

@don-clugston-sociomantic
Copy link
Contributor

Good stuff! There's incredible potential in this direction.

@9rnsr
Copy link
Contributor

9rnsr commented Sep 12, 2013

Not good... In more complicated case, the design would cause (mostly) unintended link-failure.

module a;
import b;
import c;
void main() { A2!().foo(); }
module b;
import c;
alias a1 = A1!();
template A1() { alias b = B!(); }
module c;
template A2() { void foo() { B!().bar(); } }
template B() { void bar() { C!().baz(); } }
template C() { void baz() { } }
$ dmd a.d
DMD v2.064 DEBUG
OPTLINK (R) for Win32  Release 8.00.12
Copyright (C) Digital Mars 1989-2010  All rights reserved.
http://www.digitalmars.com/ctg/optlink.html
a.obj(a)
 Error 42: Symbol Undefined _D1c6__T1BZ3barFNbZv
--- errorlevel 1

A1!() and A2!() looks distinct instantiations, but the shared instance B!() would be undefined symbol.

The critical issue is that the link-failure occurrence depends on the semantic analysis order. When you remove import b from module a, dmd a.d will succeed to compile.

Walter, I and you have much knowledge about compiler implementation, therefore we can understand the behavior. But it would be impossible for most D programmers.
I think such the difficult-to-understand behavior would give bad effect to D experience.

@WalterBright
Copy link
Member Author

The error is in:

dmd a.d

because nowhere in that command was b.d and c.d compiled and linked in. This PR relies on at some point the programmer is going to compile all the modules and link them in, not just 'import' them.

@mihails-strasuns
Copy link

After few months wasted on experiments in that direction it is great relief to see someone knowledgeable to pay attention to this problem :)

I have some concerns about this change in regard of separate compilation though, testing right now.

@mihails-strasuns
Copy link

@WalterBright This fails:

module a;

import b;

void main()
{
    boo(1);
    foo();
}
module b;

public import c;

auto foo()()
    if (boo(1))
{
    return 1;
}
module c;

auto boo(T)(T t)
{
    return 1;
}
$ dmd -c a.d
$ dmd -c b.d
$ dmd -c c.d
$ dmd  a.o b.o c.o
a.o: In function `_Dmain':
__entrypoint.d:(.text._Dmain+0xa): undefined reference to `_D1c10__T3booTiZ3booFNaNbNfiZi'

@mihails-strasuns
Copy link

P.S.
@eskimor You should be interested in this pull request too.

@ghost
Copy link

ghost commented Sep 12, 2013

Here's a few tests, with build times and static library/object file size printouts:

  • Simple test file:
import std.stdio;
void main() { }

2.063.2:

400 msecs
test.obj: 55 KB
test.exe: 310 KB

2.064 git-head:

980 msecs
test.obj: 92 KB
test.exe: 447 KB

Pull 2550:

950 msecs
test.obj: 8 KB
test.exe: 438 KB
  • Now with something more heavy, like WindowsAPI:

2.063.2:

21_450 msecs
dmd_win32_x32.lib: 7_081 KB

2.064 git-head:

12_155 msecs
dmd_win32_x32.lib: 4_389 KB

Pull 2550:

12_152 msecs
dmd_win32_x32.lib: 4_388 KB
  • Well, that one doesn't have a lot of templates. Let's try my minilib library instead (which can only be built with 2.064+):

Without unittests:

2.064 git-head:

2808 msecs
677 KB static lib

Pull 2550:

1673 msecs
528 KB static lib

With unittests:

2.064 git-head:

7900 msecs
2_829 KB static lib

Pull 2550:

7800 msecs
2_829 KB static lib
  • Testing DCollections:

2.063.2:

430 msecs
217 KB static lib

2.064 git-head:

1_071 msecs
345 KB static lib

Pull 2550:

1_008 msecs
194 KB static lib
  • Testing GtkD:

2.063.2:

34_688 msecs
9.391 KB static lib (gtkd.lib)

2.064 git-head and Pull 2550 - Can't be built, I get:

src\gtkc\gtktypes.d(1005): Error: enum gtkc.gtktypes.GtkRcTokenType
overflow of enum value cast(GtkRcTokenType)
cast(GTokenType)270

I hope someone can fix this so we can test GtkD against this pull, GtkD was why the initial bug report was made.

  • Testing compiling one file in GtkD (See Comment 3) with dmd -c -O -Isrc src/cairo/Types.d:

2.063.2:

491 msecs
54 KB object file (Types.obj)

2.064 git-head:

1_168 msecs
89 KB object file (Types.obj)

Pull 2550:

980 msecs
8 KB object file (Types.obj)

Notice that the average build times when using 2.064+ are twice as slow than using 2.063.2, except for the outlier WindowsAPI which is much faster in 2.064+. This is Issue 10866 which I've filed a while ago. Even with all the recent optimizations in the compiler we've still made 2.064 slower for building many libraries (this is unrelated to this Pull 2550).

However the object files do seem to be smaller with this pull.

@ibuclaw
Copy link
Member

ibuclaw commented Sep 12, 2013

Here's a few tests, with build times and static library/object file size printouts:

It does look worrying that even with this pull, 2.064 compiles 2x slower than 2.063.2...

@WalterBright
Copy link
Member Author

It does look worrying that even with this pull, 2.064 compiles 2x slower than 2.063.2...

Yes, it does. I'd like to see this as a separate bugzilla issue. At least bisect needs to be done to see what caused it.

@WalterBright
Copy link
Member Author

@Dicebot fixed and added test case.

@ghost
Copy link

ghost commented Sep 12, 2013

At least bisect needs to be done to see what caused it.

Yeah I'm bisecting right now.

@ghost
Copy link

ghost commented Sep 12, 2013

I've reduced it to a Phobos commit:

9a053d97c26d763718b23d14af8ef1d10d594507 is the first bad commit
commit 9a053d97c26d763718b23d14af8ef1d10d594507
Author: Dmitry Olshansky <dmitry.olsh@gmail.com>
Date:   Sat Jul 20 22:30:10 2013 +0400

    new std.uni module

If you check out the following commits for DMD+Druntime+Phobos:

DMD: 4e9a7b1
Druntime: f6692fe
Phobos: 9a053d9

Then compiling the following:

import std.stdio;
void main() { }

Takes ~1270 msecs. If you checkout one Phobos commit earlier (which is 1bd22b2), it takes ~500 msecs to compile it.

@ghost
Copy link

ghost commented Sep 12, 2013

Btw I can also reproduce this commit is the offending one with other libraries, like DCollections. I'm pretty sure most libraries import std.stdio at one point or another, so they will all suffer from this slowdown.

@eskimor
Copy link

eskimor commented Sep 12, 2013

Thanks @Dicebot . The new "instantiatingModule" looks promising for fixing: http://d.puremagic.com/issues/show_bug.cgi?id=9948 & the linking issues you are working on. I'll check it out tomorrow.

@WalterBright
Copy link
Member Author

Thank you, @AndrejMitrovic , can you please post your findings to a Bugzilla issue as a regression?

@ghost
Copy link

ghost commented Sep 12, 2013

@WalterBright
Copy link
Member Author

Thank you, that'll do nicely.

@WalterBright
Copy link
Member Author

Let's merge! Pull away, me hearties!

@ghost
Copy link

ghost commented Sep 12, 2013

This is exactly the type of pull that I would make in a different branch, then posted a binary of it online, posted in the newsgroup saying "Please try this new experimental compiler build, it's much faster", and receive more bug reports and usage reports for several weeks before merging it into master.

But then again I'm not the puppet-master here. Let's do it the old fashioned way and break people's code, let them receive Optlink errors.

@WalterBright
Copy link
Member Author

Anyone pulling HEAD has to expect the possibility of breakage. We need this pull, and the only practical way to get some mileage into it before release is to put it in HEAD.

@mihails-strasuns
Copy link

@WalterBright
Can you wait with merging ~1 more day? Test suite currently does not cover separate compilation cases at all and those are most likely to suffer from object file emitting changes. I am currently running my own tests and checking symbol lists with nm manually, but that takes some time.

@WalterBright
Copy link
Member Author

@Dicebot sure, no problem.

@braddr
Copy link
Member

braddr commented Sep 12, 2013

@Dicebot not entirely true. There's several tests that use separate compilation w/in the dmd test suite: grep -ir separate test//

(I forget the exact tag used).

@WalterBright
Copy link
Member Author

After looking template.c line 5449 seems to be doing the opposite, changing the instantiating module if the original was already a root....

The idea is that if an imported module generates the instantiation, then we don't have to generate it for a root module. So, if the instantiator is already listed as a root, and the new instantiator is not root, switch it to the new instantiator. The logic can be shortened to remove the second clause.

@WalterBright
Copy link
Member Author

I was thinking about a different approach to eliminate symbols only used during compilation: mark functions/variables that are actually in the original source. Before code generation traverse the AST to mark all symbols actually referenced, and only generate code/data for marked symbols.

My plan for that, as mentioned earlier in this thread, is to do semantic analysis lazily rather than eagerly.

@Orvid
Copy link
Contributor

Orvid commented Sep 14, 2013

The issue with that is that if a template is instantiated in both an imported module and a root module, and they both import each other, both will determine that the other is supposed to be the one generating the instantiation, and it will never be generated. The solution to that is to have both generate it.

A test that should cause the issue:

Module 1:

module a;

import b;

struct SomeStruct(T)
{
    T field;
    T getInnerField()
    {
        return field;
    }
}

immutable globalField = SomeStruct!string("Hello!");

void main()
{
    globalField.getInnerField();
    anotherGlobalField.getInnerField();
}

Module 2:

module b;

import a;

immutable anotherGlobalField = SomeStruct!string("Hello Again!");

Then compile them with:

dmd -c a.d
dmd -c b.d
dmd a.o b.o

That should produce an undefined reference to SomeStruct!string.getInnerField().

@WalterBright
Copy link
Member Author

For example, this code from ufcs.d in the testsuite

I can't duplicate any error with that, but of course, I am not using the precise gc.

@WalterBright
Copy link
Member Author

Thank you, @Orvid , that makes sense. I'll figure out a fix. I can duplicate the error, thanks also for the nice test case.

@WalterBright
Copy link
Member Author

@rainers
Copy link
Member

rainers commented Sep 15, 2013

My plan for that, as mentioned earlier in this thread, is to do semantic analysis lazily rather than eagerly.

That won't help in throwing out the code that was semantically analyzed during contraint checking and CTFE.

@rainers
Copy link
Member

rainers commented Sep 15, 2013

I can't duplicate any error with that, but of course, I am not using the precise gc.

Actually, using the druntime patches for the precise GC seems not enough, it probably also needs patches for better RTInfo (#2480).

@mihails-strasuns
Copy link

That won't help in throwing out the code that was semantically analyzed during contraint checking and CTFE.

Also it won't help to eliminate one of most most problematic sources of code bloat in template-abusing programs - unused bodies of template symbols that got inlined everywhere. I guess it can't be done without some feedback from backend to frontend.

@Orvid
Copy link
Contributor

Orvid commented Sep 15, 2013

Another way that could decrease the final size is something that no compiler I know of does currently, and would definitely need to have it's own switch, and that's merging duplicate method bodies. It would of course be a backend change, because you'd be comparing the hash of the generated assembly code, and simply refer to only one of the generated bodies, never actually writing the duplicates to file. The only thing is, it's a bit more complicated than it would sound at first, due to the fact you have to account for any relative addresses referenced in the method that point to a location outside of the bod of the method. You'd also have to account for the differences in absolute addresses generated for call instructions to the current method. (methods that call themselves)

@WalterBright
Copy link
Member Author

@Orvid I actually have sort of a design for that, but it's in the front end.

@MartinNowak
Copy link
Member

Another way that could decrease the final size is something that no compiler I know of does currently, and would definitely need to have it's own switch, and that's merging duplicate method bodies.

Identical COMDAT folding

@Orvid I actually have sort of a design for that, but it's in the front end.

Shouldn't that belong into the linker?

The idea is that if an imported module generates the instantiation, then we don't have to generate it for a root module. So, if the instantiator is already listed as a root, and the new instantiator is not root, switch it to the new instantiator. The logic can be shortened to remove the second clause.

I still find it problematic, that we generate different objects for dmd -c a b and dmd -c a && dmd -c b.
The former allows for huge time savings because it avoids reparsing common modules, but we have to make it work correctly. Clearly the C++ toolchain indicates, that every instantiating object should get a copy of the definition.
If generating multiple copies is huge performance issue, copying in the glue layer or at link time (partial linking) could save a lot of the overhead.

@mihails-strasuns
Copy link

Looks like this has broken vibe.d:

import vibe.d;

void handleRequest(HTTPServerRequest req, HTTPServerResponse res)
{
}

shared static this()
{
    listenHTTP(new HTTPServerSettings(), &handleRequest);
}

on master results in

$ rdmd -L-levent -L-lcrypto -L-lssl -release -O -inline -J. -J./views/ -J../views/ -I/usr/share/vibed/source -version=VibeLibeventDriver source/app.d 
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/../../../../lib/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
/tmp/.rdmd-1000/rdmd-app.d-9069AA21EB13E759B2C02C48539EC93D/objs/app.o: In function `_D4vibe4inet7message13decodeMessageFxAhAyaZAya':
/usr/share/vibed/source/deimos/openssl/ecdh.d:(.text._D4vibe4inet7message13decodeMessageFxAhAyaZAya+0x18b): undefined reference to `_D3std6base6424__T10Base64ImplVa43Va47Z153__T7DecoderTS4vibe4inet7message13decodeMessageFxAhAyaZAya79__T12FilterResultS544vibe4inet7message13decodeMessageFxAhAyaZAya9__lambda3TAxhZ12FilterResultZ7Decoder8popFrontMFNaZv'
/tmp/.rdmd-1000/rdmd-app.d-9069AA21EB13E759B2C02C48539EC93D/objs/app.o: In function `_D3std6base6424__T10Base64ImplVa43Va47Z153__T7decoderTS4vibe4inet7message13decodeMessageFxAhAyaZAya79__T12FilterResultS544vibe4inet7message13decodeMessageFxAhAyaZAya9__lambda3TAxhZ12FilterResultZ7decoderFNaS4vibe4inet7message13decodeMessageFxAhAyaZAya79__T12FilterResultS544vibe4inet7message13decodeMessageFxAhAyaZAya9__lambda3TAxhZ12FilterResultZS3std6base6424__T10Base64ImplVa43Va47Z153__T7DecoderTS4vibe4inet7message13decodeMessageFxAhAyaZAya79__T12FilterResultS544vibe4inet7message13decodeMessageFxAhAyaZAya9__lambda3TAxhZ12FilterResultZ7Decoder':
/usr/share/vibed/source/deimos/openssl/ecdh.d:(.text._D3std6base6424__T10Base64ImplVa43Va47Z153__T7decoderTS4vibe4inet7message13decodeMessageFxAhAyaZAya79__T12FilterResultS544vibe4inet7message13decodeMessageFxAhAyaZAya9__lambda3TAxhZ12FilterResultZ7decoderFNaS4vibe4inet7message13decodeMessageFxAhAyaZAya79__T12FilterResultS544vibe4inet7message13decodeMessageFxAhAyaZAya9__lambda3TAxhZ12FilterResultZS3std6base6424__T10Base64ImplVa43Va47Z153__T7DecoderTS4vibe4inet7message13decodeMessageFxAhAyaZAya79__T12FilterResultS544vibe4inet7message13decodeMessageFxAhAyaZAya9__lambda3TAxhZ12FilterResultZ7Decoder+0x3d): undefined reference to `_D3std6base6424__T10Base64ImplVa43Va47Z153__T7DecoderTS4vibe4inet7message13decodeMessageFxAhAyaZAya79__T12FilterResultS544vibe4inet7message13decodeMessageFxAhAyaZAya9__lambda3TAxhZ12FilterResultZ7Decoder6__ctorMFNaNcS4vibe4inet7message13decodeMessageFxAhAyaZAya79__T12FilterResultS544vibe4inet7message13decodeMessageFxAhAyaZAya9__lambda3TAxhZ12FilterResultZS3std6base6424__T10Base64ImplVa43Va47Z153__T7DecoderTS4vibe4inet7message13decodeMessageFxAhAyaZAya79__T12FilterResultS544vibe4inet7message13decodeMessageFxAhAyaZAya9__lambda3TAxhZ12FilterResultZ7Decoder'
/tmp/.rdmd-1000/rdmd-app.d-9069AA21EB13E759B2C02C48539EC93D/objs/app.o:(.data._D211TypeInfo_S3std6base6424__T10Base64ImplVa43Va47Z153__T7DecoderTS4vibe4inet7message13decodeMessageFxAhAyaZAya79__T12FilterResultS544vibe4inet7message13decodeMessageFxAhAyaZAya9__lambda3TAxhZ12FilterResultZ7Decoder6__initZ+0x38): undefined reference to `_D3std6base6424__T10Base64ImplVa43Va47Z153__T7DecoderTS4vibe4inet7message13decodeMessageFxAhAyaZAya79__T12FilterResultS544vibe4inet7message13decodeMessageFxAhAyaZAya9__lambda3TAxhZ12FilterResultZ7Decoder11__xopEqualsFKxS3std6base6424__T10Base64ImplVa43Va47Z153__T7DecoderTS4vibe4inet7message13decodeMessageFxAhAyaZAya79__T12FilterResultS544vibe4inet7message13decodeMessageFxAhAyaZAya9__lambda3TAxhZ12FilterResultZ7DecoderKxS3std6base6424__T10Base64ImplVa43Va47Z153__T7DecoderTS4vibe4inet7message13decodeMessageFxAhAyaZAya79__T12FilterResultS544vibe4inet7message13decodeMessageFxAhAyaZAya9__lambda3TAxhZ12FilterResultZ7DecoderZb'
collect2: error: ld returned 1 exit status
--- errorlevel 1

Trying to reduce (no idea how to do it properly).

That may also be any other recent change to symbol emitting, as bisecting can't be run because earlier vibe.d release was subject to http://d.puremagic.com/issues/show_bug.cgi?id=11062

@MartinNowak
Copy link
Member

That may also be any other recent change to symbol emitting, as bisecting can't be run because earlier vibe.d release was subject to http://d.puremagic.com/issues/show_bug.cgi?id=11062

The inline ICE doesn't happen without -inline, so you can try that for bisecting.

@mihails-strasuns
Copy link

@dawgfoto good catch, thanks, will try it

@ghost
Copy link

ghost commented Oct 22, 2013

Now that Issue 10866 has been resolved, here are some new timings and size comparisons:

Hello-world:

2.063.2:

392 msecs - 55 KB object file

2.064 git-head:

606 msecs - 6 KB object file

WindowsAPI:

2.063.2:

20_000 msecs - 7,082 KB static lib

2.064 git-head:

7708 msecs - 4,361 KB static lib

DCollections:

2.063.2:

420 msecs
217 KB static lib

2.064 git-head:

594 msecs
194 KB static lib

GtkD:

2.063.2:

37_711 msecs
gtkd.lib - 9,623 KB
gtkdgl.lib - 206 KB
gtkdsv.lib - 486 KB

2.064 git-head:

31_519 msecs
gtkd.lib - 9,571 KB
gtkdgl.lib - 150 KB
gtkdsv.lib - 425 KB

leandro-lucarella-sociomantic pushed a commit to leandro-lucarella-sociomantic/dmd that referenced this pull request Oct 31, 2013
This fix is based on the just merged pull request:

 dlang#2550

by Walter Bright. It made this fix trivial (at least the test case in the bug report is working correctly now).
@monarchdodra
Copy link
Contributor

Hi guys. I think this caused a regression:

https://d.puremagic.com/issues/show_bug.cgi?id=11863

Code works fine in CTFE, but fails later at runtime. If the code is copy pasted in place (toImpl => toImpl2), and the copy is called directly, it starts working again. This implies that where the code is first instanciated means it has a different behavior. I all seems like a DMD bug, and this seems related, and a bissect flagged it as the pull that introduced the behavior.

Can you give it a look?

@9rnsr
Copy link
Contributor

9rnsr commented Jan 7, 2014

https://d.puremagic.com/issues/show_bug.cgi?id=11863

I finally discovered codegen bug in glue layer.

#3069

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.