Experiment with a more straight-forward template emission scheme #3422
I've quickly experimented a bit wrt. this, building dub (as presumably Phobos-heavy code) in the repo root dir on Win64 with
Results:
All 3 versions can be linked with the corresponding druntime/Phobos; sizes are almost identical (~3.3M; stripping is enabled by default), and the time required by the MS linker doesn't really matter here (~320ms for |
I'll try and test this on weka's codebase this week. (lots of templates and separate compilation) |
I guess I only got 'lucky' with dub when compiling to a single .obj; compiling the defaultlibs with this (=> separate objects per module) shows that it's absolutely not working, so at least the 2nd diff wrt. Quoting Iain from Slack:
|
I've experimented a bit to get a better understanding of how this currently works. Let's use this simple structure:
// --- multi_a.d:
import multi_b;
import multi_c;
void templ_a()() {}
void foo_a()
{
    templ_b();
    templ_c();
}
// --- multi_b.d:
import multi_c;
import multi_a;
void templ_b()() { templ_c(); }
void foo_b()
{
    templ_a();
    templ_c();
}
// --- multi_c.d:
void templ_c()() {}
Observations when compiling the 3 modules to separate object files (
For the example above, this means that with master, What I'd like to experiment with would be a C++-like emission strategy, emitting all directly and indirectly instantiated templates once per instantiating module (end goal: object file). I.e.:
Then we should be able to use I've pushed a little sketch, which adds more TemplateInstances as module members (corresponding to the goal above for the little example). It doesn't work yet of course; a simple |
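A minimal sketch of the C++-like strategy for the three-module example above, assuming "emit once per instantiating module" means taking the transitive closure of the templates each module's non-template code instantiates. This is only a model of the idea, not LDC code; the helper names `collect` and `emittedBy` and the `.o` labels are made up for illustration.

```d
import std.algorithm.sorting : sort;
import std.stdio : writeln;

/// Marks `name` and everything it instantiates (directly or indirectly) as seen.
void collect(string name, const string[][string] nested, ref bool[string] seen)
{
    if (name in seen)
        return;
    seen[name] = true;
    if (auto children = name in nested)
        foreach (child; *children)
            collect(child, nested, seen);
}

/// Sorted set of template instances a module would emit in this model,
/// given the templates its non-template functions instantiate directly.
string[] emittedBy(const string[] direct, const string[][string] nested)
{
    bool[string] seen;
    foreach (name; direct)
        collect(name, nested, seen);
    auto result = seen.keys;
    sort(result);
    return result;
}

void main()
{
    // Instantiations made by foo_a / foo_b; multi_c has no non-template code.
    string[][string] direct;
    direct["multi_a"] = ["templ_b", "templ_c"];
    direct["multi_b"] = ["templ_a", "templ_c"];
    direct["multi_c"] = null;

    // Instantiations made from inside other templates: templ_b calls templ_c.
    string[][string] nested = ["templ_b": ["templ_c"]];

    foreach (mod; ["multi_a", "multi_b", "multi_c"])
        writeln(mod, ".o: ", emittedBy(direct[mod], nested));

    // In this model each object is self-sufficient for its own instantiations.
    assert(emittedBy(direct["multi_a"], nested) == ["templ_b", "templ_c"]);
    assert(emittedBy(direct["multi_b"], nested) == ["templ_a", "templ_c"]);
    assert(emittedBy(direct["multi_c"], nested).length == 0);
}
```

In this model `templ_c` ends up in both `multi_a.o` and `multi_b.o`, which is the duplication the strategy accepts in exchange for not depending on which module "owns" an instance.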
Pinging @ibuclaw. |
FYI: in Weka's large templated codebase, template instantiations can nest very deeply (>1000 levels). Determining whether the template needs to be instantiated by |
With the sketch, |
@dnadlinger et al.: Any ideas how to tackle this? [This would also be interesting for proper cross-module inlining when generating multiple object files at once.] |
Branch updated from a948851 to 19e1b4d.
Multiple codegen (of the primary TI) seems to be solved now; druntime and Phobos can be compiled, both all-at-once and each module separately. For the little example, But as the countless undefined symbols show, a whole bunch of TIs still seem to be missing. I've tried without |
Okay, this utterly fails:
bool foo(void[] a, void[] b)
{
    return a == b;
}
Putting this into 2 identical files and compiling them both at once to separate object files, only one object gets the full tree of required TIs:
The other object only gets (a secondary TI of) the top |
Pushed another little experiment, trying to add the nested TIs to the parent TI members instead of the module members. Works for the trivial example above wrt. |
With a Phobos hello-world on Win64, 25 undefined symbols remain (before linker stripping), so it looks like this is slowly getting somewhere. - A TI may now be present multiple times in the members tree of a single module (nested in different parent TIs and/or also added as top-level TI to the module directly) which is obviously suboptimal. We do skip multiple function definitions in |
…g module I.e., include children TIs in speculative or non-root modules too, because a non-primary sibling of the parent TI, in a root module, requires all children.
The TIs making it here are either top-level TIs in some root module, or TIs nested therein (possibly in a non-root primary instance though; these have been skipped before).
I've taken a different route now, inspired by cross-module inlining, and codegen the template functions now whenever they are IR-declared instead of tampering with Module and TemplateInstance members. That's both much simpler and much more promising according to the remaining undefined symbol linker errors (an empty D main now links fine on Windows, a The total size of all druntime/Phobos release object files on Win64 goes down from ~10.5 MB (master) to 8.5 MB with |
I guess that a new enum is required to denote each template emission strategy - judging from the changes here, they'd be called something like allOwned (dmd default), allInst (-allinst), and allSpeculative (this pull)? |
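A rough sketch of what such an enum could look like. The type name `TemplateEmission` and the helper `origin` are hypothetical; only the three member names and their labels come from the comment above.

```d
import std.stdio : writeln;
import std.traits : EnumMembers;

/// Hypothetical enum; the member names are the ones suggested above.
enum TemplateEmission
{
    allOwned,       // labelled "dmd default" in the comment
    allInst,        // labelled "-allinst"
    allSpeculative, // labelled "this pull"
}

/// Returns the label the comment attaches to each strategy.
string origin(TemplateEmission strategy)
{
    // `final switch` makes the compiler reject any unhandled member.
    final switch (strategy)
    {
        case TemplateEmission.allOwned:       return "dmd default";
        case TemplateEmission.allInst:        return "-allinst";
        case TemplateEmission.allSpeculative: return "this pull";
    }
}

void main()
{
    foreach (strategy; EnumMembers!TemplateEmission)
        writeln(strategy, ": ", origin(strategy));
}
```

A `final switch` is a natural fit here because adding a fourth strategy later would turn every unhandled switch into a compile error.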
A problem is that the emission strategy should be consistent for all static libraries and objects of a linked project, as e.g. druntime/Phobos compiled with this cannot be linked against user code with the current emission strategy. |
OK.
But if this is done right, there won't be a need to give the user a choice over which emission strategy to use. |
FWIW, I'd suggest that what actually matters are compile/link times of typical (moderately template-heavy) user applications instead of object file size – Phobos object file size might be a proxy for the typical number of duplicate templates, but I'm not sure how good of one it makes. The current strategy (at least last time I looked at it) is based on the assumption that code generation is expensive – especially for optimised builds –, and duplicate codegen is thus best avoided. As Johan pointed out earlier, the I haven't had a chance to look at the latest changes here yet, but it seems like the basic idea is to emit enough duplicates to be able to use Is this basically DMD |
Phobos hello-world now links on Windows, even the druntime testrunners (debug + release). For the Phobos debug testrunner, 199 undefined symbols (the vast majority of which confined to |
Why is weak_odr pinning for special symbols necessary? |
Because I don't emit them into each referencing CU (yet?). If a root module instantiates some class template, it's going to be codegen'd in |
Why treat class templates differently to others, though? (It might of course be that the usage structure between class and function templates differs enough in typical D that one strategy is better for classes and the other for free functions.) |
My primary focus is getting this green first, to be able to conduct some performance tests. Then the details can be ironed out; the weak_odr template data hack has IMO probably a higher impact than the ClassInfos. |
Analogous to ClassInfos, incl. normal linkage (external for non-templates, weak_odr for templates).
This makes it possible to get rid of frontend logic wrt. whether to add TypeInfoStructDeclarations to a module's members tree - previously, it was defined as linkonce_odr in the owning module and each referencing module (unless speculative) - and related extra semantic and codegen for the special member functions.
I've gone a bit further and moved the entire TypeInfo emission for LDC to the codegen layer; no TypeInfoDeclarations are added to the module members anymore. Whenever we need a TypeInfo symbol during codegen, it is declared or defined, and we don't need to rely on brittle frontend logic with speculative-ness complications.
This might slightly increase compilation speed due to fewer emitted TypeInfos and functions (possibly less work for the linker too).
A slight drawback is that the job of stripping unused struct TypeInfos is fully delegated to the linker, as the TypeInfo is guaranteed to end up in the owning object file due to no linkonce_odr.
Another theoretical drawback is that the optimizer can definitely not inline xtoHash/xopEquals/xopCmp/xtoString/xdtor[ti]/xpostblit function pointer indirections in non-owning CUs without LTO (neither the pointers nor the special member functions are defined anymore). These (public) members are probably hardly used directly though, and instead used by the virtual TypeInfo_Struct methods equals/compare/getHash/destroy/postblit, which are exclusively defined in druntime's object.o (incl. the TypeInfo_Struct vtable) and aren't cross-module-inlined anyway (without LTO).
Re-emitting the struct TypeInfos (and optionally the special member functions too) into each referencing CU could be handled in our codegen layer, which should be much simpler and more robust than the upstream scheme.
Conflicts: dmd/dtemplate.d gen/declarations.cpp gen/typinf.cpp ir/iraggr.cpp ir/iraggr.h ir/irclass.cpp
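To make the function-pointer indirection mentioned in the commit message concrete, here is a small sketch of reaching a struct's special member functions through the virtual TypeInfo methods rather than calling them directly. The struct `Key` and its fields are invented for the example; the only assumption about druntime is the behaviour described above, namely that `equals` and `getHash` forward to the struct's own `opEquals`/`toHash`.

```d
struct Key
{
    int id;
    int cache; // deliberately ignored by opEquals and toHash

    bool opEquals(ref const Key rhs) const @safe pure nothrow
    {
        return id == rhs.id;
    }

    size_t toHash() const @safe pure nothrow
    {
        return cast(size_t) id;
    }
}

void main()
{
    auto a = Key(1, 10);
    auto b = Key(1, 20); // same id, different cache
    auto c = Key(2, 10);

    // The struct TypeInfo; this is the symbol whose emission the commit moves.
    TypeInfo ti = typeid(Key);

    // A bitwise comparison would report a != b, so a `true` result here
    // shows that the call went through Key.opEquals.
    assert(ti.equals(&a, &b));
    assert(!ti.equals(&a, &c));

    // Likewise, the hash comes from Key.toHash.
    assert(ti.getHash(&a) == a.toHash());
}
```

Generic runtime code such as associative arrays only sees the `TypeInfo`, which is why these calls are indirect and, per the commit message, not inlined across modules without LTO.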
Now down to 19 (apparently all data, no functions anymore):
|
Should be finally mostly working - the few remaining failures look benign (=> would require test adaptations due to changed codegen order and different IR). Some promising Azure CI timings for Mac (the most stable Azure platform wrt. runtimes), with enabled assertions (for LDC & LLVM):
|
On my Win64 box, I'm seeing a big improvement for building the (optimized release) Phobos test runner ( |
That's definitely in line with what I'd expect. I've been testing switching the default template linkage from vague to weak, and I've seen compilation times of a few Phobos modules go from 1m 30s to just over 3m with |
As expected, the significantly increased number of IR definitions per IR module leads to a heavy compiler slow-down if the optimizer isn't run. So unsurprisingly no good for debug builds. For release builds, the impact on compile times probably varies a lot from project to project and how it's built (number of object files). As a bonus, template functions are automatically available for cross-module inlining, reducing the need for LTO. |
Finally superseded by #3600. |