Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend -linkonce-templates by matching template emission scheme #3600

Merged
merged 5 commits into from
Nov 27, 2020

Conversation

kinke
Copy link
Member

@kinke kinke commented Oct 25, 2020

[In essence, a revised version of the remaining stuff from #3422.]

I.e., define templated symbols in each referencing compilation unit (=> object file) when using discardable linkonce_odr linkage, analogous to C++.

This makes each compilation unit self-sufficient wrt. templated symbols, which also means increased opportunity for inlining and less need for LTO. There should be no more undefined symbol issues caused by buggy template culling.

The biggest advantage is that the optimizer can discard unused linkonce_odr symbols early instead of optimizing and forwarding to the assembler. So this is especially useful with -O to decrease compilation times and can at least in some scenarios greatly outweigh the (potentially very much) higher number of symbols defined by the glue layer.

Libraries compiled with -linkonce-templates can generally not be linked against dependent code compiled without -linkonce-templates; the other way around works.

@JohanEngelen
Copy link
Member

@kinke Let me know when this is ready to be tested on Weka's codebase.

@kinke
Copy link
Member Author

kinke commented Oct 25, 2020

It should be ready, first tests results looking good (benign failures after a quick glance). [It currently defaults to -linkonce-templates for testing.]

@kinke
Copy link
Member Author

kinke commented Oct 26, 2020

Sorry, there was still a problem for unreferenced symbols like module ctors, fixed now. Only 2 failing tests here; lit-test codegen/array_equals_memcmp_dyn.d fails because of discarded (after inlining) function being checked in asm output, i.e., clearly benign, and druntime integration test profile, which is probably related to #2538.

@kinke kinke force-pushed the linkonce2 branch 2 times, most recently from 8e46911 to 14a91e0 Compare October 31, 2020 14:27
@kinke kinke changed the title WIP: Extend -linkonce-templates by matching template emission scheme Extend -linkonce-templates by matching template emission scheme Oct 31, 2020
I.e., *define* templated symbols in each referencing compilation unit
when using discardable linkonce_odr linkage, analogous to C++.

This makes each compilation unit self-sufficient wrt. templated symbols,
which also means increased opportunity for inlining and less need for
LTO. There should be no more undefined symbol issues caused by buggy
template culling.

The biggest advantage is that the optimizer can discard unused
linkonce_odr symbols early instead of optimizing and forwarding to the
assembler. So this is especially useful with -O to decrease compilation
times and can at least in some scenarios greatly outweigh the
(potentially very much) higher number of symbols defined by the glue
layer.

Libraries compiled with -linkonce-templates can generally not be linked
against dependent code compiled without -linkonce-templates; the other
way around works.
This fixes a few test regressions caused by missing non-referenced
module ctors, e.g., inside a templated struct, which requires full
codegen of the whole StructDeclaration and where define-on-declare of
individual member functions etc. doesn't work.

Keep the existing logic to determine the actual root module the primary
instance will be assigned to.
The semantics were almost identical, except for DtoIsTemplateInstance()
checking the specified Dsymbol and its parents, while isInstantiated()
starts from its parent and ignores the symbol itself.

After a quick glance, we seem to have fed it with {Aggregate,Func,Var}
Declarations only, so should be fine with the default front-end variant.
@kinke
Copy link
Member Author

kinke commented Nov 17, 2020

I've enabled -linkonce-templates for all druntime and Phobos unittests, + dmd-testsuite-release. The unittests primarily because of the significant speed-up, and dmd-testsuite for some extra coverage. Note that -unittest has a deep impact on the normal template emission (each unique template instantiation is assigned to a root module, even if exclusively instantiated in non-root modules), so I don't think we really lose test coverage for the normal template emission this way.

Some rough timings (only 1 run) and static unittest lib sizes compared to master, using LLVM 11 without assertions on Win64 on my personal quad-core:

Master This (rebased)
dub 1m 4s 50s (-22%) with -linkonce-templates
phobos2-test-runner 6m 20s, 191 MB 4m 3s (-36%), 69 MB (-64%)
phobos2-test-runner-debug 2m 9s, 368 MB 2m (-7%), 339 MB (-8%)
druntime-test-runner 22s, 9.5 MB 19s (-14%), 6.2 MB (-35%)
druntime-test-runner-debug 14s, 21 MB 14s (±0%), 19 MB (-10%)

This is in line with some early tests in #3422. The fact that each unittest lib is smaller than before, including the debug ones without any optimization, shows that the normal template emission for -unittest is extremely conservative and yields more defined symbols than with new -linkonce-templates. Most likely because this scheme here (define-on-1st-declaration for templated symbols and each compilation unit) works on a symbol basis - i.e., if we are only referencing a method MyStruct!(...).foo(), only that function is defined, not the whole MyStruct!(...) struct instantiation.

@kinke kinke merged commit c155e3c into ldc-developers:master Nov 27, 2020
@kinke kinke deleted the linkonce2 branch November 27, 2020 01:45
@ibuclaw
Copy link
Contributor

ibuclaw commented Nov 27, 2020

Nice, maybe ping @atilaneves and maybe align this up with what he's doing for dmd?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants