GCC: Add a build profile extension with the link-time optimizer enabled #11856

Merged

fkjagodzinski
Member

@fkjagodzinski fkjagodzinski commented Nov 13, 2019

Description (required)

Summary of change (What the change is for and why)

Added an lto build profile extension for the GCC_ARM toolchain.

  • added -flto to common flags,
  • updated tools/toolchains/gcc.py to use common flags at link time,
  • marked main() with MBED_USED attribute,
  • added -u main to ld flags.
Results

arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]

mbed-cloud-client-example@4.3.0, mbed-os-5.15.1, GCC9, Linux host

| profile | RAM (data + bss) [bytes] | Flash (text + data) [bytes] | build time [s] | rebuild time [s] |
| --- | --- | --- | --- | --- |
| release | 60272 | 392449 | 141.33 | 10.86 |
| release + lto | 59432 | 356724 | 166.65 | 39.12 |

mbed-cloud-client-example@4.3.0, mbed-os-5.15.1, GCC9, Windows host

| profile | RAM (data + bss) [bytes] | Flash (text + data) [bytes] | build time [s] | rebuild time [s] |
| --- | --- | --- | --- | --- |
| release | 60272 | 392449 | 1071.52 | 24.60 |
| release + lto | 59432 | 356724 | 1396.12 | 155.88 |

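
To put the reported numbers in relative terms, the following Python sketch computes the percentage change between the release and release + lto rows. The figures are copied from the results above; the dictionary layout and the `relative_change` helper are illustrative names only and are not part of the Mbed OS tools.

```python
# Figures copied from the results above (sizes in bytes, times in seconds).
RESULTS = {
    "linux": {
        "release": {"ram": 60272, "flash": 392449, "build": 141.33, "rebuild": 10.86},
        "release+lto": {"ram": 59432, "flash": 356724, "build": 166.65, "rebuild": 39.12},
    },
    "windows": {
        "release": {"ram": 60272, "flash": 392449, "build": 1071.52, "rebuild": 24.60},
        "release+lto": {"ram": 59432, "flash": 356724, "build": 1396.12, "rebuild": 155.88},
    },
}


def relative_change(before, after):
    """Return the change from `before` to `after` as a percentage of `before`."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    for host, profiles in RESULTS.items():
        base, lto = profiles["release"], profiles["release+lto"]
        for metric in ("ram", "flash", "build", "rebuild"):
            change = relative_change(base[metric], lto[metric])
            print(f"{host:8s} {metric:8s} {change:+7.2f}%")
```

On these figures the lto extension saves 35725 bytes of flash (roughly 9%) and 840 bytes of RAM (roughly 1.4%) on both hosts, at the cost of longer build and rebuild times.
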
Build commands used to produce the above results
cd $(mktemp -d)
git clone git@github.com:ARMmbed/mbed-cloud-client-example.git
cd mbed-cloud-client-example/
git checkout 4.3.0 
git clean -dxff && git reset --hard
cp ../mbed_cloud_dev_credentials.c .

# LTO branch on top of mbed-os-5.15.0
echo 'https://github.com/fkjagodzinski/mbed-os/#a3dac1943f3770eb287c244da0a208cb4cf9b559' > mbed-os.lib

mbed deploy
cd mbed-os/ && git log --oneline -1 && cd ..
# Define Mbed OS profile
# 1. release
\time -f%e mbed compile -t GCC_ARM -m K64F --profile mbed-os/tools/profiles/release.json
# 2. release + lto
\time -f%e mbed compile -t GCC_ARM -m K64F --profile mbed-os/tools/profiles/release.json --profile mbed-os/tools/profiles/extensions/lto.json

# More details
mbed compile -t GCC_ARM -m K64F --profile mbed-os/tools/profiles/release.json --stats-depth=100
Documentation (Details of any document updates required)

Pull request type (required)

[] Patch update (Bug fix / Target update / Docs update / Test update / Refactor)
[] Feature update (New feature / Functionality change / New API)
[X] Major update (Breaking change E.g. Return code change / API behaviour change)

Test results (required)

[] No Tests required for this change (E.g docs only update)
[x] Covered by existing mbed-os tests (Greentea or Unittest)
[] Tests / results supplied as part of this PR

Reviewers (optional)

@jamesbeyond @kjbracey-arm @maciejbocianski


Release Notes (required for feature/major PRs)

Summary of changes

Add a build profile extension with the link-time optimizer enabled for GCC_ARM toolchain.

Impact of changes
Migration actions required
  • The minimum required version of GCC_ARM is now GNU Arm Embedded Toolchain version 9-2019-q4-major. Earlier GCC_ARM versions can cause various issues when the -flto flag is used, e.g. a platform-specific error during the final link stage on Windows hosts with GCC 8.

  • The noinline attribute has to be used for every function that must be placed into a specific section (specified with a section(".section_name") attribute). In general, when a function is inlined, its section attribute has no effect on the inlined copy, because that code ends up in the caller's section. With the link-time optimizer enabled, the chances of inlining are much higher because the inliner works across multiple translation units. As a result, the sizes of the output sections change compared to a non-LTO build. This may lead to errors such as: section ".section_name" will not fit in region "region_name".

  • The common flags defined for the GCC_ARM toolchain are now appended to the ld flags during an mbed-cli build. Previously, the common flags were appended only to asm, c and cxx flags. Having the same flags for the compiler and the linker is required when using the link-time optimizer (which is the case for the lto build profile extension). Any options unrecognized by the linker should be moved from common to asm, c or cxx.
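
As a rough illustration of the last point, the sketch below shows how per-tool flag lists could be derived from one toolchain's section of a build profile, with the common flags appended for the linker as well. This is a minimal sketch and not the code in tools/toolchains/gcc.py: the function name `build_tool_flags` and the example flags are assumptions made for illustration.

```python
def build_tool_flags(profile):
    """Append a profile's 'common' flags to each tool's own flags.

    `profile` is assumed to be a dict with optional 'common', 'asm', 'c',
    'cxx' and 'ld' lists, mirroring one toolchain section of a profile JSON.
    """
    common = list(profile.get("common", []))
    return {
        tool: list(profile.get(tool, [])) + common
        for tool in ("asm", "c", "cxx", "ld")  # 'ld' now receives 'common' too
    }


if __name__ == "__main__":
    # Hypothetical profile fragment; the flag values are examples only.
    example = {
        "common": ["-Os", "-flto"],
        "c": ["-std=gnu11"],
        "ld": ["-Wl,--gc-sections"],
    }
    for tool, flags in build_tool_flags(example).items():
        print(tool, flags)
```

With this arrangement every flag in common reaches the link command, which is why a compile-only option left in common becomes a problem once the linker sees it.
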

@jamesbeyond
Contributor

Do we also want LTO enabled for the develop profile? @fkjagodzinski @bulislaw

@ciarmcom ciarmcom requested review from jamesbeyond, maciejbocianski and a team November 13, 2019 12:00
@ciarmcom
Member

@fkjagodzinski, thank you for your changes.
@maciejbocianski @jamesbeyond @ARMmbed/mbed-os-maintainers @ARMmbed/mbed-os-tools please review.

@fkjagodzinski fkjagodzinski force-pushed the gcc_build-enable_lto_for_release branch from 5282cd4 to 3d07b88 Compare November 13, 2019 12:23
@fkjagodzinski
Member Author

Force-pushed a fix for common flags and Travis failures.

Contributor

@mark-edgeworth mark-edgeworth left a comment

The notes on this change suggest that only the release profile has been changed, but the code changes also affect the debug and develop profiles.
The move of the '-c' option from common to the asm, c and cxx tools appears to be an unrelated change, as is the corresponding change in gcc.py.

@fkjagodzinski
Member Author

@mark-edgeworth, would you prefer the "common" flags duplicated in the "ld" list in JSON as an alternative? These flags have to be supplied to the linker when -flto is used. Please see the -flto section of this man page.

To use the link-time optimizer, -flto and optimization options should be specified at compile time and during the final link. It is recommended that you compile all the files participating in the same link with the same options and also specify those options at link time.

Or should I update the commit message to point to lto as the cause of this patch?

@mark-edgeworth
Contributor

Thanks for the docs reference. Looks like making these flags common is the right way to go.

@fkjagodzinski fkjagodzinski force-pushed the gcc_build-enable_lto_for_release branch from 3d07b88 to 6d6b3e0 Compare November 13, 2019 16:31
@fkjagodzinski
Member Author

Updated the last commit message to provide a full quote from GCC man.

@jamesbeyond
Contributor

Hi, @fkjagodzinski @ARMmbed/mbed-os-maintainers @bulislaw ,
Do you think this would be a breaking change? If this went into mbed-os, it would mean that everyone working on master needs to update their toolchain to GCC 7 or later, otherwise it will throw compilation errors.

bulislaw previously approved these changes Nov 15, 2019
Member

@bulislaw bulislaw left a comment

I don't think it's a breaking change, but it can break some users. We should highlight it in release notes and maybe a blog post. Any strong feelings about develop? I think debuggability of lto builds is vastly improved in 8/9 @kjbracey-arm @pan-

@pan-
Member

pan- commented Nov 15, 2019

Link-time optimizations do not require the presence of the whole program to operate. If the program does not require any symbols to be exported, it is possible to combine -flto and -fwhole-program to allow the interprocedural optimizers to use more aggressive assumptions which may lead to improved optimization opportunities. Use of -fwhole-program is not needed when linker plugin is active (see -fuse-linker-plugin).

@fkjagodzinski Do you know if linker plugin is active on gcc arm ? If not it could be useful to add it to the linker flags. Shouldn't you add -flto on linker flags ?

@kjbracey
Contributor

I've no personal experience of working on LTO builds. The claim is debugging is vastly improved in 8/9.

But I'm generally wary of such claims - docs claim -Og is fine for debugging, but it often annoys me so I have to switch back to -O0. But we'll leave it off for debug anyway, for now.

I was about to make a point against LTO-for-develop on build time, but your tests show builds as faster with LTO on. Somewhat surprised - I guess the compilation is faster, but the link is slower. So non-clean builds will be slower. Could you check how much slower?

Main reason to have it on in develop is it means more likely to catch undefined behaviour traps exposed by removed function-barriers-to-optimisation in LTO. Reduces the urgency of separate release testing.

@kjbracey
Contributor

@fkjagodzinski Do you know if linker plugin is active on gcc arm ? If not it could be useful to add it to the linker flags.

-fuse-linker-plugin should be on by default, but I think it's worth testing adding each of that and -fwhole-program (on the link options only) to check.

Shouldn't you add -flto on linker flags ?

-flto is on the linker flags, because the Python is changing to pass the common flags to the link stage. (But it's not actually necessary according to the docs - it happens automatically if any incoming object files were built with -flto).

Where the option would matter to the linker would be if you wanted to say -flto=auto or -flto=2 to make it do parallel work on the final link. Do we want to do that? How parallel is the existing compilation process? LTO pushes a lot of work to the final single link command, which means the build is significantly deparallelised without that.

@fkjagodzinski
Member Author

Regarding the -fwhole-program, I already have the results in my notes here: https://gist.github.com/fkjagodzinski/0fa876aed0184584e2018c6ac1d6424b#mbed-cloud-client-example400

I'll get back to you when I check other flags. Thanks for the feedback!

@kjbracey
Contributor

I just want to say something about how elegantly "LTO" kerns, at least in this font and browser. Gives me a warm fuzzy feeling every time I type it.

@bulislaw
Member

OK, let's try to enable it for develop. I'd like us to run full CI on both develop and release (@jamesbeyond). I'm assuming we will go for -flto=auto?

@fkjagodzinski
Member Author

fkjagodzinski commented Nov 19, 2019

A couple of updates:

  1. @fkjagodzinski Do you know if linker plugin is active on gcc arm ? If not it could be useful to add it to the linker flags.

    Yes, I checked our linker command with a verbose flag and could see -plugin <toolchain_dir>/gcc-arm-none-eabi-8-2019-q3-update/bin/../lib/gcc/arm-none-eabi/8.3.1/liblto_plugin.so -plugin-opt=<toolchain_dir>/gcc-arm-none-eabi-8-2019-q3-update/bin/../lib/gcc/arm-none-eabi/8.3.1/lto-wrapper among args.

    This holds true in both cases -- with -fuse-linker-plugin added or not. Running the linker command with the strace confirms gcc-arm-none-eabi-8-2019-q3-update/bin/../lib/gcc/arm-none-eabi/8.3.1/liblto_plugin.so is opened in both cases.

  2. I was about to make a point against LTO-for-develop on build time, but your tests show builds as faster with LTO on. Somewhat surprised - I guess the compilation is faster, but the link is slower. So non-clean builds will be slower. Could you check how much slower?

    Where the option would matter to the linker would be if you wanted to say -flto=auto or -flto=2 to make it do parallel work on the final link. Do we want to do that? How parallel is the existing compilation process?

    Here is a build time comparison for mbed-cloud-client-example@4.0.0:

    | Mbed OS branch | commit SHA | build time [s] | rebuild time [s] |
    | --- | --- | --- | --- |
    | mbed-os-5.14.1 | 679d248 | 255.50 | 11.53 |
    | gcc_build-enable_lto_for_release | 15a5ee9 | 241.96 | 71.63 |
    | -flto --> -flto=4 | 202367f | 209.21 | 37.24 |
    | -flto --> -flto=2 | 328f7ce | 211.23 | 44.96 |

-flto=auto is not recognized on my machine. Also, I'm curious to check -flto=n on a Windows machine, since it appears to require make to be installed.

If you specify the optional n, the optimization and code generation done at link time is executed in parallel using n parallel jobs by utilizing an installed make program. The environment variable MAKE may be used to override the program used. The default value for n is 1.
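
For anyone experimenting with the job count, here is a minimal Python sketch that builds an explicit -flto=n option, so a script does not have to rely on -flto=auto being recognized. The helper name `lto_flag` is hypothetical and not part of the Mbed OS tools, and whether a parallel link actually works on a given host still depends on a make program being available, as the quoted documentation notes.

```python
import os


def lto_flag(jobs=None):
    """Return an -flto option for the requested number of parallel jobs.

    jobs=None uses the CPU count reported by os.cpu_count(). A count of 1
    returns plain '-flto', since the documentation quoted above gives 1 as
    the default value for n.
    """
    if jobs is None:
        jobs = os.cpu_count() or 1  # cpu_count() may return None
    if jobs < 1:
        raise ValueError("jobs must be a positive integer")
    return "-flto" if jobs == 1 else f"-flto={jobs}"


if __name__ == "__main__":
    print(lto_flag(4))  # the value used in one of the measurements above
    print(lto_flag())   # derived from this machine's CPU count
```
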

@fkjagodzinski
Member Author

The rebuild time above was generated with the same build command after invoking touch main.cpp so the value represents mostly the link stage. The benefit of setting -flto=4 is quite significant on my machine.
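
The measurement procedure described here (build, touch one source file, run the same build command again) can be scripted roughly as follows. This is a sketch under the assumption that the build is a single command given as an argument list; the function names are illustrative and not part of mbed-cli.

```python
import subprocess
import time
from pathlib import Path


def timed_run(cmd):
    """Run `cmd` (a list of arguments) and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the build fails
    return time.perf_counter() - start


def measure_build_and_rebuild(build_cmd, touched_file):
    """Time a build, touch one source file, then time the same command again."""
    build_time = timed_run(build_cmd)
    Path(touched_file).touch()  # update the mtime so this file is rebuilt
    rebuild_time = timed_run(build_cmd)
    return build_time, rebuild_time
```

For example, `measure_build_and_rebuild(["mbed", "compile", "-t", "GCC_ARM", "-m", "K64F", "--profile", "mbed-os/tools/profiles/release.json"], "main.cpp")` would repeat the release measurement from the PR description, assuming mbed-cli is on the PATH and the command is run from the project directory.
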

@kjbracey
Contributor

Mmm, that's a pretty hefty rebuild time increase.

(But I could probably live with it if I get assigned a desktop workstation. All the cool kids around here seem to have one these days.)

It may be that -flto=auto is GCC 9 - I was looking at current docs.

But it seems any form of parallel link could be adding new dependency complications :(

@fkjagodzinski
Member Author

Also, I'm curious to check -flto=n on a Windows machine, since it appears to require make to be installed.

Not much progress in testing LTO performance on Windows so far. Got a weird error during the link stage:

lto-wrapper.exe: fatal error: CreateProcess: No such file or directory
compilation terminated.
c:/program files (x86)/gnu tools arm embedded/8 2019-q3-update/bin/../lib/gcc/arm-none-eabi/8.3.1/../../../../arm-none-eabi/bin/ld.exe: error: lto-wrapper failed
collect2.exe: error: ld returned 1 exit status

I'm trying to resolve that.

@fkjagodzinski
Member Author

@0xc0170 sorry for the late update! I guess the CI will have to be restarted after another review round.

@mbed-ci

mbed-ci commented Feb 13, 2020

Test run: FAILED

Summary: 1 of 4 test jobs failed
Build number : 13
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_build-IAR

@0xc0170
Contributor

0xc0170 commented Feb 14, 2020

CI restarted

@mbed-ci

mbed-ci commented Feb 14, 2020

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 14
Build artifacts

@0xc0170
Contributor

0xc0170 commented Feb 14, 2020

@ARMmbed/mbed-os-tools Can you please review again? If this is approved today, it can be in alpha2.

@jamesbeyond
Contributor

ping @mark-edgeworth, what do you think of the changes this time?

Contributor

@mark-edgeworth mark-edgeworth left a comment

Yes, this is a great improvement over the cache-file approach. Looks fine to me now.

@jamesbeyond
Contributor

I think this one is good to go @0xc0170 @adbridge

@mark-edgeworth
Contributor

It looks like this issue: ARMmbed/mbed-cli#942 may be related to this fix. @jamesbeyond can you comment if so?

@jamesbeyond
Contributor

It looks like this issue: ARMmbed/mbed-cli#942 may be related to this fix. @jamesbeyond can you comment if so?

I hope Filip's change about re-ordering the symbols could fix this issue 😃

@fkjagodzinski
Member Author

It looks like this issue: ARMmbed/mbed-cli#942 may be related to this fix. @jamesbeyond can you comment if so?

I hope Filip's change about re-ordering the symbols could fix this issue 😃

Sorry, but that workaround is applied only if the -flto flag is present among the linker flags. Moreover, only the object files generated from assembly are reordered, and the linked issue mentions two .c source files. Nevertheless, it might be worth trying the object file reordering.

@0xc0170 0xc0170 removed the release-version: 6.0.0-alpha-2 Second pre-release version of 6.0.0 label Feb 18, 2020
@0xc0170
Contributor

0xc0170 commented Feb 18, 2020

We are releasing alpha2; this will be merged tomorrow morning once we are good to go with the release.

@0xc0170 0xc0170 merged commit d9becd4 into ARMmbed:master Feb 19, 2020
@mergify mergify bot removed the ready for merge label Feb 19, 2020
@fkjagodzinski fkjagodzinski deleted the gcc_build-enable_lto_for_release branch February 19, 2020 13:19
@JojoS62
Contributor

JojoS62 commented Feb 27, 2020

I've tried the lto extension, it produces 222 messages like:

Unknown object name found in GCC map file: C:\Users\sn\AppData\Local\Temp\blinky.elf.ELsjFs.ltrans0.ltrans.o

In some other issue I found that this is a 'cosmetic' error, but having >200 messages is annoying. Will updated tools suppress this message?

The complete line in the map file is:

c:/program files (x86)/gnu tools arm embedded/9 2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/thumb/v7-m/nofp\libc_nano.a(lib_a-nano-vfprintf.o)
                              C:\Users\sn\AppData\Local\Temp\blinky.elf.ELsjFs.ltrans0.ltrans.o (vfprintf)
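
For illustration only, a map-file consumer could recognize these temporary LTO objects by name and group them instead of warning about each one. The sketch below is not how the Mbed OS tools handle the map file; the pattern is inferred solely from the single file name quoted in this report, and the function names are hypothetical.

```python
import re

# Matches names such as 'blinky.elf.ELsjFs.ltrans0.ltrans.o' (pattern inferred
# from the message quoted above; other GCC versions may name these differently).
LTRANS_OBJECT = re.compile(r"\.ltrans\d+\.ltrans\.o$")


def is_lto_temp_object(path):
    """Return True if `path` looks like a temporary object produced by LTO."""
    return bool(LTRANS_OBJECT.search(path.strip()))


def split_lto_objects(object_names):
    """Separate LTO temporary objects from ordinary object names."""
    lto, regular = [], []
    for name in object_names:
        (lto if is_lto_temp_object(name) else regular).append(name)
    return lto, regular
```

A caller could then report the LTO temporaries once as a group rather than emitting one message per entry.
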

Another problem I found is that lto and minimal-printf linker extensions do not work together.

@0xc0170
Contributor

0xc0170 commented Feb 27, 2020

@JojoS62 Thanks for the feedback. Can you create an issue report with details? We can then triage this and review

@JojoS62
Contributor

JojoS62 commented Feb 27, 2020

Yes, I will prepare it.

JanneKiiskila pushed a commit to JanneKiiskila/pelion-client-lite-example that referenced this pull request Apr 9, 2020
When switching over to Mbed OS 6.0 Alpha 3, the most tightly optimized
version of Client Lite stopped compiling. A mysterious error
popped up complaining that the "-c" option is not supported. After some git
bisecting, the offending commit was found and the issue was traced back
to some workarounds done to avoid some LTO issues with GCC.

ARMmbed/mbed-os#11856

The issue is that the gcc.py changes now pass the linker the parameters
from both the ld-block and the common-block. The linker does not
recognize the "-c" option expanded there from the common-block.

However, we can work around this issue the same way the original
Mbed OS PR did - split the -c out to asm, c and cxx blocks.
This works with both Mbed OS 5.15.x and Mbed OS 6.0.0 Alpha 3.

More details in Mbed OS issue
[#12871](ARMmbed/mbed-os#12781)
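
The workaround described in this commit message, moving "-c" out of the common block and into the asm, c and cxx blocks, can be sketched in Python as a transformation on one toolchain's section of a profile. The function name `move_flag_out_of_common` and the example profile are assumptions made for illustration; this is not code from either repository.

```python
import copy


def move_flag_out_of_common(profile, flag="-c", tools=("asm", "c", "cxx")):
    """Return a copy of `profile` with `flag` moved from 'common' to `tools`.

    `profile` is assumed to be a dict of flag lists keyed by 'common',
    'asm', 'c', 'cxx' and 'ld'. The input is left unmodified.
    """
    updated = copy.deepcopy(profile)
    common = updated.get("common", [])
    if flag not in common:
        return updated  # nothing to move
    updated["common"] = [f for f in common if f != flag]
    for tool in tools:
        tool_flags = updated.setdefault(tool, [])
        if flag not in tool_flags:
            tool_flags.append(flag)
    return updated


if __name__ == "__main__":
    # Hypothetical profile fragment; the flag values are examples only.
    before = {"common": ["-c", "-Wall"], "c": ["-std=gnu11"], "ld": []}
    print(move_flag_out_of_common(before))
```

Because the linker flags are left untouched, "-c" no longer reaches the link command even when the tools append the common block to the ld flags.
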