This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

[build] Enable link-time optimization for iOS release builds #12502

Merged Jul 31, 2018 (2 commits)

jfirebaugh

This results in substantial size savings.

I haven't been able to get LTO working for Android builds due to android/ndk#721. I'll open a followup ticket for that.

@jfirebaugh jfirebaugh requested review from kkaefer and 1ec5 July 27, 2018 22:24
@friedbunny friedbunny mentioned this pull request Jul 27, 2018
@friedbunny friedbunny added iOS Mapbox Maps SDK for iOS build labels Jul 27, 2018
@friedbunny friedbunny added this to the ios-v4.3.0 milestone Jul 27, 2018
@friedbunny

@jfirebaugh Nice. A few questions:

  • Have you investigated ThinLTO?
  • LTO is usually couched as a way to enhance runtime performance — any news on that front?
  • Is there a meaningful change/increase in build time?

@friedbunny friedbunny added the performance Speed, stability, CPU usage, memory usage, or power usage label Jul 27, 2018
@jfirebaugh

Have you investigated ThinLTO?

ThinLTO compiles successfully, but it shrinks the size savings by between 0.5 and 2 percentage points, so let's stick with "monolithic" LTO.

LTO is usually couched as a way to enhance runtime performance — any news on that front?

Yes, it seems to be faster across the board:

Before:
-------------------------------------------------------------------------------------
Benchmark                                              Time           CPU Iterations
-------------------------------------------------------------------------------------
API_queryRenderedFeaturesAll                   410784008 ns  395907500 ns          2
API_queryRenderedFeaturesLayerFromLowDensity      703810 ns     661799 ns       1067
API_queryRenderedFeaturesLayerFromHighDensity    9273363 ns    9044394 ns         71
API_renderStill_reuse_map                       28127765 ns   22553250 ns         28
API_renderStill_reuse_map_switch_styles        476081575 ns  435200500 ns          2
API_renderStill_recreate_map                   404103467 ns  358979000 ns          2
Parse_CameraFunction/1                              7281 ns       7109 ns      83339 1
Parse_CameraFunction/2                              8254 ns       7909 ns      79818 2
Parse_CameraFunction/4                             10327 ns      10029 ns      74923 4
Parse_CameraFunction/6                             11699 ns      11419 ns      61060 6
Parse_CameraFunction/8                             13193 ns      12804 ns      58977 8
Parse_CameraFunction/10                            14857 ns      14381 ns      48148 10
Parse_CameraFunction/12                            15339 ns      14976 ns      46044 12
Evaluate_CameraFunction/1                             59 ns         57 ns   11182109 1
Evaluate_CameraFunction/2                            110 ns        107 ns    6215261 2
Evaluate_CameraFunction/4                            141 ns        133 ns    5079973 4
Evaluate_CameraFunction/6                            145 ns        141 ns    4997965 6
Evaluate_CameraFunction/8                            155 ns        151 ns    5030398 8
Evaluate_CameraFunction/10                           151 ns        147 ns    4128332 10
Evaluate_CameraFunction/12                           155 ns        152 ns    4595528 12
Parse_CompositeFunction/1                          11036 ns      10738 ns      60854 1
Parse_CompositeFunction/2                          17687 ns      17264 ns      41007 2
Parse_CompositeFunction/4                          40081 ns      37589 ns      18125 4
Parse_CompositeFunction/6                          68860 ns      67645 ns      10486 6
Parse_CompositeFunction/8                         109374 ns     106975 ns       6646 8
Parse_CompositeFunction/10                        165186 ns     161602 ns       4521 10
Parse_CompositeFunction/12                        225316 ns     221312 ns       3258 12
Evaluate_CompositeFunction/1                         494 ns        480 ns    1188334 1
Evaluate_CompositeFunction/2                         674 ns        666 ns     986179 2
Evaluate_CompositeFunction/4                         790 ns        778 ns     883738 4
Evaluate_CompositeFunction/6                         877 ns        864 ns     765889 6
Evaluate_CompositeFunction/8                        1021 ns        997 ns     788546 8
Evaluate_CompositeFunction/10                       1010 ns        951 ns     731208 10
Evaluate_CompositeFunction/12                        959 ns        913 ns     670190 12
Parse_SourceFunction/1                              8761 ns       8651 ns      78607 1
Parse_SourceFunction/2                              9368 ns       9270 ns      68159 2
Parse_SourceFunction/4                             12393 ns      11784 ns      64948 4
Parse_SourceFunction/6                             12506 ns      12228 ns      47992 6
Parse_SourceFunction/8                             14469 ns      13912 ns      50672 8
Parse_SourceFunction/10                            16322 ns      16183 ns      42170 10
Parse_SourceFunction/12                            16979 ns      16379 ns      38726 12
Evaluate_SourceFunction/1                            567 ns        547 ns    1446607 1
Evaluate_SourceFunction/2                            561 ns        550 ns    1133879 2
Evaluate_SourceFunction/4                            601 ns        571 ns     998887 4
Evaluate_SourceFunction/6                            592 ns        560 ns    1076079 6
Evaluate_SourceFunction/8                            569 ns        545 ns    1086568 8
Evaluate_SourceFunction/10                           656 ns        621 ns    1098970 10
Evaluate_SourceFunction/12                           577 ns        562 ns     999001 12
Parse_Filter                                        6589 ns       6541 ns      98239
Parse_EvaluateFilter                                 149 ns        143 ns    4068490
TileMaskGeneration                                  2405 ns       2339 ns     309565
Parse_VectorTile                                 2490494 ns    2431625 ns        275
Util_dtoa                                           2863 ns       2809 ns     253329
Util_standardDtoa                                   3640 ns       3468 ns     197649
Util_dtoaLimits                                      428 ns        419 ns    1646698
Util_standardDtoaLimits                            46271 ns      44309 ns      15746
TileCountBounds                                      115 ns        110 ns    5786846
TileCountPolygon                                   27677 ns      27343 ns      23766
TileCoverPitchedViewport                            4184 ns       4003 ns     159358
TileCoverBounds                                     1630 ns       1538 ns     396430
TileCoverPolygon                                   13030 ns      11942 ns      60945

After:

-------------------------------------------------------------------------------------
Benchmark                                              Time           CPU Iterations
-------------------------------------------------------------------------------------
API_queryRenderedFeaturesAll                   283767778 ns  281489000 ns          2
API_queryRenderedFeaturesLayerFromLowDensity      513801 ns     509064 ns       1321
API_queryRenderedFeaturesLayerFromHighDensity    7471770 ns    7388589 ns         95
API_renderStill_reuse_map                       17220142 ns   12306667 ns         51
API_renderStill_reuse_map_switch_styles        179160388 ns  139416600 ns          5
API_renderStill_recreate_map                   268271023 ns  212350000 ns          3
Parse_CameraFunction/1                              5248 ns       5228 ns     128422 1
Parse_CameraFunction/2                              5858 ns       5833 ns     117369 2
Parse_CameraFunction/4                              7067 ns       7026 ns      98041 4
Parse_CameraFunction/6                              8289 ns       8246 ns      84437 6
Parse_CameraFunction/8                              9455 ns       9394 ns      73275 8
Parse_CameraFunction/10                            10550 ns      10477 ns      66806 10
Parse_CameraFunction/12                            11629 ns      11560 ns      59522 12
Evaluate_CameraFunction/1                             40 ns         40 ns   17889271 1
Evaluate_CameraFunction/2                             80 ns         79 ns    8822232 2
Evaluate_CameraFunction/4                            101 ns        100 ns    6976420 4
Evaluate_CameraFunction/6                            111 ns        110 ns    6355144 6
Evaluate_CameraFunction/8                            114 ns        113 ns    6183800 8
Evaluate_CameraFunction/10                           119 ns        118 ns    5939300 10
Evaluate_CameraFunction/12                           125 ns        124 ns    5815064 12
Parse_CompositeFunction/1                           8225 ns       8172 ns      85292 1
Parse_CompositeFunction/2                          12935 ns      12861 ns      54205 2
Parse_CompositeFunction/4                          28394 ns      28202 ns      24925 4
Parse_CompositeFunction/6                          51084 ns      50652 ns      13644 6
Parse_CompositeFunction/8                          80956 ns      80457 ns       8592 8
Parse_CompositeFunction/10                        117054 ns     116427 ns       5958 10
Parse_CompositeFunction/12                        162608 ns     161598 ns       4313 12
Evaluate_CompositeFunction/1                         370 ns        368 ns    1857666 1
Evaluate_CompositeFunction/2                         513 ns        510 ns    1347839 2
Evaluate_CompositeFunction/4                         601 ns        598 ns    1165870 4
Evaluate_CompositeFunction/6                         655 ns        651 ns    1079930 6
Evaluate_CompositeFunction/8                         672 ns        667 ns    1004275 8
Evaluate_CompositeFunction/10                        687 ns        683 ns    1008762 10
Evaluate_CompositeFunction/12                        699 ns        695 ns     997449 12
Parse_SourceFunction/1                              6568 ns       6527 ns     105902 1
Parse_SourceFunction/2                              7094 ns       7059 ns      96510 2
Parse_SourceFunction/4                              8328 ns       8274 ns      83854 4
Parse_SourceFunction/6                              9457 ns       9398 ns      74071 6
Parse_SourceFunction/8                             10544 ns      10481 ns      64357 8
Parse_SourceFunction/10                            11675 ns      11599 ns      58417 10
Parse_SourceFunction/12                            12634 ns      12563 ns      54119 12
Evaluate_SourceFunction/1                            356 ns        354 ns    2011483 1
Evaluate_SourceFunction/2                            388 ns        386 ns    1792032 2
Evaluate_SourceFunction/4                            426 ns        423 ns    1707850 4
Evaluate_SourceFunction/6                            421 ns        419 ns    1659035 6
Evaluate_SourceFunction/8                            425 ns        422 ns    1651442 8
Evaluate_SourceFunction/10                           431 ns        428 ns    1639536 10
Evaluate_SourceFunction/12                           454 ns        451 ns    1331938 12
Parse_Filter                                        4986 ns       4952 ns     132974
Parse_EvaluateFilter                                 113 ns        112 ns    6118132
TileMaskGeneration                                  1736 ns       1725 ns     405468
Parse_VectorTile                                 1919556 ns    1902925 ns        362
Util_dtoa                                           2092 ns       2079 ns     335852
Util_standardDtoa                                   2708 ns       2689 ns     261733
Util_dtoaLimits                                      316 ns        315 ns    2202907
Util_standardDtoaLimits                            35879 ns      35671 ns      19514
TileCountBounds                                       83 ns         82 ns    8397011
TileCountPolygon                                   22169 ns      22061 ns      31370
TileCoverPitchedViewport                            3263 ns       3241 ns     214177
TileCoverBounds                                     1277 ns       1270 ns     537622
TileCoverPolygon                                    9035 ns       8982 ns      77551
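
To put rough numbers on "faster across the board", the ratios below divide the After wall-clock time by the Before wall-clock time (the Time column) for a few rows of the two tables above. The choice of rows is arbitrary, and these are single runs, so treat the percentages as indicative only.

```latex
\begin{align*}
\text{API\_queryRenderedFeaturesAll:} \quad & \frac{283767778}{410784008} \approx 0.69 \quad (\text{about } 31\% \text{ less time}) \\
\text{API\_renderStill\_reuse\_map:} \quad & \frac{17220142}{28127765} \approx 0.61 \quad (\text{about } 39\% \text{ less time}) \\
\text{API\_renderStill\_reuse\_map\_switch\_styles:} \quad & \frac{179160388}{476081575} \approx 0.38 \quad (\text{about } 62\% \text{ less time}) \\
\text{Parse\_VectorTile:} \quad & \frac{1919556}{2490494} \approx 0.77 \quad (\text{about } 23\% \text{ less time})
\end{align*}
```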

Is there a meaningful change/increase in build time?

Yes, the link step takes noticeably longer. However, since we're enabling this only for release builds, this will not affect local development in the common case.

@@ -3781,6 +3781,7 @@
INFOPLIST_FILE = framework/Info.plist;
INSTALL_PATH = "$(LOCAL_LIBRARY_DIR)/Frameworks";
LD_RUNPATH_SEARCH_PATHS = "$(inherited) @executable_path/Frameworks @loader_path/Frameworks";
LLVM_LTO = YES;

@friedbunny friedbunny Jul 31, 2018

This enables LTO for the dynamic target — we’ll also want to do this for the static target.

set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -flto")
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "${CMAKE_CXX_FLAGS_RELWITHDEBINFO} -flto")
set(CMAKE_EXE_LINKER_FLAGS_RELEASE "${CMAKE_EXE_LINKER_FLAGS_RELEASE} -flto")
set(CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO "${CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO} -flto")
Contributor

We should consult @kkaefer, but I think using our set_xcode_property macro to set Xcode’s LLVM_LTO flag will be easier to read in the cmake config and clearer when looking at the settings in Xcode itself.

set_xcode_property(${target} LLVM_LTO $<$<CONFIG:Release>:YES>)
set_xcode_property(${target} LLVM_LTO $<$<CONFIG:RelWithDebInfo>:YES>)

Or we could combine the two build modes into a single line using a more complex/ugly generator expression:

set_xcode_property(${target} LLVM_LTO $<$<OR:$<CONFIG:Release>,$<CONFIG:RelWithDebInfo>>:YES>)

This should have the same effect as -flto and has the benefit of being reflected in the Build Settings UI:

[Screenshot: Xcode Build Settings UI, taken 2018-07-30]

Contributor Author

This works for most targets, but not for mbgl-node because it's an "INTERFACE_LIBRARY" target:

https://circleci.com/gh/mapbox/mapbox-gl-native/136552

Should we:

  • Not bother with LTO for macOS node bindings
  • Use set_xcode_property in initialize_xcode_cxx_build_settings instead
  • Loop with foreach(ABI IN LISTS mbgl-node::abis) in macos/config.cmake and set_xcode_property for each (non-INTERFACE_LIBRARY) target

?
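
For illustration, the third option above (setting the property per target and skipping INTERFACE_LIBRARY targets) could look roughly like the sketch below. The helper name enable_release_lto is hypothetical, and the sketch sets CMake's standard XCODE_ATTRIBUTE_LLVM_LTO target property directly, on the assumption that the project's set_xcode_property macro wraps the same property. The actual target names behind mbgl-node::abis are not shown in this thread, so none are filled in here.

```cmake
# Sketch only: enable_release_lto() is a hypothetical helper, not part of this PR.
function(enable_release_lto)
    foreach(target IN LISTS ARGN)
        get_target_property(target_type ${target} TYPE)
        # INTERFACE_LIBRARY targets build nothing themselves, so skip them.
        if(target_type STREQUAL "INTERFACE_LIBRARY")
            continue()
        endif()
        # Monolithic LTO for the two optimized configurations; the generator
        # expression expands to an empty value in every other configuration.
        set_property(TARGET ${target} PROPERTY XCODE_ATTRIBUTE_LLVM_LTO
            "$<$<OR:$<CONFIG:Release>,$<CONFIG:RelWithDebInfo>>:YES>")
    endforeach()
endfunction()

# Usage (pass whatever list of targets the config file already has):
# enable_release_lto(${targets})
```

Checking the target type up front keeps the call site simple: the caller can pass every target without knowing which ones are interface-only.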

set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -flto")
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "${CMAKE_CXX_FLAGS_RELWITHDEBINFO} -flto")
set(CMAKE_EXE_LINKER_FLAGS_RELEASE "${CMAKE_EXE_LINKER_FLAGS_RELEASE} -flto")
set(CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO "${CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO} -flto")
Contributor

I should also note that this config doesn’t affect the mbgl targets in the macos project, so those currently don’t have LTO set.

Member

Yeah, ideally we'd set the Xcode configuration settings. However, technically you could also run cmake .. -GNinja to build on macOS without Xcode, and Xcode build settings have no effect there.
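
One way to cover both cases, sketched here as an assumption rather than anything this PR does, is to branch on the generator: set the Xcode attribute when generating an Xcode project and fall back to plain -flto flags otherwise. The function name mbgl_release_lto is hypothetical; CMAKE_GENERATOR, XCODE_ATTRIBUTE_LLVM_LTO, target_compile_options and LINK_FLAGS_<CONFIG> are standard CMake.

```cmake
# Sketch only: mbgl_release_lto() is a hypothetical helper, not part of this PR.
# It assumes ${target} is a regular (non-INTERFACE) target.
function(mbgl_release_lto target)
    if(CMAKE_GENERATOR STREQUAL "Xcode")
        # Visible in Xcode's Build Settings UI; other generators ignore it.
        set_property(TARGET ${target} PROPERTY XCODE_ATTRIBUTE_LLVM_LTO
            "$<$<OR:$<CONFIG:Release>,$<CONFIG:RelWithDebInfo>>:YES>")
    else()
        # Ninja, Makefiles, etc.: pass -flto at compile time and at link time.
        target_compile_options(${target} PRIVATE
            "$<$<OR:$<CONFIG:Release>,$<CONFIG:RelWithDebInfo>>:-flto>")
        set_property(TARGET ${target} APPEND_STRING
            PROPERTY LINK_FLAGS_RELEASE " -flto")
        set_property(TARGET ${target} APPEND_STRING
            PROPERTY LINK_FLAGS_RELWITHDEBINFO " -flto")
    endif()
endfunction()
```

The trade-off is two code paths to maintain, in exchange for a setting that is readable in Xcode while still applying to cmake .. -GNinja builds.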

@friedbunny friedbunny left a comment

🙇
