Commit 5b596cd

Add 'high level / low level' and 'tools' sections to the optimization article.
1 parent 39b1cd1 commit 5b596cd

1 file changed

+96
-11
lines changed

engine/guidelines/optimization.rst

Lines changed: 96 additions & 11 deletions
@@ -20,16 +20,12 @@ Choosing what to optimize
2020
-------------------------
2121

2222
Predicting which code would benefit from optimization can be difficult without
23-
using performance analysis tools.
24-
23+
using performance analysis `tools <#tools-for-optimization>`__.
2524
Oftentimes code that looks slow has no impact on overall performance, and code
2625
that looks like it should be fast has a huge impact on performance. Further,
2726
reasoning about why a certain chunk of code is slow is often impossible to do
2827
without detailed metrics (e.g. from a profiler).
2928

30-
Instructions on using some common profilers with Godot can be found `here
31-
<https://docs.godotengine.org/en/stable/engine_details/development/debugging/using_cpp_profilers.html>`_.
32-
3329
As an example, you may optimize a chunk of code by caching intermediate values.
3430
However, if that code was slow due to memory constraints, caching the values and
3531
reading them later may be even slower than calculating them from scratch!
@@ -96,6 +92,87 @@ Once you have your baseline profile/benchmark, make your changes and rebuild the
9692
engine with the exact same build settings you used before. Then profile again
9793
and compare the results.
9894

95+
High level vs low level optimization
96+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
97+
98+
The approach to optimization differs between 'high level' and 'low level' code.
99+
100+
'High level' code refers to code that heavily relies on frameworks and functions to
101+
perform its task. This is most of Godot's code. In high level code, it is often most
102+
important to avoid doing expensive work entirely, e.g. by caching values rather than
103+
making duplicate calls, by avoiding copying data unnecessarily, or by replacing calls
104+
to expensive functions with calls to cheap functions.
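
As a rough illustration of "avoiding expensive work entirely", the sketch below
moves a repeated call out of a loop and reuses its result. ``expensive_lookup``,
``sum_uncached`` and ``sum_cached`` are hypothetical names made up for this
example and are not Godot APIs; the call counter only exists to make the
difference observable.

.. code-block:: cpp

    #include <cstddef>
    #include <vector>

    // Hypothetical stand-in for a costly call (e.g. a lookup or a conversion).
    // It counts its own invocations so the saving can be observed.
    static int g_call_count = 0;

    int expensive_lookup(int key) {
        g_call_count++;
        return key * 31 + 7;
    }

    // Calls the expensive function once per element, even though the
    // argument never changes inside the loop.
    long long sum_uncached(const std::vector<int> &values, int key) {
        long long total = 0;
        for (std::size_t i = 0; i < values.size(); i++) {
            total += values[i] * static_cast<long long>(expensive_lookup(key));
        }
        return total;
    }

    // Calls the expensive function once and reuses the cached result.
    long long sum_cached(const std::vector<int> &values, int key) {
        const long long factor = expensive_lookup(key);
        long long total = 0;
        for (int v : values) {
            total += v * factor;
        }
        return total;
    }

Both functions return the same value, but the cached version makes one call
instead of one per element. Whether this matters in practice still depends on
what a profiler reports for the real code.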
105+
106+
In contrast, 'low level' code refers to code that is working mostly with C++ language
107+
features, such as primitive types and ``for``-loops. Optimizing low level code, often
108+
referred to as "micro optimization", can be more difficult, because it requires intimate
109+
knowledge about C++ compiler intrinsics, and the inner workings of the CPU and RAM.
110+
What's more, improving low level code is often unintuitive, and can reduce readability
111+
or robustness of the code. We recommend against attempting to optimize low level code,
112+
unless you are a very experienced low level C++ programmer.
113+
114+
.. note::
115+
    For micro-optimizations, C++ compilers will often be aware of basic tricks and
116+
    will already perform them in optimized builds.
117+
118+
Tools for optimization
119+
~~~~~~~~~~~~~~~~~~~~~~
120+
121+
Profilers
122+
^^^^^^^^^
123+
124+
Profilers are the most important tool for anyone optimizing code.
125+
They show you which parts of the code are responsible for slow execution or heavy CPU load,
126+
and are therefore perfect for identifying when and which code to optimize. Profilers can
127+
also be used to identify whether the problem has been resolved, by profiling again after
128+
making the changes. Godot has a built-in profiler, but it does not provide very detailed
129+
information. Instead, use dedicated C++ profilers, which are
130+
`explained in the Godot documentation <https://docs.godotengine.org/en/stable/engine_details/development/debugging/using_cpp_profilers.html>`__.
131+
132+
Benchmarks
133+
^^^^^^^^^^
134+
135+
Benchmarks can be a great and simple tool to test the impact of your changes
136+
on an isolated piece of code. However, benchmarks can be deceptive: It can be easy to
137+
accidentally write a benchmark that highlights a way in which you improved performance, while
138+
ignoring other ways in which you made it worse.
139+
140+
To give one example: One of the most expensive operations in modern CPU programming is fetching
141+
data from RAM that is not in cache. Benchmarks often test code with values that are already in cache
142+
('hot' execution), but often, it is more important to optimize for the case where values are not
143+
in cache yet ('cold' execution).
144+
145+
Another common source of confusion is compiler optimization: One might write a benchmark that
146+
looks to test the code faithfully, but the benchmarks show no improvement. This might be
147+
indicative of a poorly written benchmark, which the compiler is able to 'optimize away' by using
148+
`constant folding <https://en.wikipedia.org/wiki/Constant_folding>`__.
149+
For these, and other reasons, it is difficult to write good benchmarks. When using benchmarks to
150+
test the performance of your code, always be aware of their potential caveats, and try to familiarize
151+
yourself with good benchmark practices.
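
One common way to reduce the risk of a benchmark being optimized away is to
make its result observable. The following is a minimal sketch under that
assumption: ``work``, ``g_sink`` and ``time_work`` are hypothetical names, and
the ``volatile`` sink is only one of several possible techniques.

.. code-block:: cpp

    #include <chrono>
    #include <cstdint>

    // Hypothetical function under test.
    std::uint64_t work(std::uint64_t x) {
        return x * 2654435761u + 1;
    }

    // Every write to a volatile variable must actually happen, so the
    // compiler cannot drop the loop below as dead code.
    volatile std::uint64_t g_sink = 0;

    // Returns the elapsed time in microseconds for `iterations` calls.
    long long time_work(std::uint64_t iterations) {
        auto t0 = std::chrono::steady_clock::now();
        for (std::uint64_t i = 0; i < iterations; i++) {
            // The input changes every iteration instead of being a single
            // literal, and the result is stored in the sink.
            g_sink = work(i);
        }
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    }

This does not stop the compiler from inlining or simplifying ``work`` itself,
so it is still worth checking the generated assembly if the numbers look
suspiciously good.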
152+
153+
To start writing benchmarks in Godot, use the following GDScript code template:
154+
155+
.. code-block:: gdscript
156+
157+
    var start = Time.get_ticks_msec()
158+
    var s := "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
159+
    for i in range(10000):
160+
        s.replace("e", "b") # Benchmarks the 'replace' function.
161+
    print(Time.get_ticks_msec() - start, "ms")
162+
163+
Alternatively, you can benchmark right from C++:
164+
165+
.. code-block:: cpp
166+
167+
    String s = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";
168+
169+
    auto t0 = std::chrono::high_resolution_clock::now(); // Needs <chrono>; std::cout below needs <iostream>.
170+
    for (int i = 0; i < 100000; i++) {
171+
        String s1 = s.replace("e", "b"); // Benchmarks the 'replace' function.
172+
    }
173+
    auto t1 = std::chrono::high_resolution_clock::now();
174+
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";
175+
99176
.. note::
100177

101178
Results will fluctuate, so you'll need to make your test project or
@@ -104,20 +181,28 @@ and compare the results.
104181
test multiple times, and observe how much the results fluctuate. Fluctuations of up
105182
to 10% are common and expected. The fastest run is usually the most accurate number.
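
The advice above (run several times, look at the spread, report the fastest
run) can be sketched in standard C++ as follows. ``measure_runs``, ``fastest``
and ``spread_percent`` are hypothetical helper names for this example, not part
of Godot.

.. code-block:: cpp

    #include <algorithm>
    #include <chrono>
    #include <functional>
    #include <vector>

    // Runs `fn` `runs` times and returns each duration in microseconds.
    std::vector<long long> measure_runs(const std::function<void()> &fn, int runs) {
        std::vector<long long> durations;
        for (int r = 0; r < runs; r++) {
            auto t0 = std::chrono::steady_clock::now();
            fn();
            auto t1 = std::chrono::steady_clock::now();
            durations.push_back(
                    std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count());
        }
        return durations;
    }

    // The fastest run, or 0 if there were no runs.
    long long fastest(const std::vector<long long> &durations) {
        if (durations.empty()) {
            return 0;
        }
        return *std::min_element(durations.begin(), durations.end());
    }

    // Difference between the slowest and fastest run, in percent of the fastest.
    double spread_percent(const std::vector<long long> &durations) {
        if (durations.empty()) {
            return 0.0;
        }
        long long lo = *std::min_element(durations.begin(), durations.end());
        long long hi = *std::max_element(durations.begin(), durations.end());
        if (lo == 0) {
            return 0.0;
        }
        return 100.0 * (hi - lo) / lo;
    }

If the spread between runs is larger than the improvement you are trying to
show, the benchmark likely needs more iterations or a quieter machine before
its numbers can support a conclusion.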
106183

184+
Assembly viewers
185+
^^^^^^^^^^^^^^^^
186+
187+
Assembly viewers show the final compiled version of your code in readable
188+
assembly format. This can be an effective way to optimize low-level code. It is not effective
189+
for optimization of high-level code, and should often be the 'last resort' tool, when it is clear
190+
that other optimization methods are not possible. Effectively working with assembly to optimize
191+
code requires an intimate understanding of the cost of individual instructions. Agner Fog's
192+
`C++ optimization resources <https://www.agner.org/optimize/>`__ have emerged as an invaluable
193+
tool for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__.
194+
To view assembly, you can either use an assembly viewer program for desktop, or write dedicated
195+
functions in the popular multi-architecture tool `Compiler Explorer <https://godbolt.org>`__.
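
As an example of a "dedicated function" for this purpose, the two hypothetical
functions below can be pasted into Compiler Explorer to compare a hand-written
trick with the straightforward version. With optimizations enabled, mainstream
compilers typically generate the same instructions for both, but this is worth
verifying for your own compiler and flags.

.. code-block:: cpp

    #include <cstdint>

    // "Manual" micro-optimization: shift instead of multiply.
    std::uint32_t times_eight_shift(std::uint32_t x) {
        return x << 3;
    }

    // Straightforward version that states the intent directly.
    std::uint32_t times_eight_mul(std::uint32_t x) {
        return x * 8;
    }

If the assembly is identical, the more readable version is the better choice,
since the "optimized" one only costs clarity.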
196+
107197
Pull request requirements
108198
-------------------------
109199

110200
When making an optimization PR you should:
111201

112202
- Explain why you chose to optimize this code (e.g. include the profiling result, link the issue report, etc.).
113203
- Show that you improved the code either by profiling again, or running systematic benchmarks.
204+
  See `tools <#tools-for-optimization>`__ for more info.
114205
- Test on multiple platforms where appropriate, especially mobile.
115-
- When micro-optimizing, show assembly before / after where appropriate.
116-
117-
In particular, you should be aware that for micro-optimizations, C++ compilers will often
118-
be aware of basic tricks and will already perform them in optimized builds. This is why
119-
showing before / after assembly can be important in these cases.
120-
(`godbolt <https://godbolt.org/>`_ can be particularly useful for this purpose.)
121206

122207
The most important point to get across in your PR is to highlight the source of
123208
the performance issues, and have a clear explanation for how your PR fixes that
