-
-
Notifications
You must be signed in to change notification settings - Fork 11
Add 'tools' section to the optimization article. #10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
5b596cd to
878cdca
Compare
engine/guidelines/optimization.rst
Outdated
| For micro-optimizations, C++ compilers will often be aware of basic tricks and | ||
| will already perform them in optimized builds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This note could use a few examples, maybe. Constant unfolding is later mentioned, so it may not be necessary, after all. Would be nice if someone chimed in. I suppose things like Common subexpression elimination count as part of this?
Either way, perhaps something along these lines to hammer the point home?
| For micro-optimizations, C++ compilers will often be aware of basic tricks and | |
| will already perform them in optimized builds. | |
| C++ compilers already perform many basic optimization tricks when requested. Therefore, without proper care, micro-optimizing code can often reduce readability while providing no practical benefit in optimized builds. |
Feel free to rephrase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, good point, the info box needs a conclusion. I've added a sentence about not all changes being good.
engine/guidelines/optimization.rst
Outdated
| looks to test the code faithfully, but the benchmarks show no improvement. This might be | ||
| indicative of a poorly written benchmark, which the compiler is able to 'optimize away' by using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| looks to test the code faithfully, but the benchmarks show no improvement. This might be | |
| indicative of a poorly written benchmark, which the compiler is able to 'optimize away' by using | |
| aims to test the code faithfully, but the benchmarks show no improvement. This might be | |
| indicative of a poorly written benchmark, which the compiler is able to "optimize away" by performing |
Also, alternatively, but maybe it makes the sentence more complex:
| looks to test the code faithfully, but the benchmarks show no improvement. This might be | |
| indicative of a poorly written benchmark, which the compiler is able to 'optimize away' by using | |
| aims to test the code faithfully, but the benchmarks show no improvement. This might suggest the benchmarks are not written to account for optimizations performed by the compiler, such as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I wanted to say isn't that one tries to test the code faithfully, but that the benchmark looks like it should test the code faithfully. I'm not sure what the difference is, but it feels more accurate in my mind. I changed the phrasing to make this clearer, but feel free to challenge my changes if you think it's weird 😄
engine/guidelines/optimization.rst
Outdated
| Assembly viewers show the final compiled version of your code in readable | ||
| assembly format. This can be an effective way to optimize low-level code. It is not effective |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assembly viewers, plural, so it's "They're", I think.
| Assembly viewers show the final compiled version of your code in readable | |
| assembly format. This can be an effective way to optimize low-level code. It is not effective | |
| Assembly viewers show the final compiled version of your code in a readable | |
| format. This information can greatly help optimizing low-level code. They're not effective |
Alternatively, but I personally avoid "but"s. I don't know if it's even necessary to specify "assembly" again.
| Assembly viewers show the final compiled version of your code in readable | |
| assembly format. This can be an effective way to optimize low-level code. It is not effective | |
| Assembly viewers show the final compiled version of your code in a readable, but still assembly | |
| format. This information can greatly help optimizing low-level code. They're not effective |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to comment that "examining assembly" can be an effective way to optimize the code (not necessarily the viewers). I changed the wording accordingly.
engine/guidelines/optimization.rst
Outdated
| `software optimization resources <https://www.agner.org/optimize/>`__ have emerged as an invaluable | ||
| tool for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more than one ("resources"), so it's plural "tools".
| `software optimization resources <https://www.agner.org/optimize/>`__ have emerged as an invaluable | |
| tool for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__. | |
| `software optimization resources <https://www.agner.org/optimize/>`__ have released invaluable | |
| tools for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__. |
But I'm not sure that fits quite right, either?
| `software optimization resources <https://www.agner.org/optimize/>`__ have emerged as an invaluable | |
| tool for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__. | |
| `software optimization resources <https://www.agner.org/optimize/>`__ are invaluable | |
| for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"emerged" is often used in scientific context in this way. It means that something that somebody made has slowly gained attention and is, by now, a useful tool known in the field. Something that somebody just made is never useful because its usefulness has to be proven first, by people using it regularly over time. Hence, they say "tools emerged as useful" instead of "somebody created useful tools".
That being said, if you are not familiar with this phrasing, it's probably best to change it. I'll go ahead with your second suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With your explanation the term does make sense. If you want to make it clearer that's what happened, you could make it more explicit too.
| `software optimization resources <https://www.agner.org/optimize/>`__ have emerged as an invaluable | |
| tool for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__. | |
| `software optimization resources <https://www.agner.org/optimize/>`__ have gained popularity as an invaluable | |
| tool for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__. |
or similar! But either way, it's a nitpick.
878cdca to
31bdaa1
Compare
31bdaa1 to
a357a9f
Compare
|
Before going in more detail, I do think we should consider what we hope to achieve: Is the aim in the contributor documentation:
In the first draft I tried to mostly cover (1), with the idea that we have tried to introduce the beginner skills for optimization in the main documentation, as they are relevant not just to people making PRs, but to users optimizing their own GDScript, C# and c++ modules and extensions. So in my mind there is a reasonable argument to shifting as much information as possible to the main documentation. We should also consider there are also extensive external resources on teaching optimization / references - do we intend to become a go to guide for learning optimization on the internet, or should we limit things as to what is relevant to the engine (i.e. is there is a risk of going out of scope here and maintaining far more documentation than we need 🤔 )? Also at the risk of bringing up controversial philosophy / gatekeeping, should we be encouraging newbies to submit optimization PRs until they have some skills / experience? Core maintainers already have to spend significant time reviewing low quality PRs, and the submitters can end up being disappointed / disillusioned when they realise the subject isn't as simple as they thought. |
Definitely just 1., not 2. I think we'd all agree to that.
I'm not sure if the information added here is relevant to users outside of those looking to optimizing the engine.
What we've provided here so far is very far from being a "go-to" guide. I'd love for our article to be a link hub to good resources instead, if we know any. The only ones I've used extensively so far are Agner Fog's and Compiler Explorer (both of which I've linked).
Hrm, yea, I can see us discouraging optimization PRs altogether for new contributors. But someone getting started on a performance PR can still save us some time, even if it doesn't turn out mergable on first try. Finding a good formulation for that would be nice. |
engine/guidelines/optimization.rst
Outdated
| For micro-optimizations, C++ compilers will often be aware of basic tricks and | ||
| will already perform them in optimized builds. Therefore, not all changes that | ||
| look like they should optimize the code will actually make the code faster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is why viewing assembly can be important (particularly when benchmarks aren't super convincing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, good point. I'll try to weave this info in somehow.
engine/guidelines/optimization.rst
Outdated
| Profilers are the most important tool for everyone optimizing code. | ||
| They show you which parts of the code are responsible for slow execution or heavy CPU load, | ||
| and are therefore excellent for identifying what needs to be optimized. Profilers can | ||
| also be used to identify whether the problem has been resolved, by profiling again after |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"identify whether performance has been increased"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up shortening this as well. Probably needs another round of feedback.
engine/guidelines/optimization.rst
Outdated
| Assembly viewers | ||
| ^^^^^^^^^^^^^^^^ | ||
|
|
||
| Assembly viewers show the final compiled version of your code in a readable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assembly is a human readable form of the final compiled machine code, specific to the architecture being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, that's what I wanted to say. Any idea how to improve the wording?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say something like:
Assembly
^^^^^^^^^
In some cases when making low level optimizations, it can be a good idea to view the assembly language generated by the compiler. Assembly language is a human readable form of machine code, which the final binary instructions used to drive the CPU. As such the assembly language will depend on the architecture, the compiler and the compiler settings.
Viewing assembly allows you to compare the before / after result, and to confirm hypotheses used to guide optimization, and to see what the compiler is doing in general. It can also be helpful to reviewers in relevant pull requests, so please post assembly with your PR where appropriate.
There are many ways to view assembly, including:
- godbolt
- setting clang / gcc / cl to output assembly when compiling the file of interest
- inside your IDE debugger when running Godot
- inside your profiling tool (if supported, e.g. Callgrind)
You may find the following resources useful:
...
I'll see if I can dig out some example for compiling a single file to assembly, it's a little bit fiddly...
clang++ -S -o clang_asm.s -c -std=gnu++14 -O3 -pipe -fpie -fno-exceptions -DNDEBUG -DNO_EDITOR_SPLASH -DTOUCH_ENABLED -DALSA_ENABLED -DALSAMIDI_ENABLED -DPULSEAUDIO_ENABLED -D_REENTRANT -DSPEECHD_ENABLED -DJOYDEV_ENABLED -DUDEV_ENABLED -DX11_ENABLED -DUNIX_ENABLED -DOPENGL_ENABLED -DGLES_ENABLED -D_FILE_OFFSET_BITS=64 -DPTRCALL_ENABLED -DMINIZIP_ENABLED -DZSTD_STATIC_LINKING_ONLY -DGLAD_ENABLED -DGLES_OVER_GL -DBT_USE_OLD_DAMPING_METHOD -DBT_THREADSAFE -Ithirdparty/bullet -Ithirdparty/libpng -Ithirdparty/glad -Ithirdparty/zstd -Ithirdparty/zlib -Iplatform/x11 -I. -I/usr/include/speech-dispatcher -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include modules/bullet/area_bullet.cpp
I think from memory it involved seeing the verbose command line for compiling the file, then adding the necessary flags to output assembly (-S maybe?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds nice, but going back to your own original point — should we be going into that much detail? Most contributors won't (and imo, shouldn't) be looking at assembly, so maybe we should leave detailed explanations to other articles. My main idea with mentioning it at all is that we can link resources that do go into detail, and to state our opinion about it (i.e. "can be an effective way to optimize low-level code").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we could leave out the "many ways to view assembly" bit I guess.
I'm kind of the opposite 😀 , I think we should be encouraging anyone making low level PRs to investigate and learn some basic assembly (but then I wrote in assembly about a decade before learning c++). Admittedly I know very little x86, but enough to get a gist of what it is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even in low level PRs, I use my knowledge of assembly only to shortcut investigations, but end up optimizing the PR solely based on measurement results. That's why I think this knowledge is useful, but optional. (I suppose this is more relevant though to the point below, where we might ask people to provide their assembly investigations in the PR).
As for this discussion, I used your text as the general framework, but very harshly cut it to be minimal. Let me know if it works for you!
| - Show that you improved the code either by profiling again, or running systematic benchmarks. | ||
| See `tools <#tools-for-optimization>`__ for more info. | ||
| - Test on multiple platforms where appropriate, especially mobile. | ||
| - When micro-optimizing, show assembly before / after where appropriate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't remove this, this is part of the reason for the whole document.
We were getting people submitting PRs without assembly in cases where it was needed to show what was happening, to show that their PR worked as described. The "where appropriate" is vague enough that it shows not needed in a lot of cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this point because I don't think it's relevant for most PRs, even most performance optimization PRs. Looking at assembly can be helpful, but it's far less important than actually measuring performance.
1e19224 to
2fa83a5
Compare
2fa83a5 to
5b622d5
Compare
d8a0e26 to
0e2f6e3
Compare
0e2f6e3 to
cc7d357
Compare
Improvements added:
I've tried to be brief, though I felt like some examples were needed to really get the points across.