Skip to content

Conversation

@Ivorforce
Copy link
Member

@Ivorforce Ivorforce commented Oct 12, 2025

Improvements added:

  • I've added a 'tools' section. This gives us the opportunity to go more in depth about the tools at your disposal, and to link relevant resources. I've linked some (agner fog, compiler explorer, and our C++ profiling article), but more could be added.
  • I removed some detail from other sections, because I think the information fits better in the new sections.

I've tried to be brief, though I felt like some examples were needed to really get the points across.

@Ivorforce Ivorforce added the enhancement New feature or request label Oct 12, 2025
@Ivorforce Ivorforce force-pushed the high-low-level-tools branch 2 times, most recently from 5b596cd to 878cdca Compare October 12, 2025 10:56
@Ivorforce Ivorforce requested a review from a team October 12, 2025 11:25
Comment on lines 115 to 116
For micro-optimizations, C++ compilers will often be aware of basic tricks and
will already perform them in optimized builds.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This note could use a few examples, maybe. Constant unfolding is later mentioned, so it may not be necessary, after all. Would be nice if someone chimed in. I suppose things like Common subexpression elimination count as part of this?
Either way, perhaps something along these lines to hammer the point home?

Suggested change
For micro-optimizations, C++ compilers will often be aware of basic tricks and
will already perform them in optimized builds.
C++ compilers already perform many basic optimization tricks when requested. Therefore, without proper care, micro-optimizing code can often reduce readability while providing no practical benefit in optimized builds.

Feel free to rephrase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, good point, the info box needs a conclusion. I've added a sentence about not all changes being good.

Comment on lines 146 to 147
looks to test the code faithfully, but the benchmarks show no improvement. This might be
indicative of a poorly written benchmark, which the compiler is able to 'optimize away' by using
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
looks to test the code faithfully, but the benchmarks show no improvement. This might be
indicative of a poorly written benchmark, which the compiler is able to 'optimize away' by using
aims to test the code faithfully, but the benchmarks show no improvement. This might be
indicative of a poorly written benchmark, which the compiler is able to "optimize away" by performing

Also, alternatively, but maybe it makes the sentence more complex:

Suggested change
looks to test the code faithfully, but the benchmarks show no improvement. This might be
indicative of a poorly written benchmark, which the compiler is able to 'optimize away' by using
aims to test the code faithfully, but the benchmarks show no improvement. This might suggest the benchmarks are not written to account for optimizations performed by the compiler, such as

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I wanted to say isn't that one tries to test the code faithfully, but that the benchmark looks like it should test the code faithfully. I'm not sure what the difference is, but it feels more accurate in my mind. I changed the phrasing to make this clearer, but feel free to challenge my changes if you think it's weird 😄

Comment on lines 187 to 188
Assembly viewers show the final compiled version of your code in readable
assembly format. This can be an effective way to optimize low-level code. It is not effective
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assembly viewers, plural, so it's "They're", I think.

Suggested change
Assembly viewers show the final compiled version of your code in readable
assembly format. This can be an effective way to optimize low-level code. It is not effective
Assembly viewers show the final compiled version of your code in a readable
format. This information can greatly help optimizing low-level code. They're not effective

Alternatively, but I personally avoid "but"s. I don't know if it's even necessary to specify "assembly" again.

Suggested change
Assembly viewers show the final compiled version of your code in readable
assembly format. This can be an effective way to optimize low-level code. It is not effective
Assembly viewers show the final compiled version of your code in a readable, but still assembly
format. This information can greatly help optimizing low-level code. They're not effective

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to comment that "examining assembly" can be an effective way to optimize the code (not necessarily the viewers). I changed the wording accordingly.

Comment on lines 192 to 193
`software optimization resources <https://www.agner.org/optimize/>`__ have emerged as an invaluable
tool for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more than one ("resources"), so it's plural "tools".

Suggested change
`software optimization resources <https://www.agner.org/optimize/>`__ have emerged as an invaluable
tool for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__.
`software optimization resources <https://www.agner.org/optimize/>`__ have released invaluable
tools for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__.

But I'm not sure that fits quite right, either?

Suggested change
`software optimization resources <https://www.agner.org/optimize/>`__ have emerged as an invaluable
tool for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__.
`software optimization resources <https://www.agner.org/optimize/>`__ are invaluable
for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__.

Copy link
Member Author

@Ivorforce Ivorforce Oct 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"emerged" is often used in scientific context in this way. It means that something that somebody made has slowly gained attention and is, by now, a useful tool known in the field. Something that somebody just made is never useful because its usefulness has to be proven first, by people using it regularly over time. Hence, they say "tools emerged as useful" instead of "somebody created useful tools".
That being said, if you are not familiar with this phrasing, it's probably best to change it. I'll go ahead with your second suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With your explanation the term does make sense. If you want to make it clearer that's what happened, you could make it more explicit too.

Suggested change
`software optimization resources <https://www.agner.org/optimize/>`__ have emerged as an invaluable
tool for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__.
`software optimization resources <https://www.agner.org/optimize/>`__ have gained popularity as an invaluable
tool for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__.

or similar! But either way, it's a nitpick.

@Ivorforce Ivorforce force-pushed the high-low-level-tools branch from 878cdca to 31bdaa1 Compare October 12, 2025 16:07
@Ivorforce Ivorforce force-pushed the high-low-level-tools branch from 31bdaa1 to a357a9f Compare October 13, 2025 09:22
@lawnjelly
Copy link
Member

Before going in more detail, I do think we should consider what we hope to achieve:

Is the aim in the contributor documentation:

  1. provide already competent optimization programmers the information they need to make PRs specific to Godot
  2. provide a comprehensive resource teaching beginners how to optimize code

In the first draft I tried to mostly cover (1), with the idea that we have tried to introduce the beginner skills for optimization in the main documentation, as they are relevant not just to people making PRs, but to users optimizing their own GDScript, C# and c++ modules and extensions.

So in my mind there is a reasonable argument to shifting as much information as possible to the main documentation.

We should also consider there are also extensive external resources on teaching optimization / references - do we intend to become a go to guide for learning optimization on the internet, or should we limit things as to what is relevant to the engine (i.e. is there is a risk of going out of scope here and maintaining far more documentation than we need 🤔 )?

Also at the risk of bringing up controversial philosophy / gatekeeping, should we be encouraging newbies to submit optimization PRs until they have some skills / experience? Core maintainers already have to spend significant time reviewing low quality PRs, and the submitters can end up being disappointed / disillusioned when they realise the subject isn't as simple as they thought.

@Ivorforce
Copy link
Member Author

Ivorforce commented Oct 13, 2025

Is the aim in the contributor documentation:

  1. provide already competent optimization programmers the information they need to make PRs specific to Godot
  2. provide a comprehensive resource teaching beginners how to optimize code

Definitely just 1., not 2. I think we'd all agree to that.

In the first draft I tried to mostly cover (1), with the idea that we have tried to introduce the beginner skills for optimization in the main documentation, as they are relevant not just to people making PRs, but to users optimizing their own GDScript, C# and c++ modules and extensions.

So in my mind there is a reasonable argument to shifting as much information as possible to the main documentation.

I'm not sure if the information added here is relevant to users outside of those looking to optimizing the engine.
I would be on board with shortening these sections, but I think they all have a role to fulfill within the optimization PR workflow:

  • "Low level" vs "high level" clarifies our difference of expectations for both kinds of optimizations. Though I could see this information being shortened and interweaved into the "PR requirements" sections.
  • "Tools" enables us to provide a short explanation of our preferences (for PR purposes), to link resources, and to not have to intersperse information into other sections. The rest of the text is 'fluff' trying to rationalize our preferences, but I suppose it could be left out.

We should also consider there are also extensive external resources on teaching optimization / references - do we intend to become a go to guide for learning optimization on the internet, or should we limit things as to what is relevant to the engine (i.e. is there is a risk of going out of scope here and maintaining far more documentation than we need 🤔 )?

What we've provided here so far is very far from being a "go-to" guide. I'd love for our article to be a link hub to good resources instead, if we know any. The only ones I've used extensively so far are Agner Fog's and Compiler Explorer (both of which I've linked).

Also at the risk of bringing up controversial philosophy / gatekeeping, should we be encouraging newbies to submit optimization PRs until they have some skills / experience? Core maintainers already have to spend significant time reviewing low quality PRs, and the submitters can end up being disappointed / disillusioned when they realise the subject isn't as simple as they thought.

Hrm, yea, I can see us discouraging optimization PRs altogether for new contributors. But someone getting started on a performance PR can still save us some time, even if it doesn't turn out mergable on first try. Finding a good formulation for that would be nice.

For micro-optimizations, C++ compilers will often be aware of basic tricks and
will already perform them in optimized builds. Therefore, not all changes that
look like they should optimize the code will actually make the code faster.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why viewing assembly can be important (particularly when benchmarks aren't super convincing).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, good point. I'll try to weave this info in somehow.

Profilers are the most important tool for everyone optimizing code.
They show you which parts of the code are responsible for slow execution or heavy CPU load,
and are therefore excellent for identifying what needs to be optimized. Profilers can
also be used to identify whether the problem has been resolved, by profiling again after
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"identify whether performance has been increased"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up shortening this as well. Probably needs another round of feedback.

Assembly viewers
^^^^^^^^^^^^^^^^

Assembly viewers show the final compiled version of your code in a readable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assembly is a human readable form of the final compiled machine code, specific to the architecture being used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that's what I wanted to say. Any idea how to improve the wording?

Copy link
Member

@lawnjelly lawnjelly Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say something like:

Assembly
^^^^^^^^^
In some cases when making low level optimizations, it can be a good idea to view the assembly language generated by the compiler. Assembly language is a human readable form of machine code, which the final binary instructions used to drive the CPU. As such the assembly language will depend on the architecture, the compiler and the compiler settings.

Viewing assembly allows you to compare the before / after result, and to confirm hypotheses used to guide optimization, and to see what the compiler is doing in general. It can also be helpful to reviewers in relevant pull requests, so please post assembly with your PR where appropriate.

There are many ways to view assembly, including:

  • godbolt
  • setting clang / gcc / cl to output assembly when compiling the file of interest
  • inside your IDE debugger when running Godot
  • inside your profiling tool (if supported, e.g. Callgrind)

You may find the following resources useful:
...

I'll see if I can dig out some example for compiling a single file to assembly, it's a little bit fiddly...

clang++ -S -o clang_asm.s -c -std=gnu++14 -O3 -pipe -fpie -fno-exceptions -DNDEBUG -DNO_EDITOR_SPLASH -DTOUCH_ENABLED -DALSA_ENABLED -DALSAMIDI_ENABLED -DPULSEAUDIO_ENABLED -D_REENTRANT -DSPEECHD_ENABLED -DJOYDEV_ENABLED -DUDEV_ENABLED -DX11_ENABLED -DUNIX_ENABLED -DOPENGL_ENABLED -DGLES_ENABLED -D_FILE_OFFSET_BITS=64 -DPTRCALL_ENABLED -DMINIZIP_ENABLED -DZSTD_STATIC_LINKING_ONLY -DGLAD_ENABLED -DGLES_OVER_GL -DBT_USE_OLD_DAMPING_METHOD -DBT_THREADSAFE -Ithirdparty/bullet -Ithirdparty/libpng -Ithirdparty/glad -Ithirdparty/zstd -Ithirdparty/zlib -Iplatform/x11 -I. -I/usr/include/speech-dispatcher -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include modules/bullet/area_bullet.cpp

I think from memory it involved seeing the verbose command line for compiling the file, then adding the necessary flags to output assembly (-S maybe?).

Copy link
Member Author

@Ivorforce Ivorforce Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds nice, but going back to your own original point — should we be going into that much detail? Most contributors won't (and imo, shouldn't) be looking at assembly, so maybe we should leave detailed explanations to other articles. My main idea with mentioning it at all is that we can link resources that do go into detail, and to state our opinion about it (i.e. "can be an effective way to optimize low-level code").

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could leave out the "many ways to view assembly" bit I guess.

I'm kind of the opposite 😀 , I think we should be encouraging anyone making low level PRs to investigate and learn some basic assembly (but then I wrote in assembly about a decade before learning c++). Admittedly I know very little x86, but enough to get a gist of what it is doing.

Copy link
Member Author

@Ivorforce Ivorforce Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even in low level PRs, I use my knowledge of assembly only to shortcut investigations, but end up optimizing the PR solely based on measurement results. That's why I think this knowledge is useful, but optional. (I suppose this is more relevant though to the point below, where we might ask people to provide their assembly investigations in the PR).

As for this discussion, I used your text as the general framework, but very harshly cut it to be minimal. Let me know if it works for you!

- Show that you improved the code either by profiling again, or running systematic benchmarks.
See `tools <#tools-for-optimization>`__ for more info.
- Test on multiple platforms where appropriate, especially mobile.
- When micro-optimizing, show assembly before / after where appropriate.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't remove this, this is part of the reason for the whole document.

We were getting people submitting PRs without assembly in cases where it was needed to show what was happening, to show that their PR worked as described. The "where appropriate" is vague enough that it shows not needed in a lot of cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this point because I don't think it's relevant for most PRs, even most performance optimization PRs. Looking at assembly can be helpful, but it's far less important than actually measuring performance.

@Ivorforce Ivorforce force-pushed the high-low-level-tools branch 2 times, most recently from 1e19224 to 2fa83a5 Compare October 13, 2025 16:08
@Ivorforce Ivorforce changed the title Add 'high level / low level' and 'tools' sections to the optimization article. Add 'tools' sections to the optimization article. Oct 13, 2025
@Ivorforce Ivorforce force-pushed the high-low-level-tools branch from 2fa83a5 to 5b622d5 Compare October 13, 2025 16:12
@Ivorforce Ivorforce changed the title Add 'tools' sections to the optimization article. Add 'tools' section to the optimization article. Oct 13, 2025
@Ivorforce Ivorforce force-pushed the high-low-level-tools branch 2 times, most recently from d8a0e26 to 0e2f6e3 Compare October 13, 2025 17:41
@Ivorforce Ivorforce force-pushed the high-low-level-tools branch from 0e2f6e3 to cc7d357 Compare October 13, 2025 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants