Fix regression around OpenGL swapchain optimization for OpenXR #94894

BastiaanOlij · 2024-07-29T01:00:25Z

After a long discussion with @clayjohn we came to the conclusion that #84244 was the cause of our performance regression on OpenGL due to a misunderstanding on my part between a crucial difference between Vulkan and OpenGL.
This caused glFinish to be called in a scenario that wasn't needed.

Due to the confusion caused by original naming of methods I've kept end_frame in its new place, giving us the opportunity to do the proper swapchain swap on Vulkan, and proper timing check in OpenGL instead of the workaround that was in place before.

end_viewport has been renamed to gl_end_frame to better communicate that this is a call specific to OpenGL and is only called after output has been blitted to our swapchain and it has to be handled.

What we are still unsure of, seeing how things were before #84244 is whether the logic is correct when p_swap_buffers is false, which happens when a user called force_draw.

It is possible that the correct code in the XR branch should be:

if (blits.size() > 0 || !p_swap_buffers) {
	RSG::rasterizer->gl_end_frame(p_swap_buffers);
}

However, calling force_draw when XR is active is probably fairly dangerous to begin with as this would result in an extra call to xrWaitFrame with all sorts of timing things going wrong.

All in all, it would be good to test this PR against a normal application that uses force_draw to see if the functionality works as expected. I do not have a valid example project for this.

BastiaanOlij · 2024-07-29T02:49:40Z

Just rebased it, didn't realise my master was a few days old.. Shouldn't effect tests.

Decasis, I'll try and run your test app, might be a different issue there.

decacis · 2024-07-29T02:53:42Z

Note that my MRP was made specifically for the Quest 2. It won't show the performance difference on a Quest 3 since it has better specs. Also, the OpenXR vendors should be updated to 3.0.0 beta 3, or at least that's what I used for this test.

DanielKinsman · 2024-07-29T03:11:40Z

On the pico 4 my monkey head test (more monkeys, more better) gives 360 monkeys in 4.3dev1, 300 in 4.3.dev2, and 340 on this PR. So it is definitely better but maybe there's still some performance gain to be had.

It's pretty consistent run to run so I don't think it is just measurement error, but it is possible.

BastiaanOlij · 2024-07-29T03:13:30Z

Note that my MRP was made specifically for the Quest 2. It won't show the performance difference on a Quest 3 since it has better specs. Also, the OpenXR vendors should be updated to 3.0.0 beta 3, or at least that's what I used for this test.

Ah, that explains why for me it reached normal performance. Time to charge my Quest 2

BastiaanOlij · 2024-07-29T03:13:50Z

On the pico 4 my monkey head test (more monkeys, more better) gives 360 monkeys in 4.3dev1, 300 in 4.3.dev2, and 340 on this PR. So it is definitely better but maybe there's still some performance gain to be had.

It's pretty consistent run to run so I don't think it is just measurement error, but it is possible.

Ok that would suggest we're dealing with two separate issues..

decacis · 2024-07-29T04:15:36Z

@BastiaanOlij I can confirm, it seems to be a separate issue in my MRP... I just created another, much simpler project with built-in meshes (Spheres) and no animations or anything fancy and get 72fps in 4.3-dev1, ~56fps in 4.3-dev2 and 72 in this PR

So yeah, I can confirm this PR fixes the original issue.

Edit: for reference, this is the MRP I used for this test: https://drive.google.com/file/d/1trhfPlcXqwH8rJsLyTvQOaPxO6zyWw8p/view?usp=sharing I just changed the OpenXR vendors depending on what version I was testing on

Edit 2: also note that, my original project went through dozens of versions when I was bisecting. It is possible that something broke it in some way that creates issues in newer versions.

BastiaanOlij · 2024-07-29T07:26:40Z

@decacis yeah it's a pain bisecting around those builds because of the changes in deployment on Android. It's really easy to make a mistake.

Animations are also subject to whether you're doing a release or dev build as we have less compiler optimisations turned on in dev builds.

dsnopek · 2024-07-29T13:23:29Z

I did some testing using the MRP from #94856 on Quest 3. It required some minor changes to work on Quest 3, notably adjusting the export settings for Meta rather than Pico, and setting xr_interface.display_refresh_rate = 90 in _ready() because Quest defaults to 72.

My results (more monkey heads is better):

Version	Monkey Heads
Godot 4.2.2	530
Godot 4.3-rc1	390
This PR	530

@DanielKinsman mentioned above that in his tests on Pico, he wasn't able to get up to the same number with this PR as Godot 4.2.2, but in my tests I landed on exactly the same number.

Anyway, this PR seems to fix the issue in my testing, and the code looks good to me (although, I am no rendering expert).

akien-mga · 2024-07-29T13:23:57Z

Thanks!

RumarioVR · 2024-07-29T20:27:03Z

I can't confirm that at the moment. It's probably not that easy to test it with the Monkey project. I also noticed an improvement in certain scenes. But I also have scenes that show no improvement at all. Still 73 vs 45 FPS. Obviously there are various factors that influence the result. I will investigate this further and try to isolate the problem. What MSAA settings did the other testers use?

clayjohn · 2024-07-29T22:25:22Z

@RumarioVR Do you have MSAA enabled? If so, that might explain it. MSAA wasn't added until 4.3 dev1. So if your project has MSAA enabled, it will definitely be slower in 4.3 than in 4.2

BastiaanOlij · 2024-07-30T00:20:28Z

@RumarioVR you can fix the issue with the textures by including this PR: #94902
That should allow you to test without introducing new variables.

MSAAx2 is actually pretty much costless on our current Android implementation if the required GL extensions are available (which they are on XR2 chipsets/drivers) AND assuming no features have been enabled that require multiple render passes and thus reading render target data back into tile memory (which with MSAA doubles the performance penalty for every MSAA level (so double for x2, quadruple for x4, etc.).
Features that seriously impact performance are:

Glow/bloom
Reading from screen or depth texture
Scaling (mostly effects XR where it disables foveated rendering, so what you win by rendering less you loose by rendering less optimally)*
Color corrections that require our intermediate buffer to be used

Combine any of those features with MSAA and the performance degradation becomes that much worse.

On desktop MSAA will always have a performance penalty as the unique scenario where it is nearly costless revolves around special optimisations possible in TBDR architecture under tight restrictions (just on desktop the performance penalty is tiny compared to mobile).

@clayjohn we need to think of a way to better inform users when they enable performance degrading options, especially on mobile.

If you do want to use scaling to improve performance in XR, you can lower the render resolution of the XR viewport itself by changing the render_target_size_multiplier and the XR runtime will handle the upscaling. But in order for this to look good you have to use Composition layers to make text in menus readable.

BastiaanOlij added bug topic:rendering topic:xr regression labels Jul 29, 2024

BastiaanOlij added this to the 4.3 milestone Jul 29, 2024

BastiaanOlij self-assigned this Jul 29, 2024

BastiaanOlij requested a review from a team as a code owner July 29, 2024 01:00

BastiaanOlij requested review from clayjohn and dsnopek July 29, 2024 01:10

This comment was marked as outdated.

Sign in to view

Fix regression around OpenGL swapchain optimisation for OpenXR

1eb0039

BastiaanOlij force-pushed the fix_regression_84244 branch from d348d97 to 1eb0039 Compare July 29, 2024 02:47

clayjohn approved these changes Jul 29, 2024

View reviewed changes

dsnopek approved these changes Jul 29, 2024

View reviewed changes

akien-mga merged commit 0e9caa2 into godotengine:master Jul 29, 2024
18 checks passed

BastiaanOlij deleted the fix_regression_84244 branch July 29, 2024 23:54

akien-mga changed the title ~~Fix regression around OpenGL swapchain optimisation for OpenXR~~ Fix regression around OpenGL swapchain optimization for OpenXR Aug 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix regression around OpenGL swapchain optimization for OpenXR #94894

Fix regression around OpenGL swapchain optimization for OpenXR #94894

BastiaanOlij commented Jul 29, 2024

This comment was marked as outdated.

BastiaanOlij commented Jul 29, 2024

decacis commented Jul 29, 2024 •

edited

Loading

DanielKinsman commented Jul 29, 2024

BastiaanOlij commented Jul 29, 2024

BastiaanOlij commented Jul 29, 2024

decacis commented Jul 29, 2024 •

edited

Loading

BastiaanOlij commented Jul 29, 2024

dsnopek commented Jul 29, 2024

akien-mga commented Jul 29, 2024

RumarioVR commented Jul 29, 2024

clayjohn commented Jul 29, 2024

BastiaanOlij commented Jul 30, 2024 •

edited

Loading

Fix regression around OpenGL swapchain optimization for OpenXR #94894

Fix regression around OpenGL swapchain optimization for OpenXR #94894

Conversation

BastiaanOlij commented Jul 29, 2024

This comment was marked as outdated.

BastiaanOlij commented Jul 29, 2024

decacis commented Jul 29, 2024 • edited Loading

DanielKinsman commented Jul 29, 2024

BastiaanOlij commented Jul 29, 2024

BastiaanOlij commented Jul 29, 2024

decacis commented Jul 29, 2024 • edited Loading

BastiaanOlij commented Jul 29, 2024

dsnopek commented Jul 29, 2024

akien-mga commented Jul 29, 2024

RumarioVR commented Jul 29, 2024

clayjohn commented Jul 29, 2024

BastiaanOlij commented Jul 30, 2024 • edited Loading

decacis commented Jul 29, 2024 •

edited

Loading

decacis commented Jul 29, 2024 •

edited

Loading

BastiaanOlij commented Jul 30, 2024 •

edited

Loading