Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add buffer orphan / stream options #47864

Merged
merged 1 commit into from
Apr 14, 2021
Merged

Conversation

lawnjelly
Copy link
Member

@lawnjelly lawnjelly commented Apr 13, 2021

Allows users to override default API usage, in order to get best performance on different platforms.

Ideally we would like to treat all OpenGL devices in a similar fashion, but unfortunately it seems as if the best way to use OpenGL with dynamic buffers (particularly for 2d) seems to vary on different hardware / OS / drivers.

This PR reintroduces 4 project settings to allow the user to override default settings (potentially for each platform using the project settings overrides).

orphan_settings

Each is settable to Default, Off or On. Having an explicit default setting allows us to improve the defaults in later releases as we gain more user feedback.

Related to #47833, #47824

About

Without having access to a large number of devices, we have relied on limited feedback from users. We currently believe that buffer orphaning is especially useful on mobile. It may not make much difference on most desktops, however having it on, or off, may prove better or worse on some platforms, especially Mac. It is possible some Macs may suffer stalls with orphaning on, and some with orphaning off. This makes it problematic to set a default for that platform.

Defaults

The previous defaults for Godot were to use orphaning only when GLES_OVER_GL was not set. We did have complaints of low performance on Macs, so have been using orphaning in all cases through the 3.3 betas. Some users are still reporting low performance that can be fixed by not using orphaning.

As we do not have the information to make a hardware database as yet, the best we can offer for now is to set the default that works in what appears to be the majority of cases (as defined by user feedback), and leave the options to set it to custom values in project settings. The consequence of users being able to override these defaults is that this will affect their exports, and they may expect what works on their machine will also work for others (which may not be the case). So we will have to make this very clear in the documentation, that changing the defaults should be done with care.

As to the default we should use for orphaning, I really have no idea. On the whole I get the impression we've had better feedback for results with orphaning defaulting to on in all cases, but we can equally well set it to the same logic we used in earlier versions (selecting by GLES_OVER_GL). These settings were exposed in some of the earlier betas, but unfortunately very few people tried them and gave feedback.

Flags

As with orphaning, GL offers a choice, either DYNAMIC or STREAM flags, without making clear which would be best. Some users have reported better performance using STREAM flag. This flag is selectable in this PR.

See https://www.khronos.org/registry/OpenGL-Refpages/es3.0/html/glBufferData.xhtml for more info on flags. We use GL_DYNAMIC_DRAW (default), or GL_STREAM_DRAW if stream is switched on in the project setting. The docs are confusing. We update a buffer multiple times, and use it once per update. According to the docs we should therefore be using DYNAMIC:

DYNAMIC
The data store contents will be modified repeatedly and used many times.
STREAM
The data store contents will be modified once and used at most a few times.

However, they also suggest elsewhere using stream flag with buffer orphaning, and I have a sneaking suspicion we should be changing to using this as the default:
https://www.khronos.org/opengl/wiki/Buffer_Object_Streaming

We have used DYNAMIC up till now, and had no feedback on the use of STREAM. In addition, it seems possible that some drivers detect your usage pattern and make a sensible decision irrespective of what flags you set here, there often doesn't seem to be a lot of difference. But it may be important to get right on some small set of hardware.

At this late stage for 3.3, it seems safer to keep the current default, see if we can get some feedback from the custom settings, and try STREAM as a default for one of the betas for the next Godot version.

It may turn out that we should use STREAM with orphaning, and DYNAMIC when not using orphaning.

Other options

There are some other options for dynamic buffers - such as glMapBuffer. This is only available in core in GLES3, and via an extension in GLES2. We haven't tried this - it may be better or worse on some hardware, adding to the choice difficulty. You would hope that all implementations would have a decent implementation for the more basic calls. But that may or may not be the case.

Another option is the use of multiple buffers round robin style, however in our case it may not be possible because we reuse buffers multiple times per frame and rely on buffer renaming / sensible buffer handling from the driver.

Notes

  • I haven't written class reference yet, hopefully I'll get some feedback on the PR and how we should approach this area, I'm happy to change approach / welcome ideas
  • Without the facilities to do systematic testing over a range of devices, this is a very difficult area, as we are mostly relying on anecdotal evidence and guesswork(!)

@lawnjelly lawnjelly requested a review from a team as a code owner April 13, 2021 17:31
@Calinou Calinou added this to the 3.3 milestone Apr 13, 2021
glBufferSubData(GL_ARRAY_BUFFER, 0, buffer_size, multimesh->data.ptr());
}

// this could potentially have a project setting for API options as with 2d
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean there should be separate settings for 2D and 3D?

Side note, might be worth removing the extra tabs on the commented lines, commenting out code on column 0 doesn't work well with automated code formatting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did combine them in the beta 2 build for convenience, but we haven't had issues about slowness here from the betas afaik, so I'm satisfied that fixing it to the current version should be ok for now.

@akien-mga
Copy link
Member

As long as the defaults are the same as what we've had in recent 3.3 RCs, I'm fine with adding more configuration options so we can get more tests on different hardware.

The docs should make it crystal clear for all of them that:

  • They're experimental, and not meant to be used in production as they can have different results for different users.
  • They might be removed in future releases, and the behavior for "Default" may also change.

i.e. use them for testing and report results, not as a silver bullet.

@YuriSizov
Copy link
Contributor

So a proper "fix" would still be a sort of hardware database at a later date when we have more data points?

@lawnjelly
Copy link
Member Author

So a proper "fix" would still be a sort of hardware database at a later date when we have more data points?

Yes, it sounds like it, although we may be able to make some generalisations other than just using the platform to decide the settings (e.g. the OS version).

@lawnjelly
Copy link
Member Author

I've done the class ref so let me know any changes. I just want to do a final debugging check just to ensure all the settings are actually taking effect(!). 👍

@akien-mga
Copy link
Member

akien-mga commented Apr 14, 2021

Maybe we could mention in the enum what "Default" defaults to? i.e. "Default (On)" or "Default (Off)".
That wouldn't prevent us from changing the defaults since what's serialized anyway is 0 (well actually nothing is written to project.godot if it's 0 anyway).

@lawnjelly
Copy link
Member Author

Ok I'll do that (we couldn't before because the default setting changed according to GLES_OVER_GL). We can always change what the default setting is and the user interface in 3.3.1 if we find a better default.

@akien-mga
Copy link
Member

Ok I'll do that (we couldn't before because the default setting changed according to GLES_OVER_GL). We can always change what the default setting is and the user interface in 3.3.1 if we find a better default.

Ah right, for the ones that change based on the platform it could be "Auto" instead of "Default".

@lawnjelly lawnjelly force-pushed the orphan_options branch 2 times, most recently from f4a607b to a9278c9 Compare April 14, 2021 10:57
@lawnjelly lawnjelly changed the title [WIP] Add buffer orphan / stream options Add buffer orphan / stream options Apr 14, 2021
@lawnjelly
Copy link
Member Author

Just a note we've changed the default legacy flag to STREAM. Based on our testing of a test project #47824 from a few contributors on linux, mac, windows and android it seems to either run a lot better or no change, we've had no slowdowns so far.

@akien-mga
Copy link
Member

Maybe edit the commit message to point out that it defaults to STREAM for the legacy path?

Allows users to override default API usage, in order to get best performance on different platforms.

Also changes the default legacy flags to use STREAM rather than DYNAMIC.
@akien-mga akien-mga merged commit 4a942af into godotengine:3.x Apr 14, 2021
@akien-mga
Copy link
Member

Thanks!

@lawnjelly lawnjelly deleted the orphan_options branch April 14, 2021 11:24
@oeleo1
Copy link

oeleo1 commented Apr 14, 2021

This is hard core stuff. As @akien-mga mentioned in his first response, as long as the defaults are the status quo we should be fine and safe (understand no crashes). Now, regarding the DYNAMIC -> STREAM change, this is a big one. But noone knows how big it is without some empirical evidence as explained by @lawnjelly and the external references. Despite Godot's usage pattern and the hints we specify, GL drivers are free to do anything they like under the hood as long as they comply with the specs. And I am pretty sure they do so, with lots of folklore both across vendors and across driver versions of the same vendor. That said, I trust the initial preliminary feedback that stream is better than dynamic. Nonetheless, at this point this remains an interesting experiment before a stable release.

@oeleo1
Copy link

oeleo1 commented Apr 14, 2021

PS: would there be another RC9 for me (or someone else) to try this out on iOS before stable comes out? Or make an interim iOS build with templates. Unfortunately I gave up on building things myself as I am not that involved with the dev and building on iOS with export templates is not a piece of cake.

@lawnjelly
Copy link
Member Author

lawnjelly commented Apr 14, 2021

PS: would there be another RC9 for me (or someone else) to try this out on iOS before stable comes out? Or make an interim iOS build with templates. Unfortunately I gave up on building things myself as I am not that involved with the dev and building on iOS with export templates is not a piece of cake.

You can try 3.2.4 beta 2, the settings were exposed there 5 months ago. Unfortunately few people tested them for us. Just set the legacy flags to stream and run the circles project in #47824.

See #43038 (comment)

EDIT: Ah or indeed RC9: 😁
https://godotengine.org/article/release-candidate-godot-3-3-rc-9

@oeleo1
Copy link

oeleo1 commented Apr 14, 2021

OK, same results in iOS across the board. I think I did all combinations on GLES2 of the repro project with RC9. I tested 2 phones:

  1. iPhone SE 1st gen (quite old). Legacy or batching, stream hint on or off: ~25fps.+/-3-4 fps
  2. iPhone 12 mini (latest gen). Legacy or batching, stream hint on or off: ~35fps +/- 2-3 fps

I am somewhat puzzled that I get the same number in all settings permutations for a given hardware. Is this arc/circles project special in that it doesn't take advantage of any batching or drawing primitive shortcuts? Or could it be that I am doing something wrong? or iOS exports don't take these settings into account (nah, can't be).

@lawnjelly
Copy link
Member Author

Yes, there is no batching at all, as circles and polylines (arcs) are drawn using the old methods. The thing of interest here is to compare performance in 3.3RC9 (with the new settings) with 3.2.3, as we are aiming to get performance around the same as 3.2.3, or at least prevent any obvious stalls.

In general, now batching is available, you will notice the difference more when you use non-batched primitives - even if the performance of these hasn't changed, relatively they will slow your game down. This is just how bottlenecks work, a chain is only as strong as the weakest link. We will try and add these this information to the manual to make it a bit more obvious.

In fact circles and polylines could probably be rewritten to be batched (if they don't use line smoothing), or at least rewritten to be a lot more efficient, it has just not been a priority unless we have evidence of it being a problem outside of specific benchmarks.

@oeleo1
Copy link

oeleo1 commented Apr 15, 2021

Okay. What's so special about 3.2.3? I modified the project slightly
circle_arc.zip to get the info on screen.

iPhone SE 1st generation
SE_Gen1_323_off
SE_Gen1_323_on
SE_Gen1_rc9_off
SE_Gen1_rc9_on

iPhone 12 mini
12_mini_323_off
12_mini_323_on
12_mini_rc9_off
12_mini_rc9_on

Same difference. I guess this is good.

PS: Gee, this markup stuff is not mine as well :-)

PPS: For rc9 all batching/2D parameters (legacy/batching and stream/dynamic) are left to their defaults.

@akien-mga
Copy link
Member

What's so special about 3.2.3?

It's the current stable release, so any performance drop in 3.3 compared to 3.2.3 would be perceived as a regression (even if there are massive gains in other less corner-case scenarios). So we aim to minimize such performance drops, while keeping the massive gains in situations which benefit from batching, buffer orphaning, etc.

@oeleo1
Copy link

oeleo1 commented Apr 15, 2021

On an side note, I am pleased to report that with rc9 our device test farm was running Release (not Debug) builds of internal reference projects in auto test mode for 15+ minutes without a crash. In all previous builds a crash happens within the minute.

So this is a heads up for the fact that the upcoming 3.3 release seems to be a quantum leap in quality.

As for the crashes, hard to tell what’s behind them without some serious debugging and time investment. My gut feeling is that it has something to do with yield resumes / scripts gone as we use yields extensively across the board. But this is blind speculation at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants