[3.x] Shader goodies: async. compilation + caching #46330

Closed
wants to merge 3 commits into from

RandomShaper
Member

@RandomShaper RandomShaper commented Feb 22, 2021

Superseded by #53411


The main goal of this PR is to reduce stalling in games.

Current limitations:

  • Async. compilation is only implemented for spatial shaders (it wouldn't be very difficult to extend it to canvas shaders).
  • The current implementation is for GLES3 only.

DISCLAIMER: This implementation has been used in a project where it actually helped reduce stalling caused by shader compilation. However, it should be considered experimental, and some testing would be very welcome. Also, the code itself could be improved in how some values are made available to the different pieces of the renderer. Ideas welcome!

Shader caching

As long as the target platform supports the program binary GL extension, this is simply enable-and-forget.

Some remarks:

  • Writes to the cache are async. to prevent stalling the render thread as much as possible.
  • If async. shader compilation is enabled in addition to caching, shader "reconstruction" from its cache file is also potentially done asynchronously.
  • Whether caching is enabled affects both the project and the editor. It'd be nice to separate them so the editor has its own setting for caching, but I couldn't find a reasonable way to do that in the time I could spend on this, because the rasterizer is initialized before the editor settings singleton is ready.
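To illustrate how a program binary cache can be keyed, here is a minimal sketch. It assumes the key combines a hash of the final shader source with a driver identification string, since a program binary is only usable by the driver that produced it; that keying scheme and all names (`fnv1a64`, `make_cache_key`, `ProgramBinaryCache`) are hypothetical and not taken from the PR. FNV-1a is used because its output is fully specified, whereas `std::hash` is not guaranteed to be stable across builds or implementations, which matters for an on-disk cache.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// FNV-1a (64-bit): a simple hash whose output is the same on every run.
uint64_t fnv1a64(const std::string &data, uint64_t hash = 14695981039346656037ULL) {
    for (unsigned char c : data) {
        hash ^= c;
        hash *= 1099511628211ULL;
    }
    return hash;
}

// Hypothetical key: driver identity + shader source. A newline separator keeps
// ("ab", "c") and ("a", "bc") from hashing the same input.
uint64_t make_cache_key(const std::string &shader_source, const std::string &driver_id) {
    uint64_t h = fnv1a64(driver_id);
    h = fnv1a64("\n", h);
    return fnv1a64(shader_source, h);
}

// In-memory stand-in for the on-disk cache: key -> program binary blob.
// (The PR writes cache files asynchronously; this sketch ignores threading.)
class ProgramBinaryCache {
public:
    void store(uint64_t key, std::vector<uint8_t> binary) { entries_[key] = std::move(binary); }

    std::optional<std::vector<uint8_t>> retrieve(uint64_t key) const {
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            return std::nullopt;  // cache miss: compile from source instead
        }
        return it->second;
    }

private:
    std::map<uint64_t, std::vector<uint8_t>> entries_;
};
```

With a stable key, running the same project twice produces the same keys, so the second run finds every binary already stored.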

Asynchronous compilation of shaders

It works if enabled and supported by the GL driver. If native parallel compilation is supported, that is used, since it is the most efficient option. Otherwise, asynchronicity is achieved via a secondary GL context (on another thread) that sends the compiled shader back to the main one in binary form, which means the program binary extension must be supported. If neither mechanism is available, async. compilation is effectively disabled.
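The selection order described above can be summarized as a small decision function. This is only a sketch: the enum and function names are hypothetical, and the two capability inputs correspond to the `parallel_shader_compile_supported` and `program_binary_supported` config flags that appear in this PR's diff.

```cpp
#include <cassert>

enum class AsyncCompileMode {
    DISABLED,           // compile synchronously on the render thread
    NATIVE_PARALLEL,    // the driver compiles in the background (ARB/KHR_parallel_shader_compile)
    SECONDARY_CONTEXT,  // worker thread + secondary GL context, result sent back as a program binary
};

AsyncCompileMode select_async_mode(bool async_enabled,
                                   bool parallel_shader_compile_supported,
                                   bool program_binary_supported) {
    if (!async_enabled) {
        return AsyncCompileMode::DISABLED;
    }
    // Preferred: native parallel compilation is the most efficient path.
    if (parallel_shader_compile_supported) {
        return AsyncCompileMode::NATIVE_PARALLEL;
    }
    // Fallback: the secondary context hands programs back in binary form,
    // so this path requires the program binary extension.
    if (program_binary_supported) {
        return AsyncCompileMode::SECONDARY_CONTEXT;
    }
    // Neither mechanism is available: async. compilation is effectively off.
    return AsyncCompileMode::DISABLED;
}
```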

Three fallback modes are added to both manually created shaders (either code or visual) and SpatialMaterials: none, simple and no render. Please check the diff, where these are explained in the built-in documentation.

The default mode is simple. You can explicitly set a more conservative mode for any shader/material.
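To make the three modes concrete, here is a hedged sketch of what a renderer might decide each frame for a material whose real shader is not ready yet. Reading "none" as "no fallback, so wait for the real shader (compile synchronously)" is an assumption based on it being the most conservative option; the built-in documentation in the diff is the authority. The `force_no_render` flag mirrors the `rendering/gles3/shaders/force_no_render_fallback` project setting from the diff, and treating it as "turn simple into no render" is also an assumption. All type and function names are hypothetical.

```cpp
#include <cassert>

enum class FallbackMode { NONE, SIMPLE, NO_RENDER };
enum class DrawAction { DRAW_REAL, COMPILE_SYNC_THEN_DRAW, DRAW_SIMPLE_FALLBACK, SKIP };

// Decide how to draw one material this frame (hypothetical helper).
DrawAction choose_draw_action(bool real_shader_ready, FallbackMode mode, bool force_no_render) {
    if (real_shader_ready) {
        return DrawAction::DRAW_REAL;
    }
    // Assumption: the project-wide setting turns "simple" into "no render".
    if (force_no_render && mode == FallbackMode::SIMPLE) {
        mode = FallbackMode::NO_RENDER;
    }
    switch (mode) {
        case FallbackMode::NONE:
            // Assumed meaning: no fallback, so stall until the real shader exists.
            return DrawAction::COMPILE_SYNC_THEN_DRAW;
        case FallbackMode::SIMPLE:
            return DrawAction::DRAW_SIMPLE_FALLBACK;
        case FallbackMode::NO_RENDER:
            return DrawAction::SKIP;
    }
    return DrawAction::COMPILE_SYNC_THEN_DRAW;  // not reached; silences missing-return warnings
}
```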

The simple fallback is a shadeless shader that can copy the following from the original shader:

  • Albedo uniform: it must be called albedo or albedo_color.
  • Texture scaling uniforms: called uv1_scale and uv1_offset.
  • Albedo texture: the first texture uniform in the original material with any hint_*_albedo; else, the first 2D texture used in the material, according to the order of uniforms.
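The lookup rules above can be expressed as searches over the material's uniforms in declaration order. This is a sketch under assumptions: the `UniformInfo` struct and its fields are hypothetical, and the hint check simply tests for the `hint_` prefix and the `_albedo` suffix, matching the `hint_*_albedo` wording (so both `hint_albedo` and `hint_black_albedo` qualify).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

enum class UniformType { VEC4, SAMPLER2D, SAMPLER_CUBE, OTHER };

// Hypothetical description of one uniform, in declaration order.
struct UniformInfo {
    std::string name;
    UniformType type;
    std::string hint;  // e.g. "hint_albedo", "hint_black_albedo", "hint_normal" or ""
};

static bool ends_with(const std::string &s, const std::string &suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Matches hint_*_albedo; the shared underscore lets plain "hint_albedo" match too.
static bool is_albedo_hint(const std::string &hint) {
    return hint.rfind("hint_", 0) == 0 && ends_with(hint, "_albedo");
}

static bool is_texture(UniformType type) {
    return type == UniformType::SAMPLER2D || type == UniformType::SAMPLER_CUBE;
}

// Index of the texture the simple fallback would sample, if any.
std::optional<std::size_t> pick_albedo_texture(const std::vector<UniformInfo> &uniforms) {
    // First choice: the first texture uniform carrying an albedo hint.
    for (std::size_t i = 0; i < uniforms.size(); i++) {
        if (is_texture(uniforms[i].type) && is_albedo_hint(uniforms[i].hint)) {
            return i;
        }
    }
    // Otherwise: the first 2D texture, in uniform order.
    for (std::size_t i = 0; i < uniforms.size(); i++) {
        if (uniforms[i].type == UniformType::SAMPLER2D) {
            return i;
        }
    }
    return std::nullopt;
}

// Index of the first uniform with the given name (e.g. "uv1_scale"), if any.
std::optional<std::size_t> find_uniform(const std::vector<UniformInfo> &uniforms, const std::string &name) {
    for (std::size_t i = 0; i < uniforms.size(); i++) {
        if (uniforms[i].name == name) {
            return i;
        }
    }
    return std::nullopt;
}

// The albedo color may be called "albedo" or "albedo_color".
std::optional<std::size_t> find_albedo_color(const std::vector<UniformInfo> &uniforms) {
    if (auto idx = find_uniform(uniforms, "albedo")) {
        return idx;
    }
    return find_uniform(uniforms, "albedo_color");
}
```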

Please also see the diff for an explanation of the different project settings.


This code is generously donated by IMVU.

@Calinou
Member

Calinou commented Feb 22, 2021

Whether caching is enabled affects both the project and the editor. It'd be nice to separate them so the editor has its own setting for caching, but I couldn't find a reasonable way to do that in the time I could spend on this, because the rasterizer is initialized before the editor settings singleton is ready.

Can you use feature tags to give different values to the project settings?

Alternatively, you could look at how batching can be toggled separately in the editor and running project (at least in 3.2.3).
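Feature-tag overrides work by looking up `setting.tag` before falling back to the plain `setting` key; the PR's own `max_concurrent_compiles.mobile` entry uses this pattern. A simplified sketch of that lookup follows. The `resolve_setting` function and the map-based storage are hypothetical stand-ins, not Godot's `ProjectSettings` implementation; giving the editor its own value this way assumes an `editor` tag is active and resolvable by the time the rasterizer reads the setting, which is exactly the open question above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Settings = std::map<std::string, std::string>;

// Returns the value for `name`, preferring "name.<tag>" for the first active tag
// (in priority order) that has an override; empty string if the setting is missing.
std::string resolve_setting(const Settings &settings, const std::string &name,
                            const std::vector<std::string> &active_tags) {
    for (const std::string &tag : active_tags) {
        auto it = settings.find(name + "." + tag);
        if (it != settings.end()) {
            return it->second;
        }
    }
    auto it = settings.find(name);
    return it != settings.end() ? it->second : std::string();
}
```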

@RandomShaper RandomShaper force-pushed the shader_goodies_3.2 branch 3 times, most recently from 46475fc to 6b12f1a, February 22, 2021 21:59
@BastiaanOlij
Contributor

This is so cool! I need to find some time to give it a proper run but I love the solution.

@@ -2430,7 +2430,11 @@ bool VisualServerScene::_render_reflection_probe_step(Instance *p_instance, int
}

_prepare_scene(xform, cm, false, RID(), VSG::storage->reflection_probe_get_cull_mask(p_instance->base), p_instance->scenario->self, shadow_atlas, reflection_probe->instance);

bool forced_sync_backup = VSG::storage->is_forced_sync_shader_compile_enabled();
VSG::storage->set_forced_sync_shader_compile_enabled(true);
Member

Is this left in from debugging? Or is there a reason that reflection probes always need force sync enabled?

Member Author

It was intentional: in the project this was initially written for, reflection probes were UPDATE_ONCE, so they only had one chance to capture the look of the real shaders.

However, I now realize that this is not enough for general use. Maybe it's just a matter of forcing synchronous compilation unless the probe is UPDATE_ALWAYS. It will probably be more involved than that, but it may be a good start.

Member

I'm wondering if it would be enough to change true to update_mode == UPDATE_MODE_ONCE.

A perfect solution would be to delay the capture until all shaders are compiled, but that is probably out of scope for this PR.
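clayjohn's suggestion, combined with the backup/restore pattern in the diff (`is_forced_sync_shader_compile_enabled` / `set_forced_sync_shader_compile_enabled`), could look like the following RAII sketch. The `Storage` struct and the `UpdateMode` enum are simplified, hypothetical stand-ins for `VSG::storage` and the probe's update mode. One deliberate choice: the guard ORs the condition with the previous value, so a probe that doesn't need forced sync never turns off a forced sync that was already enabled.

```cpp
#include <cassert>

enum class UpdateMode { UPDATE_MODE_ONCE, UPDATE_MODE_ALWAYS };

// Minimal stand-in for the storage object used in the diff.
struct Storage {
    bool forced_sync = false;
    bool is_forced_sync_shader_compile_enabled() const { return forced_sync; }
    void set_forced_sync_shader_compile_enabled(bool p_enabled) { forced_sync = p_enabled; }
};

// Forces synchronous compilation while alive, then restores the previous value,
// including on early returns.
class ForcedSyncGuard {
public:
    ForcedSyncGuard(Storage &p_storage, bool p_force) : storage(p_storage) {
        backup = storage.is_forced_sync_shader_compile_enabled();
        storage.set_forced_sync_shader_compile_enabled(backup || p_force);
    }
    ~ForcedSyncGuard() { storage.set_forced_sync_shader_compile_enabled(backup); }
    ForcedSyncGuard(const ForcedSyncGuard &) = delete;
    ForcedSyncGuard &operator=(const ForcedSyncGuard &) = delete;

private:
    Storage &storage;
    bool backup;
};

// A probe that renders only once must capture the real shaders, not fallbacks.
// Returns whether forced sync was active during the (elided) render step.
bool render_probe_step(Storage &storage, UpdateMode mode) {
    ForcedSyncGuard guard(storage, mode == UpdateMode::UPDATE_MODE_ONCE);
    // ... _prepare_scene() and the actual probe rendering would happen here ...
    return storage.is_forced_sync_shader_compile_enabled();
}
```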

Member

@clayjohn clayjohn left a comment

Some early comments:

  1. This is incredible! The overall design looks great and I'm so happy to have this PR. Looks like 3.2.5 is going to be an exciting release.
  2. How difficult will it be to add support for particles shaders and canvas_item shaders? It looks like the functionality is already built into the shader, so is it just a matter of exposing it in the material, in rasterizer_canvas.glsl and in the relevant places for particles? I'd like to support all shader types before merging.

GLOBAL_DEF("rendering/gles3/shaders/max_concurrent_compiles", 4);
GLOBAL_DEF("rendering/gles3/shaders/max_concurrent_compiles.mobile", 1);
GLOBAL_DEF("rendering/gles3/shaders/simple_fallback_modulate", Color(1, 1, 1));
GLOBAL_DEF("rendering/gles3/shaders/force_no_render_fallback", false);
Member

We define all project settings in visual_server.cpp now. This way users can still see the gles3 settings when running in GLES2 mode.

Member Author

Got it. I'll fix it.

@@ -683,7 +683,6 @@ void EditorSettings::_load_defaults(Ref<ConfigFile> p_extra_config) {

_initial_set("project_manager/sorting_order", 0);
hints["project_manager/sorting_order"] = PropertyInfo(Variant::INT, "project_manager/sorting_order", PROPERTY_HINT_ENUM, "Name,Path,Last Modified");

Member

Can get rid of this I guess

config.program_binary_supported = GLAD_GL_ARB_get_program_binary;
config.parallel_shader_compile_supported = GLAD_GL_ARB_parallel_shader_compile || GLAD_GL_KHR_parallel_shader_compile;
#else
config.program_binary_supported = true;
Member

We are going to need a special case for WebGL as it never supports glProgramBinary

Member Author

Good point. I guess that in the WebGL case we will only have the possibility of using the approach based on the parallel compile extension, with no fallback. Also, caching won't be possible at all.

Member

Unfortunately yeah. :(
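Following this exchange, the capability detection from the diff might gain a WebGL branch along these lines. It is a sketch: the `ShaderCapabilities` struct, the `Platform` enum and the boolean inputs (standing in for `GLAD_GL_ARB_get_program_binary` and the ARB/KHR parallel-compile flags) are hypothetical. The rules encoded are the ones stated in the thread: WebGL never supports `glProgramBinary`, so caching and the secondary-context path are unavailable there, leaving only native parallel compilation.

```cpp
#include <cassert>

enum class Platform { DESKTOP_GL, GLES3_MOBILE, WEBGL2 };

struct ShaderCapabilities {
    bool program_binary_supported = false;
    bool parallel_shader_compile_supported = false;
    bool cache_possible = false;                    // the cache stores program binaries
    bool secondary_context_async_possible = false;  // programs travel between contexts as binaries
};

ShaderCapabilities detect_capabilities(Platform platform, bool arb_get_program_binary,
                                       bool arb_parallel, bool khr_parallel) {
    ShaderCapabilities caps;
    switch (platform) {
        case Platform::DESKTOP_GL:
            caps.program_binary_supported = arb_get_program_binary;
            caps.parallel_shader_compile_supported = arb_parallel || khr_parallel;
            break;
        case Platform::GLES3_MOBILE:
            // glProgramBinary is core in OpenGL ES 3.0 (a driver may still report zero binary formats).
            caps.program_binary_supported = true;
            caps.parallel_shader_compile_supported = khr_parallel;
            break;
        case Platform::WEBGL2:
            caps.program_binary_supported = false;  // WebGL never exposes glProgramBinary
            caps.parallel_shader_compile_supported = khr_parallel;
            break;
    }
    caps.cache_possible = caps.program_binary_supported;
    caps.secondary_context_async_possible = caps.program_binary_supported;
    return caps;
}
```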

if (!Engine::get_singleton()->is_editor_hint()) {
ShaderGLES3::force_no_render_fallback = (bool)ProjectSettings::get_singleton()->get("rendering/gles3/shaders/force_no_render_fallback");
#ifdef DEBUG_ENABLED
ShaderGLES3::force_use_fallbacks = (bool)ProjectSettings::get_singleton()->get("debug_force_use_fallbacks");
Member

Suggested change
ShaderGLES3::force_use_fallbacks = (bool)ProjectSettings::get_singleton()->get("debug_force_use_fallbacks");
ShaderGLES3::force_use_fallbacks = (bool)ProjectSettings::get_singleton()->get("rendering/gles3/shaders/debug_force_use_fallbacks");

Member Author

Good catch. I'll fix that soon.

Contributor

@BastiaanOlij BastiaanOlij left a comment

This is such a cool PR. Really like the approach with the fallback.

I've only tested on Windows and don't have a good project to stress test it but it looks like it's caching my materials and compiling them properly with all the settings on.

Doing a pass over the code, I think Clayjohn already found more than I spotted. It looks well structured and a sound approach. I hope people are able to test on the other platforms.

@LinuxUserGD
Contributor

Should fix #13954
#13954 (comment)

@TokisanGames
Contributor

TokisanGames commented Feb 23, 2021

I have one of the larger Godot games in development. Out of the Ashes is a 3D ARPG that takes up 2GB on export, 1.2GB of VRAM and 300MB of RAM. It uses a 4k heightmap terrain, 4k sky boxes, and many assets. It stutters terribly during the first 10 seconds of each scene load, even on tiny scenes without terrain or sky. Most shaders are basic SpatialMaterials with 512-2k textures.

I built this PR and tested my project on Win 10/64, GTX 1060 6GB, Core i7 8750H, 16GB RAM, SSD. It stutters just as severely. The only difference is that it visually displays the stages of shader compilation as it stutters, rather than showing me fully textured materials and then stuttering like stock Godot. So the end result is actually worse.

I loaded my project, enabled Rendering/GLES3/Cache Enabled and Async Compile Enabled, and left the rest unchanged. Running multiple times repeats the same results.

I do get the following warning:

Godot Engine v3.2.4.rc.custom_build.6b12f1aa2 - https://godotengine.org
OpenGL ES 3.0 Renderer: GeForce GTX 1060/PCIe/SSE2
WARNING: ProjectSettings::_get: Property not found: debug_force_use_fallbacks
     At: core\project_settings.cpp:209
Shader cache path: C:\Users\cory\AppData\Local/cache/godot_shaders/
Shader cache size: 14 MiB
Shader cache: ON
Async. shader compilation: ON (full native support)
OpenGL ES Batching: ON

Here is what I see when I load the game:
(Screenshot: godot_shader_load)

Every other scene does the same thing, visually building the materials over about 3 seconds, AND the performance still lags. It's not just a visual effect: stuttering still consumes the engine and player controls as it compiles the shaders, with no profiling information available or clue as to WTF it's doing. The engine just pauses, resumes, pauses, resumes...

What I really want is an option to precompile all shaders from code that I can stick in _init() or _ready(). I already load all scenes in the game to fix other performance issues. There's no reason at all why the shaders should wait until they show up on screen to be compiled. What the engine should really do is compile all shaders automatically before _ready() is called, or in the editor like UE.

Can you provide functionality to at least precompile on demand from code please? I'd be happy to loop over all materials and add a .compile() before giving control to the player.

If you want access to my repo so you can test directly, dm me your email address on twitter.

Edit: Removed healthbars issue - present in stock 3.2.4rc


Clickable images from Out of the Ashes (follow @TokisanGames):

@lawnjelly
Member

lawnjelly commented Feb 23, 2021

I don't know if it affects this PR, but there is a possible snag: AFAIK, when you tell GL to compile a shader, some drivers don't actually compile it; they defer a bunch of work until it is first used. If that is the case, shifting it to a different thread may not help as much as hoped.
EDIT: Ah, this may not apply if it is using an extension that does guarantee the compile at call time. 👍 I haven't examined the PR in detail.

@naithar
Contributor

naithar commented Feb 23, 2021

Doesn't seem to work on iOS, but it seems to be working fine on macOS.
I'm using a test project from #45173 (comment)
Same behaviour on device and simulator.

Shader Caching:

2021-02-23 17:48:02.108960+0300 Test[45581:8822595] **ERROR**: Program binary cache file is corrupted. Ignoring and removing.
2021-02-23 17:48:02.109004+0300 Test[45581:8822595]    At: drivers/gles3/shader_cache_gles3.cpp:76:retrieve() - Program binary cache file is corrupted. Ignoring and removing.
**ERROR**: Program binary cache file is corrupted. Ignoring and removing.
   At: drivers/gles3/shader_cache_gles3.cpp:76:retrieve() - Program binary cache file is corrupted. Ignoring and removing.
2021-02-23 17:48:02.109382+0300 Test[45581:8822647] **ERROR**: Condition "!p_src" is true.
2021-02-23 17:48:02.109419+0300 Test[45581:8822647]    At: drivers/unix/file_access_unix.cpp:278:store_buffer() - Condition "!p_src" is true.

Shader Caching + Async Compile Enabled results in:

**ERROR**: Program binary cache file is corrupted. Ignoring and removing.
   At: drivers/gles3/shader_cache_gles3.cpp:76:retrieve() - Program binary cache file is corrupted. Ignoring and removing.

...

2021-02-23 17:33:29.477363+0300 Test[47795:5419426] **ERROR**: SceneShaderGLES3: Vertex Program Compilation Failed:
ERROR: 0:516: Use of undeclared identifier 'uv_interp'
ERROR: 0:516: Use of undeclared identifier 'uv_interp'

The shader sources are also reported, but the log is too big and might be hard to read: https://gist.github.com/naithar/a39c0585bdb3dab1cc88c30b6ee04afa

Shader Caching + Async Compile Enabled + Force no render fallback results in:

2021-02-23 17:38:38.494243+0300 Test[50073:5442643]    At: drivers/gles3/shader_gles3.cpp:852:_complete_link() - SceneShaderGLES3: Program LINK FAILED:
**ERROR**: SceneShaderGLES3: Program LINK FAILED:

   At: drivers/gles3/shader_gles3.cpp:852:_complete_link() - SceneShaderGLES3: Program LINK FAILED:

2021-02-23 17:38:38.494335+0300 Test[50073:5442643] **ERROR**: Program binary from compile queue has been rejected by the GL. Bug?
2021-02-23 17:38:38.494412+0300 Test[50073:5442643]    At: drivers/gles3/shader_gles

...

**ERROR**: Program binary cache file is corrupted. Ignoring and removing.
   At: drivers/gles3/shader_cache_gles3.cpp:76:retrieve() - Program binary cache file is corrupted. Ignoring and removing.

The last two combinations (with async enabled) result in the cube not being rendered.
It also seems that the original cause of #45173 is shader compilation taking much longer on the simulator than on the device, not the OpenGL drivers.

@clayjohn
Member

@tinmanjuggernaut I have a feeling the stuttering and load times are from loading the models and textures rather than shader compilation. You mentioned that most of your shaders are basic SpatialMaterials. SpatialMaterial shaders are compiled once and then shared between all SpatialMaterials. Additionally, they are very simple and compile nearly instantly (unless you are on a very old device).

To test if shaders really are the issue, try loading your scene with the camera pointing straight up with no objects in its field of view. If it still stutters and takes 10 seconds to load, the problem isn't shader compiling. If it loads quickly and then stutters once you move the camera to view the scene + character, then the issue is shader compiling.

@tcoxon
Contributor

tcoxon commented Feb 23, 2021

This is really cool! I've been dying for something like this to use for my project!

I gave it some testing on 64-bit Ubuntu Linux 16.04. The project is Cassette Beasts. I can give you access to the project under NDA if it would help--DM me on twitter (@tccoxon) or email me (tom@bytten-studio.com).

Some things I noticed:

  1. If enabled, the shader cache grows each time I run my project, even if I'm only loading the same scene and the same shaders. I don't generate shader code at runtime, so in theory the shaders should be the same each time the game loads. The cache should reach a maximum size and stay there, but it doesn't seem to?

  2. The shader cache is hardcoded to a max size of 512MiB. Could that be exposed as a project setting?

  3. I get a lot of this in my error log where I had no errors before:

E 0:00:05.319   _get_uniform: Condition "!version" is true. Returned: -1
   <C++ Source>  drivers/gles3/shader_gles3.h:486 @ _get_uniform()

I still get some stutters when I run with async compilation enabled, but I haven't eliminated other causes yet.
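On points 1 and 2: if cache keys are stable, the cache should plateau once every shader variant has been stored, and a size cap only comes into play when it doesn't. As a hedged sketch of what a configurable cap could look like, here is an in-memory model of a byte-budgeted cache that evicts least-recently-used entries. The `max_bytes` constructor parameter plays the role of the hardcoded 512 MiB limit; the class and its LRU policy are illustrative assumptions, not the PR's actual behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

class LruBinaryCache {
public:
    explicit LruBinaryCache(std::size_t max_bytes) : max_bytes_(max_bytes) {}

    void store(uint64_t key, std::vector<uint8_t> blob) {
        erase(key);  // replacing an entry must not count its old size twice
        if (blob.size() > max_bytes_) {
            return;  // would never fit
        }
        used_bytes_ += blob.size();
        order_.push_front(key);
        entries_[key] = Entry{std::move(blob), order_.begin()};
        while (used_bytes_ > max_bytes_) {
            erase(order_.back());  // evict the least recently used entry
        }
    }

    const std::vector<uint8_t> *retrieve(uint64_t key) {
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            return nullptr;
        }
        order_.splice(order_.begin(), order_, it->second.pos);  // mark as most recently used
        return &it->second.blob;
    }

    std::size_t used_bytes() const { return used_bytes_; }

private:
    struct Entry {
        std::vector<uint8_t> blob;
        std::list<uint64_t>::iterator pos;
    };

    void erase(uint64_t key) {
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            return;
        }
        used_bytes_ -= it->second.blob.size();
        order_.erase(it->second.pos);
        entries_.erase(it);
    }

    std::size_t max_bytes_;
    std::size_t used_bytes_ = 0;
    std::list<uint64_t> order_;  // front = most recently used
    std::unordered_map<uint64_t, Entry> entries_;
};
```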

@tinmanjuggernaut The stuttering could be as @clayjohn says. As for the visual effect, you can render materials while on a loading screen to cause them to compile. There are some intricacies, e.g. you need to make sure certain shadow and environment settings are the same as what you're going to use in the scene. And you also need to know if a material is going to be used with a multimesh or just on a solo mesh, since all these factors lead to different shaders being generated. Happy to chat with you about what I've done in twitter DMs (@tccoxon) if you like.
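The point that one material can produce several shaders can be modelled as a variant key: the material plus every render-state factor that changes the generated code. Below is a hedged sketch of collecting the distinct variants a scene will need, so that a loading screen can render each one once. The flags are illustrative examples drawn from the factors mentioned above (multimesh vs. solo mesh, shadow settings, environment settings), not Godot's actual list of shader conditionals, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Illustrative render-state factors; each one can change the generated shader.
enum VariantFlag : uint32_t {
    VARIANT_MULTIMESH = 1u << 0,    // drawn through a MultiMesh instead of a single mesh
    VARIANT_SHADOWS = 1u << 1,      // shadow settings active in the scene
    VARIANT_ENVIRONMENT = 1u << 2,  // stands in for an environment setting
};

// Hypothetical record of one place where a material is drawn.
struct DrawUse {
    std::string material;
    uint32_t flags;
};

using VariantKey = std::pair<std::string, uint32_t>;

// Distinct (material, flags) combinations = shaders to warm up while loading.
std::set<VariantKey> collect_variants(const std::vector<DrawUse> &uses) {
    std::set<VariantKey> variants;
    for (const DrawUse &use : uses) {
        variants.insert({use.material, use.flags});
    }
    return variants;
}
```

A material that is already compiled for one lighting setup can still need a new variant in a level with different settings, which is consistent with the recompiles reported below.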

@TokisanGames
Contributor

@clayjohn Thanks for the ideas. Shader re-compiling is definitely happening live and there's plenty of time to load every resource in the game. Here are more details:

  • The game takes 25s to load, using ResourceLoader and does not show the start menu until all levels are 100% loaded. The start menu background is identical to level 1, except for the skybox and lighting. All of the same assets and main character are on screen. Every shader from level 1 should be compiled, but...

  • When switching to level 1 (loaded scene is added to scene tree w/ add_child()), everything displays in the nearby vicinity, however when moving the camera around it stutters multiple times within the first 10 seconds before finally allowing game play without stutters. Engine.get_frames_per_second() drops significantly. It's the only indication that the engine is doing anything during these moments since there's nothing else indicative on the monitor or profiler tabs.

  • When using this PR, I saw materials on my main character (e.g. her hair or cape) which were already visually compiled on the title screen, start on level 1 looking fine, then flash to white and black materials for a moment, before looking normal again, during these initial seconds. Then switching to level 2, 3, etc it does the same thing. Same assets within the same Player scene, which is in each level. Each level is added or removed from the scene tree.

If it loads quickly and then stutters once you move the camera to view the scene + character, then the issue is shader compiling.

  • The above should make the point. However I loaded the game, let it sit on the title screen for a while, then loaded level 1 facing forward for about 30 seconds without moving. Then I moved the camera around and her hair recompiled visually and there was a little stutter. Then I loaded level 2 and I watched all materials (landscape, hair, skin, cape) recompile again and much stuttering as I moved the camera around. Going back to level 1 or 2 a second time, the shaders don't recompile.

@tcoxon Thanks. Discord may be easier TinmanJuggernaut#7375 (@RandomShaper or @clayjohn, feel free to reach me here as well). We do have different lighting per scene. So even basic SpatialMaterials need to be recompiled for different lighting conditions? We have a loading screen and I'm more than happy to manually initiate compiling during this time if it was exposed in the engine, or if it is, that I know how to do it.

@TokisanGames
Contributor

@Calinou Thanks, but that setting makes no difference for me. Mine stutters windowed or full screen, with or without that setting. The ANGLE PR (#44845) looks interesting. Also I do have an Optimus (re: godotengine/godot-proposals#1725), though I only use the nvidia card. Is stuttering in Godot limited to Optimus? I haven't experienced it in any other application.

Currently, I'm using @tcoxon 's suggestion of applying every material in the scene to a plane and waiting for it to render, and I simultaneously rotate the camera 360 degrees. Neither is adequate alone. I'm still testing, but this seems to have addressed 99% of the stuttering, even in stock Godot.

Adding this PR means if Godot decides to recompile one again, hopefully it will be faster and with a shorter lag. However there's still an issue with visual artifacts when it does recompile. I just observed a mesh fully instanced, textured and animating, recompile its material and flash to black & white before coming back. In my earlier tests above I noticed this quite a bit on my main character's hair or other objects that already had a material in the current lighting, then it recompiles, lags, flashes b&w, before coming back exactly as it was.

@RandomShaper
Member Author

I will be away from PC for at least two weeks. When I'm back I'll do my best to refine this PR as soon (and as well) as possible. Just FYI.

Base automatically changed from 3.2 to 3.x March 16, 2021 11:11
@aaronfranke aaronfranke modified the milestones: 3.2, 3.3 Mar 16, 2021
@akien-mga akien-mga modified the milestones: 3.3, 3.4 Mar 26, 2021
@akien-mga akien-mga changed the title Shader goodies: async. compilation + caching (3.2) [3.x] Shader goodies: async. compilation + caching Mar 26, 2021
@ghost

ghost commented May 21, 2021

I’ve made a simple benchmark project to test out execution times with your implementation and Godot 3.3.1.
It cycles through 100 materials and prints the total time spent doing so.
Shader Cache and Async Compile Enabled used in your Godot build.
Just run the project in the editor and wait for it to finish and print out the result.

Measured times in seconds:
Your build: 2.131
Godot 3.3.1: 1.988

So in my simple test it showed no substantial time gain.
Am I naive to benchmark it like that?
Is there a better way to measure it, or is my attempt valid as it is?

ShaderCompileTimeBenchmark.zip

@RandomShaper
Member Author

@Leocesar3D, I think your test is well formulated. However, depending on their specifics, the materials may or may not trigger the fast path. It'd be interesting to run it with caching disabled and also to do multiple rounds over the set of materials.

Please remember that I still have to make a number of improvements (when I can get some time for it), run additional tests and add more flexibility, because the current implementation may be deciding too much.

In any case, thank you for your feedback. I hope I can eventually make this work as expected.
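Following the suggestion to do multiple rounds, a benchmark can time each round over the material set separately, so that the cold first round (where compilation or cache loading happens) is not averaged away by the warm rounds after it. A minimal timing-harness sketch using `std::chrono::steady_clock` follows; `run_round` is a placeholder for whatever cycles through the materials, and in a real project it would also have to wait until the frames are actually rendered, not just return.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Times `rounds` executions of `run_round` and returns each duration in milliseconds.
std::vector<double> time_rounds(int rounds, const std::function<void()> &run_round) {
    std::vector<double> results;
    for (int i = 0; i < rounds; i++) {
        auto start = std::chrono::steady_clock::now();
        run_round();
        auto end = std::chrono::steady_clock::now();
        results.push_back(std::chrono::duration<double, std::milli>(end - start).count());
    }
    return results;
}
```

Comparing the first entry with the later ones, once with caching enabled and once with it disabled, separates compilation cost from steady-state cost.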

@akien-mga
Member

Also keep in mind that to get accurate measurements, you'd have to build both the PR and the last commit before that PR with the same toolchains and options. Comparing official builds of 3.3.1-stable with a custom-made build of this PR would be tricky because:

  • This PR is based on a branch older than 3.3.1-stable, so a lot of things may have changed in the meantime.
  • Custom builds are likely done with a different compiler and different build options than official builds (target=release_debug, production=yes, etc.). That can greatly impact the performance of the compiled code.

@wagnerfs

I do wonder if this PR was abandoned ahead of Godot 4.0. I was honestly looking forward to it, especially since GL3 won't be a thing until Godot 4.1.
3.x would benefit TONS with caching alone, really hope this kicks along with 3.4 release

@Calinou
Member

Calinou commented Aug 24, 2021

3.x would benefit TONS with caching alone, really hope this kicks along with 3.4 release

Since 3.4 is nearing release, I'm afraid it's too late to merge this for 3.4. There are still plans to finish this PR to get it in 3.5 hopefully, but I can't make any guarantees.

@RandomShaper
Member Author

I'm looking forward to finishing it, but lately I just haven't had enough time to work on it.

@jitspoe
Contributor

jitspoe commented Aug 28, 2021

Dang, I was hoping this would be in the latest 3.x. That's one of the reasons I merged. I guess I confused it with the 4.x change. Might have to grab this manually. What's left to do? Merge it with the latest 3.x changes?

@RandomShaper
Member Author

This is closed in favor of the new, much better #53411. Those who were interested in this, please check out the new one. I'm keeping this as an archived one for potential future reference.
