Compatibility renderer outperforms Forward+ renderer (GPUParticles3D) #97903

Closed
TheYellowArchitect opened this issue Oct 6, 2024 · 3 comments

TheYellowArchitect commented Oct 6, 2024

Tested versions

  • Reproducible in 4.4 dev3, dev2 (and probably previous versions)

System information

Godot v4.4.dev3 - Artix Linux #1 SMP PREEMPT_DYNAMIC Wed, 02 Oct 2024 15:03:06 +0000 on Tty - X11 display driver, Multi-window, 1 monitor - Vulkan (Forward+) - dedicated NVIDIA GeForce GTX 1050 Ti (nvidia; 560.35.03) - AMD Ryzen 5 2600 Six-Core Processor (12 threads)

Issue description

I was making a smoke grenade, so I went to use GPUParticles3D. I basically made quad meshes (backs are culled) with smoke textures, and with only 80 particles on-screen, the FPS tanks - a single GPUParticles3D!
So if a few smoke grenades are thrown, the game becomes unplayable.

Yet this does not happen with the Compatibility renderer. I made an MRP and benchmarked both renderers with 3 trials each.

Single Smoke Grenade GPUParticles3D:

| Renderer | Forward+ | Compatibility |
|---|---|---|
| Lowest FPS | 63 | 201 |
| Lowest FPS | 73 | 206 |
| Lowest FPS | 74 | 218 |

2 Smoke Grenades GPUParticles3D:

| Renderer | Forward+ | Compatibility |
|---|---|---|
| Lowest FPS | 40 | 151 |
| Lowest FPS | 52 | 176 |
| Lowest FPS | 57 | 170 |
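
FPS figures are easier to compare as frame times, because frame time grows linearly with the work done per frame. The following is a minimal Python sketch that converts the numbers from the two tables above; the dictionary layout and the helper names (`frame_time_ms`, `summarize`) are illustrative and not part of the MRP.

```python
from statistics import mean

# Lowest FPS per trial, copied from the two tables above.
RESULTS = {
    "1 grenade": {"Forward+": [63, 73, 74], "Compatibility": [201, 206, 218]},
    "2 grenades": {"Forward+": [40, 52, 57], "Compatibility": [151, 176, 170]},
}


def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def summarize(results: dict) -> dict:
    """Return the mean worst-case frame time (ms) per scene and renderer."""
    return {
        scene: {
            renderer: mean(frame_time_ms(fps) for fps in trials)
            for renderer, trials in by_renderer.items()
        }
        for scene, by_renderer in results.items()
    }


if __name__ == "__main__":
    for scene, times in summarize(RESULTS).items():
        ratio = times["Forward+"] / times["Compatibility"]
        print(
            f"{scene}: Forward+ {times['Forward+']:.1f} ms, "
            f"Compatibility {times['Compatibility']:.1f} ms, "
            f"ratio {ratio:.1f}x"
        )
```

Averaging frame times rather than FPS values avoids overweighting the fastest trial.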

[Screenshot: smoke-grenade-54-fps-forward-renderer]

I think this issue is important as it addresses the "godot sucks at 3D, its performance is abysmal." complaint from users.

Steps to reproduce

Create any GPUParticles3D node with a quad mesh and build a practical effect with it, so that it uses alpha transparency.

Minimal reproduction project (MRP)

smoke-grenade-room.zip

Use arrow keys to zoom up (and WS if you want to move front/back)

Member

Calinou commented Oct 7, 2024

This is expected, as the Compatibility rendering method has a lower base cost for simple scenes, and also a lower fill rate cost since its shaders are much simpler.

The use case shown in the screenshot is also pathological for any renderer. Having this many large overlapping planes in front of the camera imposes an extreme fill rate cost on the GPU, so the renderer with the simplest shaders will always win. Also, remember to always use the Unshaded shading mode [1] for large particles in front of the screen, and consider using more particles that are individually smaller to reduce fill rate requirements. Here, using more particles can actually be faster despite the higher polygon count. I'd try 1,000 significantly smaller particles at first, possibly even more.
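
To make the fill rate argument concrete: for alpha-blended quads, the number of shaded fragments scales with the summed on-screen area of all particles, not with the particle count. The following is a back-of-the-envelope Python sketch. The resolution and the per-particle coverage fractions are made-up illustrative values, not measurements from the MRP, and the model ignores vertex cost and any GPU-side optimizations.

```python
def shaded_fragments(particle_count: int, coverage_per_particle: float,
                     width: int = 1920, height: int = 1080) -> int:
    """Estimate fragments shaded per frame for alpha-blended quads.

    coverage_per_particle is the fraction of the screen one quad covers.
    Transparent quads are blended on top of each other rather than
    depth-rejected, so every overlapping layer is shaded.
    """
    return round(particle_count * coverage_per_particle * width * height)


# Hypothetical numbers: 80 large quads each covering 40% of the screen,
# versus 1,000 small quads each covering 1% of the screen.
few_large = shaded_fragments(80, 0.40)
many_small = shaded_fragments(1000, 0.01)

print(f"80 large quads:    {few_large:,} fragments")
print(f"1,000 small quads: {many_small:,} fragments")
print(f"overdraw ratio:    {few_large / many_small:.1f}x")
```

Under these assumed coverage values, the 1,000 small quads touch fewer pixels in total than the 80 large ones, which is the effect described above. Whether a real scene behaves this way depends on how much the particles are actually scaled down.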

The Forward+ and Mobile rendering methods have much more advanced features, which generally implies a higher base cost (i.e. simple scenes are slower to render). However, in return, complex scenes with lots of geometry and draw calls are generally able to scale better to these rendering methods. Choose the right tool for the job 🙂

In Forward+, you may also get better performance from enabling volumetric fog and using FogVolume nodes to achieve smoke effects.

Footnotes

  1. If you need the particles to be shaded, use per-vertex shading which is significantly faster than per-pixel shading in this situation. It's not as fast as unshaded rendering though.

Author

TheYellowArchitect commented Oct 7, 2024

Thank you for the swift and detailed answer. I had no idea about the differences between the renderers; I thought Forward+ was objectively faster than Compatibility. Nor did I think that more particles at a smaller scale could be more performant (I confirmed it is true 👍).

> Also, remember to always use the Unshaded shading mode

Did a quick benchmark (same particle amount and scale as MRP)
2 Smoke Grenades GPUParticles3D:

| Renderer | Forward+ | Compatibility |
|---|---|---|
| Lowest FPS | 167 | 240 (max) |

> If you need the particles to be shaded, use per-vertex shading which is significantly faster than per-pixel shading in this situation.

I built that specific commit; here are the benchmarks (2 trials) for per-vertex shading. It's a good improvement.

2 Smoke Grenades GPUParticles3D:

| Renderer | Forward+ | Compatibility |
|---|---|---|
| Lowest FPS | 85 | 229 |
| Lowest FPS | 84 | 216 |
| Lowest FPS | 93 | 238 |

> In Forward+, you may also get better performance from enabling volumetric fog and using FogVolume nodes to achieve smoke effects.

It probably isn't ideal for my game, as the camera is top-down. I also want it to work like a real smoke grenade, which pours out smoke quickly to fill a room, instead of the "packed smoke" approach of Counter-Strike (and all the other FPS games that copy its smoke grenade).
I will try it at some point and confirm.

For now, I will tweak the per-vertex smoke grenade, hoping to make it look as intended with more particles and fewer overlaps.

> The Forward+ and Mobile rendering methods have much more advanced features

Is there a list of them? Decals are the ones I have noticed, but what else? The only documentation I found on their differences is this: https://docs.godotengine.org/en/stable/contributing/development/core_and_modules/internal_rendering_architecture.html

Quoted from the above link:

> As its name implies, the Forward+ backend uses clustered lighting. This allows using as many lights as you want; performance largely depends on screen coverage. Shadow-less lights can be almost free if they don't occupy much space on screen.

If I have a scene with a lot of OmniLight3D or SpotLight3D nodes, is Forward+ more performant? As you said, this MRP is a pathological use case that focuses exclusively on particles, without any lighting (except the directional light).

Member

Calinou commented Oct 7, 2024

> If I have a scene with a lot of OmniLight3D or SpotLight3D nodes, is Forward+ more performant?

Yes, since it uses a clustered approach to lighting, as opposed to the traditional multi-pass/single-pass approach used in Compatibility.
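
As a rough illustration of why a clustered approach scales with screen coverage, here is a simplified Python sketch. It is not Godot's implementation: it flattens clusters to 2D screen tiles, reduces each light to a screen-space bounding circle, and compares against a naive baseline in which every pixel evaluates every light. That baseline is for illustration only and is not how Compatibility's forward lighting works. `ScreenLight`, `lights_per_tile`, and `light_evaluations` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenLight:
    """A light reduced to its screen-space bounding circle, in pixels."""
    x: float
    y: float
    radius: float


def lights_per_tile(lights, width, height, tile):
    """Map each (col, row) tile to the lights whose bounding box overlaps it."""
    cols = -(-width // tile)   # ceiling division
    rows = -(-height // tile)
    grid = {(c, r): [] for c in range(cols) for r in range(rows)}
    for light in lights:
        c0 = max(int((light.x - light.radius) // tile), 0)
        c1 = min(int((light.x + light.radius) // tile), cols - 1)
        r0 = max(int((light.y - light.radius) // tile), 0)
        r1 = min(int((light.y + light.radius) // tile), rows - 1)
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                grid[(c, r)].append(light)
    return grid


def light_evaluations(lights, width, height, tile):
    """Return (naive count, tiled count) of per-pixel light evaluations."""
    naive = width * height * len(lights)
    tiled = 0
    for (c, r), tile_lights in lights_per_tile(lights, width, height, tile).items():
        # Tiles on the right and bottom edges may be partially off-screen.
        tile_w = min(tile, width - c * tile)
        tile_h = min(tile, height - r * tile)
        tiled += tile_w * tile_h * len(tile_lights)
    return naive, tiled


if __name__ == "__main__":
    # Hypothetical scene: 32 small lights spread evenly over a 1080p screen.
    lights = [ScreenLight(120 + 240 * i, 135 + 270 * j, 40)
              for i in range(8) for j in range(4)]
    naive, tiled = light_evaluations(lights, 1920, 1080, tile=64)
    print(f"naive: {naive:,} evaluations, tiled: {tiled:,} evaluations")
```

In this model, a light that covers only a few tiles adds work only to those tiles, while a light that covers the whole screen costs as much as in the naive case.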
