GPU particles expensive when ACTIVE = false #92764

WrobotGames · 2024-06-04T16:01:59Z

Tested versions

Tested in 4.3 beta 1

System information

Godot v4.3.beta1 - Windows 10.0.22631 - Vulkan (Forward+) - dedicated NVIDIA GeForce GTX 1060 6GB (NVIDIA; 32.0.15.5599) - Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz (8 Threads)

Issue description

When a particle is set to inactive in the particle shader, it is not drawn on screen, but it still affects rendering performance. Particle systems with high-poly meshes and all particles inactive still increase the 'Render Depth Prepass' and 'Render Opaque Pass' times a lot. On such a particle system, adding discard; to the start of the fragment shader improves performance, suggesting that the fragment shader is running. Another indication is that changing the number of polygons in the mesh affects performance: for example, lowering the 'rings' count on the TorusMesh improves performance.

To me it seems like inactive particles shouldn't have a rendering cost (or at least only a really low one), as they aren't being rendered.
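
For reference, a minimal sketch of the kind of mesh material the discard test describes, assuming a Godot 4.x spatial shader assigned to the particle mesh. The MRP's actual material is not shown in this report and may differ.

```glsl
shader_type spatial;

void fragment() {
	// Drop every fragment right away, so nothing from this
	// material is written to the screen.
	discard;
}
```

With all particles inactive nothing is visible either way, so any frame-time change from a material like this would point at fragment work still being done.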

(Issue is related to #19507 and #92599)

Steps to reproduce

Add GPUParticles3D
Set the amount to something fairly low, like 100.
Add a custom particle shader that sets "ACTIVE = false;" in start().
Set the mesh to a low poly mesh, like the cube.
Note the really high fps and low primitives count.
Set the mesh to a really high poly mesh, like a torus with 1000 ring segments.
Note how the fps is a lot lower and the primitives count is a lot higher, even though all particles are disabled in the shader.
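
The particle shader from the steps above can be sketched as follows, assuming Godot 4.x particle shader syntax. This is an illustration of the setup, not the exact shader shipped in the MRP.

```glsl
shader_type particles;

void start() {
	// Mark every particle as inactive as soon as it is (re)started.
	ACTIVE = false;
}

void process() {
	// Intentionally empty: the particles are never moved or re-enabled.
}
```

Commenting out the ACTIVE = false; line gives the "not culling" case to compare against.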

Minimal reproduction project (MRP)

Very simple scene with an fps counter and vsync disabled. The ACTIVE = false; line is commented out by default.
Warning: the scene has 12 million primitives. Open in 4.3 beta 1.
Gpu_particles.zip

AThousandShips · 2024-06-04T16:10:38Z

Since this value can only be known by running the shader, and all the data needed has to be fetched, I don't think this is a bug

With any reasonable use of ACTIVE it has to be checked and will depend on some input data. No normal particle shader would just always have ACTIVE = false, so I'd say in normal use this isn't an issue

Much like a shader that uses discard as the fragment stage function

I don't think the shader compiler should "optimize" this code to work better as it's not a valid shader really, what would be the use case?

WrobotGames · 2024-06-04T16:21:36Z

I used the ACTIVE built-in to cull certain particles for a grass system like the one devmar on YouTube made. Maybe this isn't the right use case for particles, and I should switch to multimeshes instead. Anyway, while I was testing this I noticed the performance was fairly low (even though I thought I was 'culling' the particles). If it's not possible to stop entire particles from being rendered like this, we shouldn't spend more time on this. (But I think the docs need to say what making a particle inactive actually does.)

AThousandShips · 2024-06-04T16:26:05Z

Then that use case is far more relevant, but your real world example shows why it takes performance:

You'd need to figure out if they should be active or not, which requires data to be fed to the particles

So I'd say that you should instead use other means to accomplish that if you need a lot of culling

Now the specifics with the rendering stuff might be a bug, but I'm not sure that the statistics you're seeing are from after the active check is done; they might come from preparing the particle data to feed to them

So that would need investigation, but it might just be that the processing you see is the steps before ACTIVE is checked and the particle is dropped

What is the performance difference between having the active or not? Is it just marginal or is it significant? Because if it's significant I suspect it's the earlier stages (I can't test your MRP at the moment, but some statistics from your testing would be helpful)
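
To illustrate the point about ACTIVE depending on input data, here is a hedged sketch of distance-based culling in a Godot 4.x particle shader. The uniforms camera_position and max_distance and the grid placement are hypothetical names and choices for this illustration; they do not come from the thread or the MRP.

```glsl
shader_type particles;

// Hypothetical inputs, assumed to be set from a script.
// camera_position is assumed to be in the same space as TRANSFORM.
uniform vec3 camera_position;
uniform float max_distance = 50.0;

void start() {
	// Hypothetical placement: lay the particles out on a 10-wide grid.
	float x = float(INDEX % 10u);
	float z = float(INDEX / 10u);
	TRANSFORM = mat4(
		vec4(1.0, 0.0, 0.0, 0.0),
		vec4(0.0, 1.0, 0.0, 0.0),
		vec4(0.0, 0.0, 1.0, 0.0),
		vec4(x, 0.0, z, 1.0));

	// The shader has to run and read its inputs before it can decide
	// whether this particle stays active.
	ACTIVE = distance(TRANSFORM[3].xyz, camera_position) < max_distance;
}

void process() {
	// Left empty in this sketch.
}
```

In a sketch like this, whether a particle is active is only known after the particle shader has executed for it.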

WrobotGames · 2024-06-04T17:49:28Z

Some performance numbers (measured at runtime) from the MRP scene. Hipoly is a torus with 1024 rings.
This test is about the numbers relative to each other; the individual numbers don't mean a lot.

GPUParticles node not visible: ~4000fps, ~0.25ms
GPUParticles node visible, not emitting: ~4000fps, ~0.25ms
cube, not culling: ~1500fps, ~0.66ms
cube, culling: ~4000fps, ~0.25ms
Hipoly, not culling: ~188fps, ~5.32ms
Hipoly, culling: ~330fps, ~3.03ms
Hipoly unshaded, not culling: ~270fps, ~3.70ms
Hipoly unshaded, culling: ~330fps, ~3.03ms
Hipoly, fragment discard, not culling: ~354fps, ~2.82ms
Hipoly, fragment discard, culling: ~400fps, ~2.50ms

Observations

1. Culled low poly particles have no measurable cost compared to disabling the node (for 100 particles).
2. There is no performance difference between culled unshaded and culled pixel shaded.
3. Adding discard at the start of the fragment shader increases performance even with culling.
4. Just using discard in the fragment shader offers better performance than just culling.
5. Using discard in the fragment shader and using culling offers the best performance.

I think point 4 is really interesting: why does adding discard to just the fragment shader result in better performance than disabling the particles?

AThousandShips · 2024-06-04T18:52:47Z

Because discard is an internal feature in shaders which has special behavior, I think. Just setting ACTIVE is not a native feature

AThousandShips added the discussion, needs testing, topic:particles, and performance labels on Jun 4, 2024
