GPU particles expensive when ACTIVE = false #92764

Open
WrobotGames opened this issue Jun 4, 2024 · 5 comments

@WrobotGames

WrobotGames commented Jun 4, 2024

Tested versions

  • Tested in 4.3 beta 1

System information

Godot v4.3.beta1 - Windows 10.0.22631 - Vulkan (Forward+) - dedicated NVIDIA GeForce GTX 1060 6GB (NVIDIA; 32.0.15.5599) - Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz (8 Threads)

Issue description

When a particle is set to inactive in the particle shader, it isn't drawn on screen, but it still affects rendering performance. Particle systems with high-poly meshes in which every particle is inactive still increase 'Render Depth Prepass' and 'Render Opaque Pass' a lot. On such a particle system, adding discard; to the start of the fragment shader improves performance, suggesting that the fragment shader is still running. Another indication is that the polygon count of the mesh affects performance: for example, lowering the 'rings' count on the TorusMesh improves performance.

To me it seems like inactive particles shouldn't have a rendering cost (or at least only a very low one), as they aren't being rendered.

(Issue is related to #19507 and #92599)

Steps to reproduce

  1. Add GPUParticles3D
  2. Set the amount to something fairly low, like 100.
  3. Add a custom particle shader that sets ACTIVE = false; in start().
  4. Set the mesh to a low poly mesh, like the cube.
  5. Note the really high fps and low primitives count.
  6. Set the mesh to a really high poly mesh, like a torus with 1000 ring segments.
  7. Note how the fps is a lot lower and the primitives count is a lot higher, even though all particles are disabled in the shader.

Minimal reproduction project (MRP)

Very simple scene with an fps counter and vsync disabled. The ACTIVE = false; is commented out by default.
Warning: the scene has 12 million primitives. Open in 4.3 beta 1.
Gpu_particles.zip

@AThousandShips
Member

AThousandShips commented Jun 4, 2024

Since this value can only be known by running the shader, and all the data needed has to be fetched, I don't think this is a bug.

With any reasonable use of ACTIVE, it has to be checked and will depend on some input data. No normal particle shader would always set ACTIVE = false, so I'd say in normal use this isn't an issue.

It's much like a shader whose fragment function consists only of discard.

I don't think the shader compiler should "optimize" this code to work better, as it's not really a valid shader. What would be the use case?

@WrobotGames
Author

I used the ACTIVE built-in to cull certain particles for a grass system like the one devmar on YouTube made. Maybe this isn't the right use case for particles, and I should switch to multimeshes instead. Anyway, while I was testing this I noticed the performance was fairly low (because I thought I was 'culling' the particles). If it's not possible to stop entire particles from being rendered like this, we shouldn't spend more time on this. (But I think the docs need to say what making a particle inactive actually does.)

@AThousandShips
Member

Then that use case is far more relevant, but your real world example shows why it takes performance:

  • You'd need to figure out if they should be active or not, which requires data to be fed to the particles

So I'd say that you should instead use other means to accomplish that if you need a lot of culling.

Now, the specifics of the rendering cost might be a bug, but I'm not sure the statistics you're seeing come from after the ACTIVE check is done; they might come from preparing the particle data to feed to the particles.

So that would need investigation, but it might just be that the processing you see is the steps before ACTIVE is checked and the particle is dropped.

What is the performance difference between having ACTIVE set or not? Is it just marginal or is it significant? Because if it's significant, I suspect it's the earlier stages. (I can't test your MRP at the moment, but some statistics from your testing would be helpful.)

@WrobotGames
Author

Some performance numbers (measured at runtime); the scene is the MRP. Hipoly is a torus with 1024 rings.
This test is about the numbers relative to each other; the individual numbers don't mean a lot.

  • GPUParticles node not visible: ~4000fps ~0.25ms
  • GPUParticles node visible, not emitting: ~4000fps ~0.25ms
  • cube, not culling: ~1500fps, ~0.66ms
  • cube, culling: ~4000fps, ~0.25ms
  • Hipoly, not culling: ~188fps, ~5.32ms
  • Hipoly, culling: ~330fps, ~3.03ms
  • Hipoly unshaded, not culling: ~270fps, ~3.70ms
  • Hipoly unshaded, culling: ~330fps, ~3.03ms
  • Hipoly, fragment discard, not culling: ~354fps, ~2.82ms
  • Hipoly, fragment discard, culling: ~400fps, ~2.50ms

Observations

  1. Culled low poly particles have no measurable cost compared to disabling the node (for 100 particles).
  2. There is no performance difference between culled unshaded and culled pixel shaded.
  3. Adding discard at the start of the fragment shader increases performance even with culling.
  4. Just using discard in the fragment shader offers better performance than just culling.
  5. Using discard in the fragment shader and using culling offers the best performance.

I think point 4 is really interesting: why does adding discard to just the fragment shader result in better performance than disabling the particles?
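
To put the figures above in relative terms, here is a minimal Python sketch that converts the reported fps values to frame times and estimates how much of the particle system's overhead remains after culling. The helper names (`frame_ms`, `residual_cost`) are made up for illustration, and treating the "node not visible" run as the baseline is an assumption; the inputs are the approximate values from the list, so the results are rough as well.

```python
# Approximate fps figures copied from the measurements listed above.
BASELINE_FPS = 4000        # GPUParticles node not visible (assumed baseline)
CUBE_FPS = (1500, 4000)    # (not culling, culling)
HIPOLY_FPS = (188, 330)    # (not culling, culling)


def frame_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def residual_cost(unculled_fps: float, culled_fps: float,
                  baseline_fps: float = BASELINE_FPS) -> float:
    """Fraction of the particle overhead (frame time above the baseline)
    that is still paid when every particle has ACTIVE = false."""
    baseline = frame_ms(baseline_fps)
    overhead_unculled = frame_ms(unculled_fps) - baseline
    overhead_culled = frame_ms(culled_fps) - baseline
    return overhead_culled / overhead_unculled


if __name__ == "__main__":
    for name, (unculled, culled) in (("cube", CUBE_FPS), ("hipoly", HIPOLY_FPS)):
        print(f"{name}: {frame_ms(unculled):.2f} ms -> {frame_ms(culled):.2f} ms, "
              f"residual overhead {residual_cost(unculled, culled):.0%}")
```

With these inputs the cube's overhead disappears entirely once culled, while roughly 55% of the high-poly overhead remains, which is the gap this issue is about.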

@AThousandShips
Member

Because discard is an internal feature in shaders which has special behavior, I think; just setting ACTIVE is not a native feature.
