Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash on exit in SceneShaderForwardClustered destructor #77430

Closed
akien-mga opened this issue May 24, 2023 · 7 comments · Fixed by #77750
Closed

Crash on exit in SceneShaderForwardClustered destructor #77430

akien-mga opened this issue May 24, 2023 · 7 comments · Fixed by #77750

Comments

@akien-mga
Copy link
Member

Godot version

4.1.dev (094e884)

System information

Mageia 9, Linux

Issue description

I'm seeing a crash on exit in one of my test projects, which seems to be a recent regression.

The crash happens when quitting the 3D gameplay in https://github.com/team-godog/NGJ2023_Godog

There are errors about leaking GodotBody3D and GodotShape3D, and then leaking two shaders:

ERROR: 5 RID allocations of type 'P11GodotBody3D' were leaked at exit.
ERROR: 4 RID allocations of type 'P12GodotShape3D' were leaked at exit.
ERROR: Pages in use exist at exit in PagedAllocator: N33RendererSceneRenderImplementation22RenderForwardClustered32GeometryInstanceSurfaceDataCacheE
   at: ~PagedAllocator (./core/templates/paged_allocator.h:170)
ERROR: Pages in use exist at exit in PagedAllocator: N33RendererSceneRenderImplementation22RenderForwardClustered32GeometryInstanceForwardClusteredE
   at: ~PagedAllocator (./core/templates/paged_allocator.h:170)
ERROR: 2 shaders of type SceneForwardClusteredShaderRD were never freed
   at: ~ShaderRD (servers/rendering/renderer_rd/shader_rd.cpp:708)
ERROR: FATAL: DEV_ASSERT failed  "_first == nullptr" is false.
   at: ~List (./core/templates/self_list.h:114)

Thread 1 "godot-git" received signal SIGILL, Illegal instruction.
0x0000000009c2e943 in SelfList<RendererSceneRenderImplementation::SceneShaderForwardClustered::ShaderData>::List::~List (this=0xd048958, __in_chrg=<optimized out>) at ./core/templates/self_list.h:114
114                             DEV_ASSERT(_first == nullptr);

(gdb) bt
#0  0x0000000009c2e943 in SelfList<RendererSceneRenderImplementation::SceneShaderForwardClustered::ShaderData>::List::~List (this=0xd048958, __in_chrg=<optimized out>) at ./core/templates/self_list.h:114
#1  0x0000000009c29747 in RendererSceneRenderImplementation::SceneShaderForwardClustered::~SceneShaderForwardClustered (this=0xd048958, __in_chrg=<optimized out>)
    at servers/rendering/renderer_rd/forward_clustered/scene_shader_forward_clustered.cpp:473
#2  0x0000000009c0fc10 in RendererSceneRenderImplementation::RenderForwardClustered::~RenderForwardClustered (this=0xd046d40, __in_chrg=<optimized out>)
    at servers/rendering/renderer_rd/forward_clustered/render_forward_clustered.cpp:4026
#3  0x0000000009b60da2 in memdelete<RendererSceneRenderRD> (p_class=0xd046d40) at ./core/os/memory.h:109
#4  0x0000000009b5f396 in RendererCompositorRD::finalize (this=0xcc78be0) at servers/rendering/renderer_rd/renderer_compositor_rd.cpp:149
#5  0x0000000009a64601 in RenderingServerDefault::_finish (this=0xcc368b0) at servers/rendering/rendering_server_default.cpp:217
#6  0x0000000009a64839 in RenderingServerDefault::finish (this=0xcc368b0) at servers/rendering/rendering_server_default.cpp:244
#7  0x000000000584fee4 in finalize_display () at main/main.cpp:313
#8  0x0000000005868d56 in Main::cleanup (p_force=false) at main/main.cpp:3597
#9  0x00000000057dae8a in main (argc=1, argv=0x7fffffffd688) at platform/linuxbsd/godot_linuxbsd.cpp:75

The crash is on a DEV_ASSERT (so dev_build=yes), might not happen in a non-dev build.

Doesn't happen in 4.0.3-stable, and I don't remember seeing it a couple of weeks ago, so I expect this is a recent regression.

Steps to reproduce

  • Clone https://github.com/team-godog/NGJ2023_Godog
  • Edit it with latest Godot master, game commit 212feb11867bb4468b357e96756fef40d904e614
  • Press "Play", watch the intro (or skip with Esc), wait for the gameplay to start (takes a while to load in a dev build)
  • When in game (sliding dog on a slope), press Esc to show the pause menu, and press Quit - it should crash

Minimal reproduction project

See Steps to reproduce. It's not really minimal, I might try to narrow it down further but this is a relatively simple open source project and I think the regression can be easily pinpointed to recent changes to shader freeing.

CC @BastiaanOlij @clayjohn

@AThousandShips
Copy link
Member

Regression from #77341

@akien-mga
Copy link
Member Author

Regression from #77341

Right, the crash is caused by the change from a simple error to a DEV_ASSERT.

But what causes this assert doesn't seem to be a recent regression in the end, here's the log output from 4.1 dev 1 (seems similar in 4.1 dev 2):

ERROR: 5 RID allocations of type 'P11GodotBody3D' were leaked at exit.
ERROR: 4 RID allocations of type 'P12GodotShape3D' were leaked at exit.
ERROR: Pages in use exist at exit in PagedAllocator: N33RendererSceneRenderImplementation22RenderForwardClustered32GeometryInstanceSurfaceDataCacheE
   at: ~PagedAllocator (./core/templates/paged_allocator.h:140)
ERROR: Pages in use exist at exit in PagedAllocator: N33RendererSceneRenderImplementation22RenderForwardClustered32GeometryInstanceForwardClusteredE
   at: ~PagedAllocator (./core/templates/paged_allocator.h:140)
ERROR: 2 shaders of type SceneForwardClusteredShaderRD were never freed
   at: ~ShaderRD (servers/rendering/renderer_rd/shader_rd.cpp:708)
ERROR: Condition "_first != nullptr" is true.
   at: ~List (./core/templates/self_list.h:106)
ERROR: 10 RID allocations of type 'N10RendererRD11MeshStorage4MeshE' were leaked at exit.
WARNING: Leaked instance dependency: Bug - did not call instance_notify_deleted when freeing.
     at: ~Dependency (servers/rendering/storage/utilities.cpp:56)
WARNING: Leaked instance dependency: Bug - did not call instance_notify_deleted when freeing.
     at: ~Dependency (servers/rendering/storage/utilities.cpp:56)
WARNING: Leaked instance dependency: Bug - did not call instance_notify_deleted when freeing.
     at: ~Dependency (servers/rendering/storage/utilities.cpp:56)
WARNING: Leaked instance dependency: Bug - did not call instance_notify_deleted when freeing.
     at: ~Dependency (servers/rendering/storage/utilities.cpp:56)
WARNING: Leaked instance dependency: Bug - did not call instance_notify_deleted when freeing.
     at: ~Dependency (servers/rendering/storage/utilities.cpp:56)
WARNING: Leaked instance dependency: Bug - did not call instance_notify_deleted when freeing.
     at: ~Dependency (servers/rendering/storage/utilities.cpp:56)
ERROR: Condition "_first != nullptr" is true.
   at: ~List (./core/templates/self_list.h:106)
ERROR: 7 RID allocations of type 'N10RendererRD15MaterialStorage8MaterialE' were leaked at exit.
WARNING: Leaked instance dependency: Bug - did not call instance_notify_deleted when freeing.
     at: ~Dependency (servers/rendering/storage/utilities.cpp:56)
WARNING: Leaked instance dependency: Bug - did not call instance_notify_deleted when freeing.
     at: ~Dependency (servers/rendering/storage/utilities.cpp:56)
WARNING: Leaked instance dependency: Bug - did not call instance_notify_deleted when freeing.
     at: ~Dependency (servers/rendering/storage/utilities.cpp:56)
WARNING: Leaked instance dependency: Bug - did not call instance_notify_deleted when freeing.
     at: ~Dependency (servers/rendering/storage/utilities.cpp:56)
WARNING: Leaked instance dependency: Bug - did not call instance_notify_deleted when freeing.
     at: ~Dependency (servers/rendering/storage/utilities.cpp:56)
WARNING: Leaked instance dependency: Bug - did not call instance_notify_deleted when freeing.
     at: ~Dependency (servers/rendering/storage/utilities.cpp:56)
ERROR: 2 RID allocations of type 'N10RendererRD15MaterialStorage6ShaderE' were leaked at exit.
ERROR: 4 RID allocations of type 'N10RendererRD14TextureStorage7TextureE' were leaked at exit.
ERROR: 13 RID allocations of type 'N17RendererSceneCull8InstanceE' were leaked at exit.
WARNING: 7 RIDs of type "UniformBuffer" were leaked.
     at: _free_rids (drivers/vulkan/rendering_device_vulkan.cpp:8970)
WARNING: 30 RIDs of type "IndexArray" were leaked.
     at: _free_rids (drivers/vulkan/rendering_device_vulkan.cpp:8970)
WARNING: 30 RIDs of type "IndexBuffer" were leaked.
     at: _free_rids (drivers/vulkan/rendering_device_vulkan.cpp:8970)
WARNING: 4 RIDs of type "VertexArray" were leaked.
     at: _free_rids (drivers/vulkan/rendering_device_vulkan.cpp:8970)
WARNING: 16 RIDs of type "VertexBuffer" were leaked.
     at: _free_rids (drivers/vulkan/rendering_device_vulkan.cpp:8970)
WARNING: 8 RIDs of type "Texture" were leaked.
     at: finalize (drivers/vulkan/rendering_device_vulkan.cpp:9300)
WARNING: ObjectDB instances leaked at exit (run with --verbose for details).
     at: cleanup (core/object/object.cpp:1993)
ERROR: Resources still in use at exit (run with --verbose for details).
   at: clear (core/io/resource.cpp:489)

While 4.0.3-stable doesn't seem to report any leaks.

@clayjohn
Copy link
Member

clayjohn commented May 24, 2023

I don't think this is related to Bastiaan's recent PRs, the first objects shown as leaking are physics objects. Bastiaan's changes only impacted freeing specific shaders in the RD renderer. His material-dependency PR did not change how materials are freed.

I'll do a quick bisect and see if I can spot the culprit

I have checked as far back as db13026 and the leaks are still happening

@RedworkDE
Copy link
Member

RedworkDE commented May 30, 2023

ab5fc22 is the first bad commit

Not sure how that causes these issues tho.

Edit: it leaks nodes all over the place in this project, including some meshes, so the shaders never become unused: leaks.txt and the commit before does not: leaks_pre.txt (There are a few extra errors because of switftshader, but otherwise it is clean)

@RedworkDE
Copy link
Member

Minimized project: Test_77430.zip

The problem is that in Level.tscn the Level2 node (which is an instantiation of Leval.tsnc) has a child with the name BackgroundModels, which also exists in Level.tscn at the same level, overwriting the one from the packed scene causing it to leak, and as the packed node has meshes as children they and their resources in rendering server get leaked as well, ultimately causing this crash.

@RandomShaper
Copy link
Member

I have to check deeper, but this may be one of those cases that the switch from ERR_FAIL_COND to DEV_ASSERT has helped discovering an issue that was already there.

@akien-mga
Copy link
Member Author

I have to check deeper, but this may be one of those cases that the switch from ERR_FAIL_COND to DEV_ASSERT has helped discovering an issue that was already there.

Indeed, the crash was highlighted by the change to DEV_ASSERT. The actual issue it highlighted was caused by Juan's #75627 it seems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants