2.6.0 freezes then crashes while rotating model on one system, works fine on another #10968
Comments
I also have this issue on 2.6.0, with both the snap and the GTK3 AppImage. I am using the following version of Ubuntu: Linux shed 5.19.0-46-generic #47~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 21 15:35:31 UTC 2 x86_64 x86_64 x86_64 GNU/Linux. I'm also running a 3rd-gen CPU: Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz. I can confirm that MESA_LOADER_DRIVER_OVERRIDE=i965 also fixes my issue. |
That's useful information, thanks! So, it's not specific to this exact CPU, or to just one kernel version (I'm running Debian 6.1.0-10-amd64), but may be specific to the CPU/GPU family.
I've started doing bisection builds:
2.5.2 - bug not present
I'm building 2.6.0 alpha0 overnight, and will narrow things down further from there tomorrow. |
2.6.0 alpha0 - bug not present |
OK, I've managed to narrow down the problem somewhat. The problem was definitely introduced between 2.6.0 alpha0 and 2.6.0 alpha1. It appears to be triggered by some code (there were a couple of versions) which tries to make use of a more sophisticated shader program "dashed_thick_lines" when drawing certain on-screen artifacts. The instance which triggers the problem is in GLGizmoRotate::on_render() where it's rendering the "how much has the model been rotated" indicator arc. The code checks to see if the GL manager believes that the OpenGL implementation supports the core profile. If so, the "dashed_thick_lines" shader is selected. If not (or if core-profile support is disabled at compile time) a simpler "flat" shader is selected.
There are several other places in the code where this shader is used - for example, in Selection, to draw the "selection box corners" around a selected object. I haven't ever seen a crash in this code. In Selection, the lines being drawn are short and straight. I'd guess that something is going wrong when (1) "dashed_thick_lines" shader is being used, (2) on an old Intel HD GPU, (3) by the crocus DRI engine, (4) to draw a curved arc. I have no good idea whether the shader program is defective (or inefficient) in some way, or whether the crocus engine is at fault in this case.
I haven't seen any crashes at all with the i915 renderer, with Mesa software rendering, or with the AMDGPU renderer used by the HD 2100 card I bought on eBay a few days ago (a great cheap upgrade for this 10-year-old PC). So, it seems likely it's a crocus problem, triggered by this one particular shader program, and only in certain instances (e.g. arcs, or arcs with a large number of points?). Possible workarounds:
There are some comments elsewhere in the code saying that the "dashed_thick_lines" shader is too slow to use on Macs... but the 2.6.0 code changes have put it into use in many areas of the code. Maybe this is a decision which should be reconsidered? |
The plot thickens. Looking through the source code for the crocus userland driver, I found only one abort() call. It occurs if the attempt to send a command to the GPU returns an error. Unfortunately, the code which would print out the specific error code in question is behind an #if DEBUG clause, and is thus disabled in release builds.
On a hunch I did a "journalctl --system | grep i915" and the results are very interesting. On each run of the program I see something like:
Jul 16 17:16:44 worker kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:85ff9ff8, in slic3r_main [1442467]
The GPU seems to freeze up three times (consistently) on each program run, when I try to click and rotate the model. After the third time, it looks as if the kernel driver rejects the next command, and the crocus code aborts.
So, it looks as if something's amiss with the arc-drawing GPU operation, and this triggers a timeout and GPU reset. Maybe drawing a many-vertexed arc with the dashed_thick_lines shader is just so slow that it times out? Maybe the fact that this shader program creates a whole bunch of additional vertices on the fly is causing the GPU to barf?
There are a bunch of interesting INTEL_DEBUG outputs in crocus which can be enabled by an environment variable. Dumping out the batches may give a sense of what's happening. |
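The journal check described above can be wrapped in a small helper to count hangs per run. This is only a minimal sketch, assuming a POSIX shell: `count_gpu_hangs` is a hypothetical name, and the match pattern is taken from the kernel message quoted in this comment.

```shell
# Hypothetical helper: count i915 "GPU HANG" kernel messages read from stdin.
count_gpu_hangs() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the helper usable under "set -e".
    grep -c 'i915.*GPU HANG' || true
}

# Usage (same journal query as in the comment above):
#   journalctl --system | count_gpu_hangs
```

Comparing the count before and after a single rotate attempt would show whether the "three hangs, then abort" pattern holds on another machine.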
Even more fascinating!
Enabling INTEL_DEBUG=stall (forcing the crocus code to wait for each batch to complete before starting to work on the next one) does not affect the problem.
Enabling INTEL_DEBUG=no8 or no16 or no32 (restricting use of certain sorts of shader instructions) doesn't affect the problem.
Enabling INTEL_DEBUG=nofc (no fast clear) greatly reduces the severity of the problem. There are still occasional GPU hangs when rotating the STL model on the build plate, but they are much less frequent (maybe once a minute, rather than on every attempt) and I haven't yet seen them result in a crash.
Finally, enabling INTEL_DEBUG=bat (print out an interpretation of the batches) seems to make the problem go away completely. So, we have a Heisenbug on our hands :-)
It's possible that the latter is a timing-related issue: perhaps the delay in doing the debug prints is avoiding a race condition of some sort in the GPU or driver. However, even if I direct the debug output to /dev/null to minimize the delay, the problem remains completely gone.
Interestingly, INTEL_DEBUG=bat has an internal side effect. If you enable it, the crocus code disables the use of the device "shadow copy" feature. It feels to me as if this is likely the source of the Heisenbug effect.
So, it now seems to me as if there's at least one, and perhaps two, bugs in the crocus code which supports this particular GPU. Fast clears may not always be working right, and shadow copy may not be working right. I don't see evidence that there's any specific error in the PrusaSlicer code (either the CPU logic, or the specific shader program involved here).
I'll try to pass this along to the Mesa project team. Opened https://gitlab.freedesktop.org/mesa/mesa/-/issues/9392
The best available workarounds at this point are to force the use of the older Mesa driver for this GPU family, or to disable Mesa hardware acceleration completely.
Running with INTEL_DEBUG=nofc might also be acceptable, if you want crocus GPU acceleration which mostly works. |
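For anyone who wants to switch between these workarounds quickly, here is a rough sketch of a launcher helper. The function name `workaround_env` and the mode names are made up for illustration. `MESA_LOADER_DRIVER_OVERRIDE=i965` and `INTEL_DEBUG=nofc` come from this thread; `LIBGL_ALWAYS_SOFTWARE=1` is my assumption for how to "disable Mesa hardware acceleration", since the thread does not name a specific setting.

```shell
# Hypothetical helper: map a workaround name to the environment setting it needs.
workaround_env() {
    case "$1" in
        legacy)   echo 'MESA_LOADER_DRIVER_OVERRIDE=i965' ;;  # older Mesa driver (from this thread)
        software) echo 'LIBGL_ALWAYS_SOFTWARE=1' ;;           # assumption: Mesa software rendering
        nofc)     echo 'INTEL_DEBUG=nofc' ;;                  # crocus with fast clears disabled
        *)        echo "unknown workaround: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   env "$(workaround_env legacy)" /path/to/prusa-slicer
```

Note that the nofc mode only reduced the hang frequency in the tests above, so the legacy or software modes are probably the safer choices.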
Not sure if it is related or not, but I'm experiencing a lot of system-wide freezing when using PrusaSlicer 2.6.0 on Windows as well. The system periodically (every few seconds) freezes so that even the mouse cursor stops moving. I can reproduce this by opening a dialog to load parts to place and just scrolling down. Once the object is placed, it also happens constantly while just working with the app. It's quite bad: I have to open the app, then, grinding my teeth, try to slice and quickly close the app to be able to use my computer again. I'm on an AMD 5950X with a 3090 GPU, and it's the only app doing this for me, so I'm fairly confident that it's not a system issue. |
Same on windows 10 |
Same on my AMD system; sometimes the GPU freezes completely. |
This annoyed me so much that I am getting a new computer for my workshop. When I say new, I mean a 6th-gen Intel for $180 AUD; hopefully the problem goes away. I can confirm the issue does not occur at all on 6th-gen Intel, which is a huge relief; it was so annoying. |
Description of the bug
I've run into a reproducible problem when attempting to run 2.6.0 on one of my Debian systems. The same 2.6.0 appimage works fine on another Debian system, and the Debian distro build of 2.5.0 works fine on both.
The affected system is a 4-core i5-3470, using the integrated IvyBridge video adapter and GPU. The GPU appears to be relevant to the issue.
Symptoms:
When attempting to free-rotate the model before slicing, the system freezes for a few seconds, the rotation arrows appear briefly and sporadically on the screen, and the program crashes with an "aborted" error and no further useful information.
During the freeze, the whole system appears to be frozen... and using the "intel-gpu-top" monitoring program seems to show that it was in 100% "render" state during the freeze.
Performing the same operation under the Debian distro build of 2.5.0 does not exhibit the problem. Neither does the 2.6.0 appimage, when running on my Debian "testing" laptop.
I cloned the github repo, and built the 2.6.0 version without difficulty. It exhibits the same crashing behavior. So, it's not apparently an AppImage-related issue.
I ran the version I built under gdb, reproduced the error, and generated a backtrace. The abort() is occurring within the "crocus" DRI library, called during a buffer swap by wxGLCanvasX11 during rendering.
Thread 1 "slic3r_main" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6,
no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0 __pthread_kill_implementation
(threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
at ./nptl/pthread_kill.c:44
#1 0x00007ffff6b90d2f in __pthread_kill_internal
(signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 0x00007ffff6b41ef2 in __GI_raise (sig=sig@entry=6)
at ../sysdeps/posix/raise.c:26
#3 0x00007ffff6b2c472 in __GI_abort () at ./stdlib/abort.c:79
#4 0x00007fffda49e18f in () at /usr/lib/x86_64-linux-gnu/dri/crocus_dri.so
#5 0x00007fffdb41ae69 in () at /usr/lib/x86_64-linux-gnu/dri/crocus_dri.so
#6 0x00007fffda576946 in () at /usr/lib/x86_64-linux-gnu/dri/crocus_dri.so
#7 0x00007fffda4b2c2a in () at /usr/lib/x86_64-linux-gnu/dri/crocus_dri.so
#8 0x00007fffef64ec3e in glLabelObjectEXT ()
at /lib/x86_64-linux-gnu/libGLX_mesa.so.0
#9 0x00007fffef640ab1 in () at /lib/x86_64-linux-gnu/libGLX_mesa.so.0
#10 0x00007fffef63023b in () at /lib/x86_64-linux-gnu/libGLX_mesa.so.0
#11 0x000055555759dbe0 in wxGLCanvasX11::SwapBuffers() ()
#12 0x0000555556892947 in Slic3r::GUI::GLCanvas3D::render() ()
#13 0x0000555556892d7b in Slic3r::GUI::GLCanvas3D::_refresh_if_shown_on_screen() ()
#14 0x000055555689393b in Slic3r::GUI::GLCanvas3D::on_mouse(wxMouseEvent&) ()
#15 0x0000555557afe692 in wxEvtHandler::ProcessEventIfMatchesId(wxEventTableEntryBase const&, wxEvtHandler*, wxEvent&) ()
#16 0x0000555557afea37 in wxEvtHandler::SearchDynamicEventTable(wxEvent&) ()
#17 0x0000555557afeb90 in wxEvtHandler::TryHereOnly(wxEvent&) ()
#18 0x0000555557afec3a in wxEvtHandler::ProcessEventLocally(wxEvent&) ()
#19 0x0000555557afece1 in wxEvtHandler::ProcessEvent(wxEvent&) ()
#20 0x0000555557aff5a7 in wxEvtHandler::SafelyProcessEvent(wxEvent&) ()
#21 0x0000555557846107 in gtk_window_enter_callback ()
#22 0x00007ffff7290cb4 in () at /lib/x86_64-linux-gnu/libgtk-3.so.0
#23 0x00007ffff7aad5a9 in () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
#24 0x00007ffff7ac605e in g_signal_emit_valist ()
at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
#25 0x00007ffff7ac6dbf in g_signal_emit ()
at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
#26 0x00007ffff75697d4 in () at /lib/x86_64-linux-gnu/libgtk-3.so.0
#27 0x00007ffff7409286 in gtk_main_do_event ()
at /lib/x86_64-linux-gnu/libgtk-3.so.0
#28 0x00007ffff7b32815 in () at /lib/x86_64-linux-gnu/libgdk-3.so.0
#29 0x00007ffff7b8c702 in () at /lib/x86_64-linux-gnu/libgdk-3.so.0
#30 0x00007ffff6e1a7a9 in g_main_context_dispatch ()
at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#31 0x00007ffff6e1aa38 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#32 0x00007ffff6e1acef in g_main_loop_run ()
at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#33 0x00007ffff7408435 in gtk_main () at /lib/x86_64-linux-gnu/libgtk-3.so.0
#34 0x000055555782a735 in wxGUIEventLoop::DoRun() ()
#35 0x00005555579b1b9d in wxEventLoopBase::Run() ()
#36 0x000055555795280d in wxAppConsoleBase::OnRun() ()
#37 0x0000555557a38964 in wxEntry(int&, wchar_t**) ()
#38 0x000055555640a077 in Slic3r::GUI::GUI_Run(Slic3r::GUI::GUI_InitParams&) ()
#39 0x0000555555a70e49 in Slic3r::CLI::run(int, char**) ()
#40 0x0000555555a346a3 in main ()
After reading up on the "crocus" DRI library and its current state, I tried re-running the test, setting MESA_LOADER_DRIVER_OVERRIDE=i965 in the environment (thus forcing the use of the legacy i965 DRI engine rather than crocus).
The problem went away. The freeze was gone, the rotation arrows and the rotation process operated smoothly, intel-gpu-top showed reasonable render/3D usage, and the program did not crash.
I'm going to hazard a guess here: some change in the canvas-drawing code in 2.6.0 (or in wxGLCanvasX11) is issuing Mesa operations which are tickling a bug of some sort in the new crocus renderer. Maybe a pathological pattern of objects, maybe something which triggers memory or other resource exhaustion... ??
Project file & How to reproduce
2.6.0-rendering-crash-in-crocus.zip
To reproduce:
If the problem is not present, everything will work fine.
If the problem is present, you may notice some or all of the following:
If you do succeed in reproducing the problem, then try
MESA_LOADER_DRIVER_OVERRIDE=i965 /path/to/prusa-slicer
or
MESA_LOADER_DRIVER_OVERRIDE=i965 gdb /path/to/prusa-slicer
and observe that the problem does not occur with the DRI driver override in place.
Checklist of files included above
Version of PrusaSlicer
2.6.0 (using github tag)
Operating system
Linux, Debian "bookworm"
Printer model
Original Prusa i3 MK3S & MK3S+