Segfault in Nvidia when using IMs #1082
Same problem here with Nvidia 378 driver.
@romainreignier Please try adding the following to your .bashrc and re-sourcing it. The solution below seems to work for pthread-type issues, but it didn't solve my problem, which always originates from an Ogre render_() call.
@pbeeson Thanks for your quick answer. I will try that solution as soon as I reboot. For now, I have switched to my Intel card.
@pbeeson I have tried your solution, but it does not seem to change anything:
Same here.
I can reproduce this with the teb local planner tutorial. I'm using 375.39.
I don't get the traceback @romainreignier is talking about, but I do get the original one @pbeeson and @ompugao are talking about.
Ok, I looked into this for a while and I wasn't able to find the problem easily. Here are some notes though:
I wasn't able to find any Ogre or Qt or Nvidia threads about this, or none that I think are related.
Thanks for spending some time on this. Those of us working on object-based manipulation use IMs a lot and have been dealing with this for some time. One unproven bit of info I can share is that removing the alpha-blended meshes surrounding the IM currently being examined makes crashes much less likely. So I see it much more in complex, crowded scenes than in simple ones.
On Mon, May 1, 2017 at 8:15 PM William Woodall wrote:
Ok, I looked into this for a while and I wasn't able to find the problem easily. Here are some notes though:
- You do not need to click with the publish point tool to trigger the segfault
- You can just select the tool and move your mouse around in the viewport
- You can remove everything except the get3DPoint() call and it will still segfault, but if you remove that call in the PointTool::processMouseEvent() method it will not crash
- https://github.com/ros-visualization/rviz/blob/7970ba08cee3810cfa1609c3b0f5136970eb2f7c/src/rviz/default_plugin/tools/point_tool.cpp#L98
- The segfault occurs in thread 1, which is the Qt main thread, in the onUpdate() method, which ends up calling the Ogre render function
- However, the code that causes the segfault is also executed in this thread, so I do not think it is a race condition, but I could be wrong
I wasn't able to find any Ogre or Qt or Nvidia threads about this, or none that I think are related.
I got exactly the same backtrace as @pbeeson. Each time I click on an interactive marker and want to render it again, the segfault occurs. I'm using Ubuntu 16.04 with Nvidia driver 375.26 and ROS Kinetic. Interestingly, if there is something behind the marker, it is fine: I can click it and render it again as many times as I want. I would be very grateful if someone could provide a solution.
I too get a backtrace similar to @pbeeson's. Ubuntu 16.04.3 (default desktop), Nvidia 384.81, ROS Kinetic. I planned on using interactive markers and the Publish Point tool with a CUDA program, so this is problematic.
It is possible this was fixed by PR1167. We had been using a pretty old library for setting up IM controls that had quaternions of magnitude 2. I never realized this until the latest RViz stopped displaying our IM controls.
Update: This was not fixed by PR1167. I'm still getting this FREQUENTLY (roughly once a minute when using IMs) on the latest Ubuntu 16.04.5 with Kinetic 1.12.16, both from debs AND when compiled locally.
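Even though non-unit quaternions turned out not to be the cause of the crash, they are easy to guard against on the publishing side, and the comment above notes that newer RViz stopped displaying controls built from them. A minimal sketch, assuming orientations are held as plain `(x, y, z, w)` tuples before being copied into an interactive marker control's `orientation` field; `normalize_quaternion`, `is_unit`, and `quaternion_norm` are hypothetical helpers, not part of rviz or the interactive_markers package.

```python
import math
from typing import Tuple

# Hypothetical helper type: (x, y, z, w), the same field order as geometry_msgs/Quaternion.
Quaternion = Tuple[float, float, float, float]


def quaternion_norm(q: Quaternion) -> float:
    """Euclidean length of the quaternion's four components."""
    x, y, z, w = q
    return math.sqrt(x * x + y * y + z * z + w * w)


def is_unit(q: Quaternion, tol: float = 1e-6) -> bool:
    """True if q is within tol of unit length, i.e. a valid rotation."""
    return abs(quaternion_norm(q) - 1.0) <= tol


def normalize_quaternion(q: Quaternion, eps: float = 1e-9) -> Quaternion:
    """Scale q to unit length so it represents a valid rotation."""
    norm = quaternion_norm(q)
    if norm < eps:
        # A (near-)zero quaternion encodes no rotation; refuse rather than divide by ~0.
        raise ValueError("cannot normalize a zero-length quaternion")
    x, y, z, w = q
    return (x / norm, y / norm, z / norm, w / norm)


if __name__ == "__main__":
    # An orientation with x = 1 and w = 1 has length sqrt(2), not 1.
    raw = (1.0, 0.0, 0.0, 1.0)
    fixed = normalize_quaternion(raw)
    print(is_unit(raw), is_unit(fixed))  # False True
```

Normalizing right before filling the control's orientation means a misbehaving upstream library cannot silently hand RViz an invalid rotation.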
I'm wondering if rviz::VisualizationManager::onUpdate() could handle SIGSEGV from Ogre::Root::renderOneFrame(), but I don't know enough about properly handling signals in multithreaded environments.
I’ve noticed that I can deterministically make this fail by having a marker array drawn and then opening an interactive marker's right-click pulldown menu, so I’m still not convinced this isn’t some memory (indexing) issue originating in RViz.
I can confirm that we're also seeing RViz segfault like pbeeson while using interactive markers in RViz. We are running Kinetic, Xenial, and Nvidia drivers.
I have discovered that this is easily reproducible by sending a marker array of size 1000 and then right-clicking on any IM. I will try to post a simple deterministic failure soon.
On Tue, Nov 6, 2018 at 2:21 PM awatson3 wrote:
I can confirm that we're also seeing RViz segfault like pbeeson while using interactive markers in RViz. We are running kinetic, xenial, and nvidia drivers.
I had this problem, but I believe I was able to overcome it by installing OGRE v1.9.1 from source, followed by recompiling RViz from source.
Hello guys!
@d-walsh Do you have an idea why installing OGRE and RViz from source fixes this issue?
@GMahmoud I don't know the exact reason, but it could be because I'm using a newer version of OGRE (v1.9.1 vs. v1.9.0), or perhaps because it is built against certain Nvidia dependencies.
Had the same issue trying to use the Publish Point tool. Installing OGRE v1.9.1 and recompiling RViz from source fixed the issue for me. Thanks @d-walsh
Got the same problem on Nvidia-driven machines.
We use the Ogre from Ubuntu in ROS Kinetic/Melodic, so we cannot change the Ogre version, sorry. It sounds like building from source is the best bet.
When I install the OGRE 1.9.1 release from source, the libraries still carry the 1.9.0 version number; is that expected? Also, that didn't fix the error for me.
For other Ogre versions, the version numbers seem to be correct.
I looked into that a bit, and it looks like the authors didn't update the version number. Are you sure you're running the recompiled version and not the one that came with ROS?
@jacobhuesman Yes, I am certain about that! (When I build RViz against any other version of Ogre, CMake shows me the version number of the located package.)
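Because a 1.9.1 source build still installs libraries labelled 1.9.0, the version number alone cannot tell you which Ogre RViz actually loads; the resolved library path can. A minimal sketch that runs `ldd` on an RViz shared object and reports which `libOgre*` files the dynamic loader resolves. The helpers `find_ogre_libs` and `ogre_libs_for` are hypothetical, and the target path you pass in is an assumption about your install layout.

```python
import re
import subprocess
from typing import List, Tuple

# Matches ldd lines such as:
#   libOgreMain.so.1.9.0 => /usr/local/lib/libOgreMain.so.1.9.0 (0x00007f...)
#   libOgreMain.so.1.9.0 => not found
_OGRE_LINE = re.compile(r"^\s*(libOgre\S*)\s+=>\s+(not found|\S+)")


def find_ogre_libs(ldd_output: str) -> List[Tuple[str, str]]:
    """Return (library name, resolved path) pairs for every Ogre library in ldd output."""
    found = []
    for line in ldd_output.splitlines():
        match = _OGRE_LINE.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


def ogre_libs_for(shared_object: str) -> List[Tuple[str, str]]:
    """Run ldd on a binary or shared library and extract its Ogre dependencies."""
    result = subprocess.run(
        ["ldd", shared_object], capture_output=True, text=True, check=True
    )
    return find_ogre_libs(result.stdout)
```

For example, `ogre_libs_for("/opt/ros/kinetic/lib/librviz.so")` (an assumed path for a Debian-packaged Kinetic install; point it at your workspace build instead if you compiled RViz yourself). A path under `/usr/local/lib`, the default CMake install prefix, typically indicates a source build, while `/usr/lib/x86_64-linux-gnu` typically indicates Ubuntu's packaged Ogre. Since `ldd` honors `LD_LIBRARY_PATH`, run it from the same shell environment you launch RViz from.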
I can confirm that this bug disappears with Ogre 1.9.1. Hence, closing this issue here.
I am currently experiencing a similar issue where a segmentation fault occurs in RViz on ROS Melodic, but my computer does not have an Nvidia card. The application also uses interactive markers. I've included the backtrace:
Is this also related to my Ogre version?
At least the segfault occurred not in the Ogre library, but in plain Qt.
Thanks for the response. The environment used: To reproduce the segfault, I just have to stay in the affected GUI window for 10+ seconds; when I then try to get focus or click something in RViz, it crashes. How long it takes varies, but it is generally no longer than one minute. I will try to build a Docker image of the environment today or tomorrow in which I can reproduce the issue.
Avoids two bugs in ROS Kinetic which are triggered by this code:
- RViz crashes when using NVidia drivers, see ros-visualization/rviz#1082 (comment)
- Context menu entries containing unknown characters, most likely font related (interactive markers use QChar(0x3000) and similar)
We've found that when the always_visible() flag is set on markers and a mesh is attached to the marker, the Ubuntu Nvidia drivers end up crashing RViz when you use the controller to change the pose of the marker. A gdb backtrace is provided below. This happens with Nvidia drivers 304 to 367 in Kinetic, but we don't see it when nouveau is used.