Seg fault in System.Drawing.Common tests on Unix #23784
Tonight was the night I planned to launch an issue for this. It seems a constant source of outerloop failures. And causes me untold annoyance and investigation ... :) fix or disable I say ;) |
Unfortunately Dumpling isn't finding dumps on SUSE. @DrewScoggins have you had a chance to take a look? |
@benaadams I don't think so, that's FreeBSD-specific. |
@qmfrederik do you have any context on these segfaults? I'm not sure whether they're only on SUSE, but perhaps it has an outdated copy of libgdiplus (it doesn't return false from GetRecentGdiPlusIsAvailable()) /cc @safern |
Debian8.7 now. This should have everything needed to debug: |
@danmosemsft I don't have any immediate context for these segfaults. Debian Jessie ships with version 4.2 of libgdiplus, which dates from December 2015. There have been a few fixes to libgdiplus since. We'll probably need to do some work to identify all the areas of libgdiplus that the corefx CI process shows to be unstable, and then decide how to deal with that; a couple of options include:
Meanwhile, I've downloaded the dumpling, but I'm not entirely sure what I'm looking at. Is there a way to extract the managed and native stack traces from those dumpling files? |
Is it possible to disable these tests on these platforms and tag them with this issue in the meantime then? It sounds like a fix isn't a quick thing and they cause a lot of outerloop failures. |
@Drawaes Part of the problem is that for tests that cause the test process to crash (e.g. segfaults), there is no clear indication in the test logs of which unit test caused the segfault, so it's not easy to say which test should be disabled (other than disabling all System.Drawing.Common tests altogether). |
Process of elimination then... you are on a binary search. Remove one test on one distro that fails a lot and see if it starts passing. Keep going until that distro either has no tests left or passes. Once it passes, go back and try to add the other tests and see if it all works. :) At least you can hopefully narrow it down, and everyone else doesn't have to dig into every outerloop run because they always fail due to System.Drawing. |
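To make the process-of-elimination idea concrete, here is a minimal sketch of bisecting a test list. It assumes exactly one test causes the crash and that it crashes deterministically, which may well not hold for these segfaults. The test names and the `fake_run` function are hypothetical stand-ins for "run this subset on the failing distro"; nothing here is taken from the corefx test infrastructure.

```python
from typing import Callable, List, Sequence


def find_crashing_test(
    tests: Sequence[str],
    crashes: Callable[[Sequence[str]], bool],
) -> str:
    """Bisect `tests` down to the single test that makes the run crash.

    Assumption: exactly one test crashes, and it does so on every run.
    """
    if not crashes(tests):
        raise ValueError("the full test list does not crash")
    candidates: List[str] = list(tests)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # If the first half crashes, the culprit is in it;
        # otherwise it must be in the remaining tests.
        candidates = first_half if crashes(first_half) else candidates[mid:]
    return candidates[0]


# Hypothetical stand-in for a real test run: "crashes" when Tests.C is included.
def fake_run(subset: Sequence[str]) -> bool:
    return "Tests.C" in subset


tests = ["Tests.A", "Tests.B", "Tests.C", "Tests.D", "Tests.E"]
culprit = find_crashing_test(tests, fake_run)
print(culprit)
```

Each step halves the candidate list, so the number of CI runs grows with the logarithm of the test count rather than with the count itself. With flaky crashes each subset would need to be run several times before trusting a "pass".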
Finally, we’ve got a dump, yay! @qmfrederik I sort of know how to open it, so I will try it on my Ubuntu box and update the thread. I think there is a doc if you want to try opening it on your end also. Let me find it and I’ll post it here. |
From the dumpling that @danmosemsft pointed out, I got this stack trace:
Which points us to the test that crashed and the path that it followed to cause the crash. I will go ahead and take a look at other dumps (if there are any) in the other build failures to see whether there are other crashing tests, and then I'll disable the crashing tests. In the meantime, @qmfrederik would you mind taking a look at why this is crashing? Here are the docs on how to load the dump: https://github.com/dotnet/coreclr/blob/master/Documentation/building/debugging-instructions.md#debugging-core-dumps-with-lldb Let me know if I can help you. |
Thanks that would be awesome |
PR to disable: dotnet/corefx#24581 In the meantime we can keep investigating and add more tests to that PR if needed. |
Another crash, this one was on OSX. @DrewScoggins it failed to upload the dump; there was an exception thrown in Dumpling.py |
Yeah, this is known. I am going to try and look at this today, it's just been a little bit of a pain getting a working Mac. |
I can't see the failure anymore as the build was reset. I will try to find the dump on the Dumpling website and investigate which test is the one that crashed. |
Seg fault on macOS this time:
|
Unfortunately OSX is not uploading the dumps correctly. @DrewScoggins is working on fixing that, and we're still working on being able to crack open a dump on OSX :( |
We could add show-progress output, which might suggest which test crashed. |
That is true. I will go ahead and add it now. |
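As a rough illustration of why progress output helps: if the runner logs a line when each test starts and another when it finishes, the test that crashed the process is one that started but never finished. The sketch below assumes a made-up log format (`STARTING <name>` / `FINISHED <name>`) and hypothetical test names; the real runner's output format may differ and would need its own parsing.

```python
from typing import Iterable, List


def unfinished_tests(log_lines: Iterable[str]) -> List[str]:
    """Return tests that logged a start but no finish, in start order.

    Assumed (hypothetical) log format, one event per line:
        STARTING <test name>
        FINISHED <test name>
    """
    started: List[str] = []
    finished = set()
    for line in log_lines:
        status, _, name = line.strip().partition(" ")
        if status == "STARTING":
            started.append(name)
        elif status == "FINISHED":
            finished.add(name)
        # Any other line (ordinary test output) is ignored.
    return [name for name in started if name not in finished]


# Hypothetical log from a run that died part-way through Tests.B.
log = [
    "STARTING Tests.A",
    "FINISHED Tests.A",
    "STARTING Tests.B",
]
suspects = unfinished_tests(log)
print(suspects)
```

If tests run in parallel, more than one test can be in flight when the process dies, so the result is a list of suspects rather than a single answer.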
Another macOS seg fault here:
|
This was Red Hat. Maybe with this stack trace we will be able to figure out what's happening; I suspect all of them are related to libgdiplus. @qmfrederik this looks like a libgdiplus issue. Would you mind helping to investigate? We have this stack trace from the test logs:
also the logs have the memory map: https://mc.dot.net/#/product/netcore/master/source/official~2Fcorefx~2Fmaster~2F/type/test~2Ffunctional~2Fcli~2F/build/20171120.02/workItem/System.Drawing.Common.Tests/wilogs And you can download the dump from: https://dumpling.azurewebsites.net/api/dumplings/archived/eb3711408ba3f7b1fd513e4589207c5d90f9f0d6 In order to debug the dump you can follow: https://github.com/dotnet/corefx/blob/master/Documentation/debugging/unix-instructions.md#debugging-core-dumps-with-lldb cc: @danmosemsft |
@safern Is there a way we can get the latest version of libgdiplus (as in, the git version) on the CI servers? @hughbe has been doing a lot of work on libgdiplus, so just upgrading may already improve the CI stability. |
@mmitche could you help me update the installed version of libgdiplus in CI and Helix machines? |
@safern This is just a build time dependency that needs updating? |
@qmfrederik is right. The crash looks like an ancient version of libgdiplus. The Mono CI machines run on version 6.0.4 right now. |
Just to make sure, are you commenting on my last post, which lists a macOS crash, or the post before, which is about Linux-Release-SLES?
YES, that would save us and the infra team a lot of trouble. |
Yes, that would be good. We could check for a minimum installed version. Do you mind submitting a PR for that?
cc @richlander for the libgdiplus documentation question. |
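A minimum-version check along these lines could look roughly like the sketch below. It only covers comparing dotted version strings; how the installed libgdiplus version is actually discovered is left out, and the `MINIMUM_VERSION` value is a placeholder for illustration (the thread mentions 4.2 on Debian Jessie and 6.0.4 on the Mono CI machines, but no agreed minimum).

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '6.0.4' into (6, 0, 4)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum`.

    Shorter versions are padded with zeros, so '6.0' equals '6.0.0'.
    """
    a = parse_version(installed)
    b = parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Placeholder threshold for illustration only, not a value agreed in this thread.
MINIMUM_VERSION = "6.0.4"

jessie_ok = meets_minimum("4.2", MINIMUM_VERSION)
mono_ci_ok = meets_minimum("6.0.4", MINIMUM_VERSION)
print(jessie_ok, mono_ci_ok)
```

Comparing integer tuples rather than raw strings avoids the classic mistake where "10.0" sorts before "6.0". Tests could then be skipped, rather than left to crash, when the check fails.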
The Linux-Release-SLES is ancient libgdiplus for sure. Haven't looked at the macOS crash yet. |
Let me give that a try. I'll also check whether the libgdiplus-from-NuGet approach still works. May take a couple of days. Ping me if I haven't got round to doing it by next week 😄. If someone else can do it faster, feel free to do so 😄 . |
@jkotas unsure if this issue was only tracking rhel6 failures. |
Keeping it open. There are still tests disabled against this issue. |
Failed again on macOS. PR: #34263
|
Since #64084 got merged, |
Thanks @teo-tsirpanis . I wonder if there are any other active drawing issues that are unix only. |
There are a ton. I need to go through them and close them. However I'm waiting to do so, because some might use an active issue attribute on a test referencing the issue, so I'd like to clean that up as well. |
All Unix-related |
We've been plagued by seg faults on Linux in the System.Drawing.Common tests, e.g. this one on SUSE:
https://mc.dot.net/#/user/stephentoub/pr~2Fjenkins~2Fdotnet~2Fcorefx~2Fmaster~2F/test~2Ffunctional~2Fcli~2F/91ec984d64908b3ab312bef6f6fa599f5ea1cee7/workItem/System.Drawing.Common.Tests/wilogs
This has been happening frequently for months, but I can't find an existing issue for it.