SafeHandle use-after-dispose in FileSystemWatcher on OSX #30056
cc: @MaximLipnin |
@stephentoub do you know what could cause a NRE in a native call into macOS? Is the interop layer translating some kind of Then I see we get a COR_E_EXECUTIONENGINE while trying to deal with the exception. |
Not sure; @jkoritzinsky, @AaronRobinsonMSFT, ideas? |
@danmosemsft and @stephentoub The |
@AaronRobinsonMSFT, I only saw it that once. This is the P/Invoke: |
For future reference, it seems console output cannot directly be queried in Kusto. The best we can easily do is look for 134
ExitCode QueueName count_
|
@danmosemsft Do we know what version of coreclr is being used to run the tests themselves? Is it possible the coreclr being used in this instance is older than the date at which dotnet/coreclr#24847 was checked in? |
We'd have to dig to see which exact version, but this was 5 days ago, and corefx ingested a new coreclr 5 days ago, 6 days ago, and 7 days ago, so it definitely has that change from end of May. |
@mjanecke can you tell from the output at https://mc.dot.net/#/user/dotnet-bot/pr~2Fdotnet~2Fcorefx~2Frefs~2Fpull~2F38953~2Fmerge/test~2Ffunctional~2Fcli~2Finnerloop~2F/20190626.70/workItem/System.IO.FileSystem.Watcher.Tests/wilogs whether a dump got uploaded? I don't see the usual dumpling song and dance. |
@danmosemsft, for that test, it looks like the failure happened on a Mac, which I don't think @epananth has finished implementing for dump collection yet. |
@mjanecke good point, and last I heard we did not have a way to create a dump on Mac (that could be debugged with managed debuggers). @mikem8361 any update on that? Is our only option to catch this under an attached debugger? (It's quite possible this isn't Mac specific actually but I'm interested anyway) |
The best way would be to catch this under a managed debugger. There is no “createdump” facility on macOS. You can set up your macOS machine to generate system core dumps (ulimit -c unlimited, etc.), but there is a limitation in lldb that makes SOS on core dumps on macOS difficult. See https://github.com/dotnet/diagnostics/blob/master/documentation/debugging-coredump.md#launch-lldb-under-macos.
mikem
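For reference, the `ulimit -c unlimited` step mentioned above can also be applied from inside a process before it launches the tests. The following is only a minimal sketch using Python's standard `resource` module; the helper name `raise_core_limit` is made up for illustration, and this is not how the corefx test harness is known to configure its machines.

```python
import resource


def raise_core_limit():
    """Raise the soft core-file size limit to the hard limit.

    This mirrors `ulimit -c unlimited` when the hard limit is itself
    unlimited. Child processes started afterwards inherit the limit.
    (Hypothetical helper, not part of any test infrastructure.)
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = raise_core_limit()
    label = "unlimited" if soft == resource.RLIM_INFINITY else str(soft)
    print(f"core soft limit is now: {label}")
```

This only removes the size cap. Whether a usable core file is actually written still depends on the rest of the system configuration covered by the "etc." above.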
|
Since this is clearly mac specific we can confidently say it has only happened 1 time in 10 days. |
No more hits in the last week. Query:
|
Seems to be infrastructure related. Moving to 5.0. |
Failed in dotnet/corefx#42512:
|
I will take a look. We still have a few flaky watcher tests on OSX and my plan was to run the whole set in a loop and chase them down. |
Failed again in #136. Same stack. Here's a dump: https://helix.dot.net/api/2019-06-17/jobs/b40ed2a0-5a9f-4d81-9bb5-62900426af75/workitems/System.IO.FileSystem.Watcher.Tests/files/core.18074 (it's 3.4 GB! get it while it's hot / still available) |
The core file is incomplete or corrupted:
We should never limit core size, as that makes the dumps unusable. Also, it is full of zeros and the compressed size is only ~400M. We should not dump them raw to storage if we care about size. Should we open a new issue to make core dumps on OSX usable, @ericstj? And if so, I'm not sure what the correct repo and team to work on this are. |
cc: @epananth for core file part. |
I'm guessing https://github.com/dotnet/arcade and tag @mjanecke @alexperovich @ChadNedzlek |
The clrstack error message means that SOS can’t find the right version of the DAC. Try enabling download support with “setsymbolserver -ms” and SOS will attempt to download the correct version. There may still be problems with the dump, and there is still the swift lldb bug with coredump thread ids (see https://github.com/dotnet/diagnostics/blob/master/documentation/debugging-coredump.md#launch-lldb-under-macos for more information).
mikem
From: Tomas Weinfurt <notifications@github.com>
Sent: Wednesday, November 20, 2019 11:57 AM
To: dotnet/corefx <corefx@noreply.github.com>
Cc: Mike McLaughlin <mikem@microsoft.com>; Mention <mention@noreply.github.com>
Subject: Re: [dotnet/corefx] Unhandled NullReferenceException on macOS from FileSystemWatcher (#38966)
The core file is incomplete or corrupted:
(lldb) target create --core "/Users/furt/Downloads/core.18074"
warning: (x86_64) /Users/furt/Downloads/core.18074 load command 3055 LC_SEGMENT_64 has a fileoff + filesize (0xda14f000) that extends beyond the end of the file (0xda14e000), the segment will be truncated to match
warning: (x86_64) /Users/furt/Downloads/core.18074 load command 3056 LC_SEGMENT_64 has a fileoff (0xda14f000) that extends beyond the end of the file (0xda14e000), ignoring this section
Core file '/Users/furt/Downloads/core.18074' (x86_64) was loaded.
(lldb) plugin load /Users/furt/github/diagnostics/artifacts/bin/OSX.x64.Debug/libsosplugin.dylib
(lldb) clrstack
Failed to load data access module, 0x80004005
Can not load or initialize libmscordaccore.dylib. The target runtime may not be initialized.
ClrStack failed
We should never limit core size, as that makes the dumps unusable. Also, it is full of zeros and the compressed size is only ~400M. We should not dump them raw to storage if we care about size. Should we open a new issue to make core dumps on OSX usable, @ericstj? And if so, I'm not sure what the correct repo and team to work on this are.
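The two lldb warnings above already carry enough numbers to see how much of the core file is missing. As a rough sketch (the constants are copied from the warnings; the helper names are made up for illustration), the arithmetic is:

```python
# Values copied from the lldb warnings for core.18074.
FILE_END = 0xDA14E000      # actual end of the file on disk
SEGMENT_END = 0xDA14F000   # fileoff + filesize of load command 3055
NEXT_FILEOFF = 0xDA14F000  # fileoff of load command 3056


def missing_bytes(segment_end: int, file_end: int) -> int:
    """Bytes a segment claims beyond the real end of file (0 if it fits)."""
    return max(0, segment_end - file_end)


def starts_past_eof(fileoff: int, file_end: int) -> bool:
    """True when a segment begins at or after the end of the file."""
    return fileoff >= file_end


if __name__ == "__main__":
    print("file size (GiB):", round(FILE_END / 2**30, 2))
    print("bytes cut from command 3055:", missing_bytes(SEGMENT_END, FILE_END))
    print("command 3056 entirely absent:", starts_past_eof(NEXT_FILEOFF, FILE_END))
```

So the file on disk is about 3.4 GiB, which matches the size quoted for the download. Load command 3055 is short by one 4096-byte page, and load command 3056 starts past the end of the file. That fits a dump whose tail was cut off, though the numbers alone do not say what cut it.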
|
@jkotas, hi, I hit the same issue you posted, on macOS (on Windows it works fine). Is it being fixed? Thanks~ |
No. If you have a repro, please share it, @xieofxie. |
@wfurt It was generated by a validation build in Azure DevOps for the PR above. I don't know how to reproduce it reliably. |
Unsure if this is the same issue. I see a NullReferenceException in
Configuration: |
@ViktorHofer not sure that's the same thing, but I don't know why it could be either -- there is no obvious way that code can ever NRE: runtime/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs Line 354 in e7b743c
I see it segfaulted though -- is that what Mono test runs do when there's an unhandled exception that brings down the process (like in this case), or is it a separate segfault? |
Oh, I see @jkotas you believe this is a dupe? |
I think so. |
@jkotas - I see the FileWatcher tests failing on OSX in this PR: https://github.com/dotnet/runtime/pull/52252/checks?check_run_id=2502547808
Is that this same issue? I hope we got a dump of the process so we can figure the issue out. |
Yes, it is the same issue. This assert says that the file watcher callback can be called even after the EventStream was stopped using |
|
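To illustrate the kind of race described above, where a callback arrives after the stream has been stopped, here is a minimal sketch of one common guard: a stopped flag checked under a lock before any state is touched. The class `GuardedStream` and its methods are hypothetical stand-ins, written in Python rather than C#, and this is not the change made in the runtime.

```python
import threading


class GuardedStream:
    """Hypothetical stand-in for a native event stream with a callback."""

    def __init__(self, on_event):
        self._on_event = on_event
        # Re-entrant, so the callback itself may call stop() without deadlock.
        self._lock = threading.RLock()
        self._stopped = False
        self.dropped = 0  # events that arrived after stop()

    def stop(self):
        # Taking the lock means stop() waits for any in-flight callback.
        with self._lock:
            self._stopped = True

    def deliver(self, event):
        """Called by the (simulated) native layer, possibly after stop()."""
        with self._lock:
            if self._stopped:
                # Late callback: drop it instead of touching released state.
                self.dropped += 1
                return False
            self._on_event(event)
            return True


if __name__ == "__main__":
    seen = []
    stream = GuardedStream(seen.append)
    stream.deliver("created: a.txt")
    stream.stop()
    stream.deliver("created: b.txt")  # arrives late and is ignored
    print(seen, stream.dropped)
```

The point of the sketch is only that a late callback has to be tolerated on the managed side. It cannot be assumed that stopping the stream prevents further callbacks.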
@jaredpar - for my education, how do you find that link? |
I clicked on the Checks tab and went to the AzDO build. The key part there is that the build id is 1120772. From there, just ask runfo to dump the data (https://runfo.azurewebsites.net/view/build/?number=1120772). It lists all of the failed tests, and for the FileWatcherTests there was a heap dump noted and linked. The link I posted is that one. |
Truthfully I was a bit lazier than that. I didn't click on the checks tab. Really I just went here https://runfo.azurewebsites.net/view/pr-builds/?Repository=runtime&Number=52252 Basically that dumps all of the builds that happened for the PR #52252. I happened to know that test ran on the That is the link I posted above . |
From the PR to that display is seven clicks; runfo is two. And one of those clicks is a spinning-wheel click waiting for that display to render :) Joking aside, I'm glad that display is getting better, but it does take a lot of clicks to get to. It also takes time to render and is not 100% reliable. Several times over the years it has failed to display anything and errored out. |
Another attempt to fix this #52275 |
OSX.1013.Amd64.Open-x64:Debug
https://mc.dot.net/#/user/dotnet-bot/pr~2Fdotnet~2Fcorefx~2Frefs~2Fpull~2F38953~2Fmerge/test~2Ffunctional~2Fcli~2Finnerloop~2F/20190626.70/workItem/System.IO.FileSystem.Watcher.Tests/wilogs
cc: @JeremyKuhne, @carlossanlop
Runfo Tracking Issue: system.io.filesystem.watcher.tests work item
Build Result Summary