
On Linux, failing to bind the address on Start generates a coredump #5920

Closed
tmds opened this issue May 9, 2018 · 33 comments
Labels
area-hosting Includes Hosting area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions

Comments

@tmds
Member

tmds commented May 9, 2018

When Kestrel is started on a port that is already in use, an exception is thrown, e.g.:

System.IO.IOException: Failed to bind to address https://127.0.0.1:5001: address already in use

Since this exception is not handled, the runtime calls abort, which (normally) generates a coredump (see https://github.com/dotnet/coreclr/issues/17929).

This behavior seems excessive for a failure to bind the port. Maybe there should be a try/catch somewhere to turn this into a process exit with a non-zero exit code?
It may not be obvious that these coredumps are being generated, and users who find these dumps on their system may get the wrong impression that dotnet failed in some bad way.

CC @davidfowl @halter73 @mikem8361

@tmds
Member Author

tmds commented May 9, 2018

I need to verify whether generating the coredumps is the default behavior on a Fedora system or something I configured myself (and forgot about).

@halter73
Member

Interesting. Unfortunately, not throwing is a breaking change at this point.

@Tratcher
Member

That try/catch would go in Program.Main, that's where the process lifetime is controlled.

@tmds
Member Author

tmds commented May 14, 2018

Until last week, I wasn't aware of the coredumps (and I guess you probably weren't either).
The stack (Kestrel/Hosting/template/...) should handle this somewhere in a nicer way.
I think it's fine to improve this post 2.1.

On Fedora, coredumps are enabled by default. The OS also does cleanup to avoid filling the disk.

@tmds
Member Author

tmds commented May 25, 2018

That try/catch would go in Program.Main, that's where the process lifetime is controlled.

That makes sense. Perhaps it's OK to catch any type of exception and return exit code 1?
So the change would be to update the templates to include a try/catch block in Main?

Can you set a milestone?
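A minimal sketch of the try/catch-in-Main idea, assuming a plain `TcpListener` as a stand-in for Kestrel so that it runs without ASP.NET Core (`BindFailureDemo`, `Listen`, and `Run` are hypothetical names, not framework APIs). In a real app the `try` block would wrap `CreateWebHostBuilder(args).Build().Run()` instead. Any exception is printed and mapped to exit code 1 rather than escaping Main, so the runtime never reaches its unhandled-exception abort path.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class BindFailureDemo
{
    // Hypothetical stand-in for server startup. Kestrel reports this condition
    // as an IOException ("address already in use"); a bare TcpListener throws
    // a SocketException instead.
    public static TcpListener Listen(int port)
    {
        var listener = new TcpListener(IPAddress.Loopback, port);
        listener.Start(); // throws if the port is already taken
        return listener;
    }

    // Same shape as an int-returning Main: any exception is printed and turned
    // into exit code 1 instead of escaping to the runtime.
    public static int Run(int port)
    {
        TcpListener? listener = null;
        try
        {
            listener = Listen(port);
            // A real server would block here until shutdown.
            return 0;
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine(ex);
            return 1;
        }
        finally
        {
            listener?.Stop();
        }
    }

    public static int Main(string[] args)
    {
        // Occupy an ephemeral port first so the second bind fails deterministically.
        TcpListener occupied = Listen(0);
        int port = ((IPEndPoint)occupied.LocalEndpoint).Port;
        try
        {
            return Run(port);
        }
        finally
        {
            occupied.Stop();
        }
    }
}
```

Catching `Exception` is deliberately broad here, matching the "catch any type of exception and return exit code 1" idea; a narrower catch (for example, only `IOException`) would keep the coredump for genuinely unexpected failures.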

@Tratcher
Member

@davidfowl @DamianEdwards

@DamianEdwards
Member

I'd be interested to know what other stacks do here. Having specific logic like this in the framework or the template seems wrong due to the (non-destructive) behavior of a single distro. It might be surprising, but I'm not sure it warrants guarding against.

@muratg
Contributor

muratg commented Jun 14, 2018

Moving to discussions based on @DamianEdwards' last comment.

@tmds
Member Author

tmds commented Jun 15, 2018

I haven't found time yet to look at other stacks.

due to the (non-destructive) behavior of a single distro

Many distros make coredumps a no-op because their users don't know what to do with them, and the distros don't support cleaning them up. That is the distro's choice.

Our choice is whether we end up calling abort or not.
abort is meant for abnormal process termination. This means: the program finds itself in a situation it doesn't know how to handle and terminates immediately. abort is documented to generate coredumps.

IMO, we are not in such an abnormal case: the web server can terminate normally with a non-zero exit code.

@mikem8361
Member

mikem8361 commented Jun 15, 2018 via email

@tmds
Member Author

tmds commented Jun 15, 2018

@mikem8361 I'm not sure how you read my comment.

To me, the runtime behaves correctly by calling abort when it sees an unhandled exception:

the program finds itself in a situation it doesn't know how to handle and terminates immediately

By

Our choice is whether we end up calling abort or not.

I meant: it is our choice to catch the exception or let it go to the runtime.

@halter73
Member

Where do you suggest we catch the exception? Do you think that WebHost.Run() should just log and exit gracefully if the server fails to start?

@tmds
Member Author

tmds commented Jun 17, 2018

Where do you suggest we catch the exception? Do you think that WebHost.Run() should just log and exit gracefully if the server fails to start?

Yes, a method on WebHost (could be Run or a new one). It should return an int that is used as the return value of Main.

@tmds
Member Author

tmds commented Jun 18, 2018

I'd be interested to know what other stacks do here.

When the address is already in use, ruby, node, and python print a stack trace for the unhandled exception, and their runtimes call exit with a value of 1.

@tmds
Member Author

tmds commented Jun 26, 2018

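    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.AspNetCore;
    using Microsoft.AspNetCore.Hosting;

    public class Program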
    {
        public static int Main(string[] args)
        {
            return CreateWebHostBuilder(args).Build().RunMain();
        }

        public static IWebHostBuilder CreateWebHostBuilder(string[] args) =>
            WebHost.CreateDefaultBuilder(args)
                .UseStartup<Startup>();
    }

    public static class WebHostExtensions
    {
        public static int RunMain(this IWebHost host)
        {
            return host.RunMainAsync().GetAwaiter().GetResult();
        }

        public static async Task<int> RunMainAsync(this IWebHost host, CancellationToken token = default)
        {
            try
            {
                await host.RunAsync(token);
                return 0;
            }
            catch
            {
                // Exception is logged by host.
                return 1;
            }
        }
    }

@halter73 this application doesn't terminate when the host throws a bind exception. The backtrace shows the main thread waiting for the finalizer thread:

(lldb) bt
* thread #1, name = 'dotnet', stop reason = signal SIGSTOP
  * frame #0: 0x00007ffff79c152c libpthread.so.0`__pthread_cond_wait + 508
    frame #1: 0x00007ffff64214e2 libcoreclr.so`CorUnix::CPalSynchronizationManager::ThreadNativeWait(CorUnix::_ThreadNativeWaitData*, unsigned int, CorUnix::ThreadWakeupReason*, unsigned int*) + 354
    frame #2: 0x00007ffff64210c4 libcoreclr.so`CorUnix::CPalSynchronizationManager::BlockThread(CorUnix::CPalThread*, unsigned int, bool, bool, CorUnix::ThreadWakeupReason*, unsigned int*) + 388
    frame #3: 0x00007ffff6425a94 libcoreclr.so`CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int) + 1812
    frame #4: 0x00007ffff6425e11 libcoreclr.so`WaitForMultipleObjectsEx + 81
    frame #5: 0x00007ffff609197b libcoreclr.so`Thread::DoAppropriateWaitWorker(int, void**, int, unsigned int, WaitMode) + 1467
    frame #6: 0x00007ffff608c710 libcoreclr.so`Thread::DoAppropriateWait(int, void**, int, unsigned int, WaitMode, PendingSync*) + 80
    frame #7: 0x00007ffff6169d3f libcoreclr.so`CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) + 95
    frame #8: 0x00007ffff60fb186 libcoreclr.so`FinalizerThread::FinalizerThreadWatchDogHelper() + 742
    frame #9: 0x00007ffff60fac9f libcoreclr.so`FinalizerThread::FinalizerThreadWatchDog() + 271
    frame #10: 0x00007ffff60c733f libcoreclr.so`EEShutDownHelper(int) + 783
    frame #11: 0x00007ffff60c7bbd libcoreclr.so`EEShutDown(int) + 205
    frame #12: 0x00007ffff6011460 libcoreclr.so`CorHost2::UnloadAppDomain2(unsigned int, int, int*) + 96
    frame #13: 0x00007ffff5fe9d91 libcoreclr.so`coreclr_shutdown_2 + 33
    frame #14: 0x00007ffff67c17f7 libhostpolicy.so`___lldb_unnamed_symbol1549$$libhostpolicy.so + 103
    frame #15: 0x00007ffff67b4370 libhostpolicy.so`___lldb_unnamed_symbol1269$$libhostpolicy.so + 6896
    frame #16: 0x00007ffff67b4b4c libhostpolicy.so`corehost_main + 236
    frame #17: 0x00007ffff6a67cbf libhostfxr.so`___lldb_unnamed_symbol1564$$libhostfxr.so + 207
    frame #18: 0x00007ffff6a7336c libhostfxr.so`___lldb_unnamed_symbol1609$$libhostfxr.so + 5436
    frame #19: 0x00007ffff6a744c5 libhostfxr.so`___lldb_unnamed_symbol1612$$libhostfxr.so + 437
    frame #20: 0x00007ffff6a73a39 libhostfxr.so`___lldb_unnamed_symbol1610$$libhostfxr.so + 1001
    frame #21: 0x00007ffff6a67f0c libhostfxr.so`hostfxr_main_startupinfo + 156
    frame #22: 0x000000000040ac74 dotnet`___lldb_unnamed_symbol30$$dotnet + 1572
    frame #23: 0x000000000040af05 dotnet`___lldb_unnamed_symbol31$$dotnet + 165
    frame #24: 0x00007ffff6cda18b libc.so.6`__libc_start_main + 235
    frame #25: 0x0000000000408a54 dotnet`_start + 41

@jkotas
Member

jkotas commented Jun 26, 2018

What is the finalizer thread doing when this happens? I expect that it is raising ProcessExit event and the event handler is stuck.

@tmds
Member Author

tmds commented Jun 26, 2018

@jkotas you are right:

  thread #23, name = 'dotnet'
    frame #0: 0x00007ffff79c152c libpthread.so.0`__pthread_cond_wait + 508
    frame #1: 0x00007ffff62280cb libcoreclr.so`CorUnix::CPalSynchronizationManager::ThreadNativeWait(CorUnix::_ThreadNativeWaitData*, unsigned int, CorUnix::ThreadWakeupReason*, unsigned int*) + 379
    frame #2: 0x00007ffff6227c3f libcoreclr.so`CorUnix::CPalSynchronizationManager::BlockThread(CorUnix::CPalThread*, unsigned int, bool, bool, CorUnix::ThreadWakeupReason*, unsigned int*) + 415
    frame #3: 0x00007ffff622ce41 libcoreclr.so`CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int) + 1969
    frame #4: 0x00007ffff5e4f1eb libcoreclr.so`Thread::DoAppropriateWaitWorker(int, void**, int, unsigned int, WaitMode) + 1355
    frame #5: 0x00007ffff5e49c10 libcoreclr.so`Thread::DoAppropriateWait(int, void**, int, unsigned int, WaitMode, PendingSync*) + 80
    frame #6: 0x00007ffff5f455af libcoreclr.so`CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) + 95
    frame #7: 0x00007ffff5e4fce0 libcoreclr.so`Thread::Block(int, PendingSync*) + 32
    frame #8: 0x00007ffff5e48337 libcoreclr.so`SyncBlock::Wait(int, int) + 743
    frame #9: 0x00007ffff61a707b libcoreclr.so`ObjectNative::WaitTimeout(bool, int, Object*) + 235
    frame #10: 0x00007fff7c7ed993
    frame #11: 0x00007fff7d8cde71
    frame #12: 0x00007ffff5f7d703 libcoreclr.so`CallDescrWorkerInternal + 124
    frame #13: 0x00007ffff5e86f82 libcoreclr.so`DispatchCallSimple(unsigned long*, unsigned int, unsigned long, unsigned int) + 242
    frame #14: 0x00007ffff5fd2fa5 libcoreclr.so`DistributeEvent(Object**, Object**) + 325
    frame #15: 0x00007ffff5f8e84a libcoreclr.so`AppDomain::RaiseOneExitProcessEvent() + 138
    frame #16: 0x00007ffff5f8ea17 libcoreclr.so`AppDomain::RaiseOneExitProcessEvent_Wrapper(AppDomainIterator*) + 327
    frame #17: 0x00007ffff5f8ed4c libcoreclr.so`AppDomain::RaiseExitProcessEvent() + 156
    frame #18: 0x00007ffff5ebd95a libcoreclr.so`FinalizerThread::FinalizerThreadStart(void*) + 282
    frame #19: 0x00007ffff6233894 libcoreclr.so`CorUnix::CPalThread::ThreadEntry(void*) + 388
    frame #20: 0x00007ffff79bb594 libpthread.so.0`start_thread + 228
    frame #21: 0x00007ffff6db100f libc.so.6`__GI___clone + 63

@tmds
Member Author

tmds commented Jun 26, 2018

Debugging with ASP.NET Core 2.0 (for which I have a libsosplugin matching my lldb) shows that ProcessExit is stuck at Microsoft.AspNetCore.Hosting.WebHostExtensions+<>c__DisplayClass6_0.<AttachCtrlcSigtermShutdown>g__Shutdown0():

00007FFFE2922A90 00007fff7c7ed993 (MethodDesc 00007fff7c35c778 + 0x2f3 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)), calling (MethodDesc 00007fff7c33ea28 + 0 System.Threading.Monitor.Wait(System.Object, Int32, Boolean))
00007FFFE2922B20 00007fff7d8cde71 (MethodDesc 00007fff7d7947c8 + 0x91 Microsoft.AspNetCore.Hosting.WebHostExtensions+<>c__DisplayClass6_0.<AttachCtrlcSigtermShutdown>g__Shutdown0()), calling (MethodDesc 00007fff7c35c778 + 0 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken))

@tmds
Member Author

tmds commented Jun 26, 2018

The hang has already been fixed in aspnet/Hosting#1432, but the fix is not in 2.1.
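The SOS output above shows the ProcessExit handler from AttachCtrlcSigtermShutdown blocked in ManualResetEventSlim.Wait. A minimal sketch of that kind of shutdown hook, assuming the event is meant to signal that the run loop has finished, shows why setting it in a `finally` block avoids the hang: the handler can return even when the run throws. This is an illustration of the pattern, not the code from aspnet/Hosting#1432; `ShutdownDemo` and `runHost` are hypothetical names, with `runHost` standing in for `host.RunAsync`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ShutdownDemo
{
    // Sketch of a Ctrl+C/SIGTERM-style shutdown hook: the ProcessExit handler
    // requests cancellation, then waits until the run loop signals completion.
    public static async Task<int> RunAsync(Func<CancellationToken, Task> runHost)
    {
        var done = new ManualResetEventSlim(false);
        using var cts = new CancellationTokenSource();

        AppDomain.CurrentDomain.ProcessExit += (sender, e) =>
        {
            try
            {
                cts.Cancel();
            }
            catch (ObjectDisposedException)
            {
                // RunAsync already returned and disposed the source.
            }

            // If done were only set after a successful run, an exception from
            // runHost would leave this Wait blocking runtime shutdown forever.
            done.Wait();
        };

        try
        {
            await runHost(cts.Token);
            return 0;
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine(ex);
            return 1;
        }
        finally
        {
            // Signal completion on every path, including failures.
            done.Set();
        }
    }
}
```

With the signal in `finally`, a failed start returns 1 and the process still exits once the runtime raises ProcessExit.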

@tmds
Member Author

tmds commented Jun 28, 2018

@DamianEdwards I've provided the info you requested (https://github.com/aspnet/Hosting/issues/1416#issuecomment-397990693): on other stacks the process terminates normally with a non-zero return code.

@tmds
Member Author

tmds commented Jul 9, 2018

@DamianEdwards @halter73 @muratg can you look at this and decide what to do about it?

@davidfowl
Member

Yeah, this should be patched.

@halter73
Member

I agree we should do something @davidfowl, but what exactly do you suggest we patch? We don't change templates in patches, do we? I think we should not call abort for unhandled exceptions and should just terminate with a non-zero exit code like ruby, node, python, java, etc.

@davidfowl
Member

@muratg
Contributor

muratg commented Nov 16, 2018

@davidfowl The hang is fixed in 2.2 (and 3.0+). I don't think it would meet the 2.1 bar at this point, but if you disagree, please file a specific bug for 2.1.

Re: core dump vs. displaying a stack trace and exiting with 1: that sounds like a CoreFX issue, no?

@aspnet-hello aspnet-hello transferred this issue from aspnet/Hosting Dec 18, 2018
@aspnet-hello aspnet-hello added this to the Discussions milestone Dec 18, 2018
@muratg
Contributor

muratg commented Jan 9, 2019

It doesn't look like there's any action on ASP.NET here. @davidfowl, if you disagree, please reopen.

@muratg muratg closed this as completed Jan 9, 2019
@tmds
Member Author

tmds commented Jan 10, 2019

To address this, a change is needed either in the runtime or in ASP.NET Core.

Proposing a change to ASP.NET Core:

Change the generic host Run/RunAsync method so it returns an int which is meant to be used as the exit code from Main.

So:

public static void Main(string[] args)
    => CreateWebHostBuilder(args).Build().Run();

becomes:

public static int Main(string[] args)
    => CreateWebHostBuilder(args).Build().Run();

This Run method would be the idiomatic way to write an ASP.NET Core application and connect the return value to Main.

When using this Run method, the ASP.NET Core app should be bound to the ConsoleLifetime.

There should be another method on the host (Start?) that is the idiomatic way to use ASP.NET Core within another application (e.g. WPF) that doesn't bind to the ConsoleLifetime (cf. dotnet/extensions#574).

The Run method can also handle the ExitCode as needed for the changes in dotnet/coreclr#21300 (comment).

@Tratcher @davidfowl wdyt?

@Tratcher
Member

You would have to plumb that through a lot of layers to get down to the server that generated the original binding error. It would also leave us with overlapping error handling mechanisms.

I still don't see a reason to special-case this class of errors from a framework perspective. Failing to bind is fatal, and if you don't want a dump for some fatal errors, you can handle those in Main.

@tmds
Member Author

tmds commented Jan 10, 2019

You would have to plumb that through a lot of layers to get down to the server that generated the original binding error. It would also leave us with overlapping error handling mechanisms.
I still don't see a reason to special-case this class of errors from a framework perspective. Failing to bind is fatal, and if you don't want a dump for some fatal errors, you can handle those in Main.

I'm thinking of a try/catch for any exception that prints the exception and turns it into exit code 1.

That gives you the same behavior as other stacks (ruby, node, python).

This pattern is also in the System.CommandLine package by default: https://github.com/dotnet/command-line-api/blob/750f2365800c8e894f4bc3e54bbc73dcc9623f1d/src/System.CommandLine/Builder/CommandLineBuilderExtensions.cs#L147.

@muratg muratg reopened this Jan 10, 2019
@muratg
Contributor

muratg commented Jan 10, 2019

@davidfowl ping

@halter73
Member

I like aligning our behavior with other stacks, but this proposal doesn't quite do that. dotnet/coreclr/issues/17929 more fully aligns behavior, right? I support doing that instead, since I don't think flowing error codes is idiomatic in C#.

@tmds
Copy link
Member Author

tmds commented Jan 11, 2019

I like aligning our behavior with other stacks, but this proposal doesn't quite do that. dotnet/coreclr/issues/17929 more fully aligns behavior, right? I support doing that instead, since I don't think flowing error codes is idiomatic in C#.

I agree with that.

With the change in dotnet/coreclr#21300, the host will need to deal with setting the ExitCode (#6526). This means the C# code becomes aware of exit codes anyway, so we can't push this entirely to the runtime.

This Run method would put everything in one place where we go from C# exceptions to process exit codes.

@aspnet-hello

We periodically close 'discussion' issues that have not been updated in a long period of time.

We apologize if this causes any inconvenience. We ask that if you are still encountering an issue, please log a new issue with updated information and we will investigate.

@dotnet dotnet locked and limited conversation to collaborators Apr 19, 2019
@amcasey amcasey added the area-hosting Includes Hosting label Jun 1, 2023
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 24, 2023