Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AutorecoveringConnection prevents AppDomain.Unload() of enclosing AppDomain #826

Closed
ghost opened this issue May 8, 2020 · 10 comments · Fixed by #1583
Closed

AutorecoveringConnection prevents AppDomain.Unload() of enclosing AppDomain #826

ghost opened this issue May 8, 2020 · 10 comments · Fixed by #1583
Assignees
Milestone

Comments

@ghost
Copy link

ghost commented May 8, 2020

Abstract

An active RabbitMQ.Client.Framing.Impl.AutorecoveringConnection instance violently opposes any attempt by its enclosing System.AppDomain to unload itself, resulting an a thrown System.CannotUnloadDomainException and a subsequent crash of the containing OS process.

Details

Closer analysis shows that the RabbitMQ.Client.Framing.Impl.Connection instance embedded in an AutorecoveringConnection will experience the call to AppDomain.Unload(...) as a System.Threading.ThreadAbortException inside its MainLoop method, specifically in the call to InboundFrame inboundFrame = _frameHandler.ReadFrame().
Because AutorecoveringConnection generally tries to recover from Thread aborts, it will proceed to enter its automatic recovery procedure.
Thus, an active Connection will be restored, and in all likelyhood the persisting reference to an unmanaged resource - namely a network Socket - is what prevents the AppDomain from unloading in an orderly manner.

All of the above was observed while running RabbitMQ.Client 6.0.0 in the .NET 4.6.1 variant inside a WebService in Microsoft Internet Information Server on Windows Server 2016.

Apparently, previous versions of RabbitMQ.Client exposed a means of subscribing Connection instances to the AppDomain.DomainUnload Event via the Connection.HandleDomainUnload(...) method, but this method is no longer publicly accessible.

Relevance

The observed behavior is a relevant problem because IIS is in the habit of maintaining multiple AppDomain instances inside an Application Pool and recycling these AppDomains on various occasions, e.g. changes to the web.config, which will currently result in a crash of the associated Application Pool. This hurts scenarios where IIS is used as a bridging technology from WebServices to RabbitMQ.

Please consider modifying the behavior of the AutorecoveringConnection to explicitly handle AppDomain unload scenarios.

Workaround

In the present state of affairs, the System.CannotUnloadDomainException can be avoided by disposing all active AutorecoveringConnection instances from within an explicit AppDomain.DomainUnload Event handler.

Repro

The code snippet attached below will raise the CannotUnloadDomainException on AppDomain.DomainUnload(...) unless called with the useWorkaround parameter.

using System;
using System.Net;
using System.Reflection;
using System.Threading;
using RabbitMQ.Client;

namespace RabbitMQClientCrashOnAppDomainUnload
{
    internal class Program
    {
        // May have to change this to relevant local port.
        private const int MyAmqpsPort = 5674;

        public class Worker : MarshalByRefObject
        {
            public void Execute(string virtualHost, string userName, string password, bool useWorkaround)
            {
                var connectionFactory = new ConnectionFactory() { UserName = userName, Password = password, VirtualHost = virtualHost };

                var hostName = Dns.GetHostEntry(string.Empty).HostName;

                Console.WriteLine($"Connecting to VirtualHost '{virtualHost}' on host '{hostName}' as user '{userName}'");
                var connection = connectionFactory.CreateConnection(new[] { new AmqpTcpEndpoint(hostName, MyAmqpsPort) });

                // BEGIN Workaround
                if (useWorkaround)
                    AppDomain.CurrentDomain.DomainUnload += (_, __) => connection.Dispose();
                // END Workaround

                while (true)
                {
                    Console.Write(".");
                    Thread.Sleep(TimeSpan.FromMilliseconds(500));
                }
            }
        }

        private static void Main(string[] args)
        {
            if (args.Length < 3 || args.Length > 4)
            {
                Console.Error.WriteLine($"Usage: {Assembly.GetExecutingAssembly().GetName().Name} <virtualHost> <userName> <password> [\"true\"]");
                return;
            }

            var (virtualHost, userName, password) = (args[0], args[1], args[2]);
            var useWorkaround = args.Length == 4 && bool.Parse(args[3]);

            var childDomain = AppDomain.CreateDomain("Child");

            var workerProxy = (Worker)childDomain.CreateInstanceAndUnwrap(Assembly.GetExecutingAssembly().FullName, $"{typeof(Worker).FullName}");

            var workerThread = new Thread(() => workerProxy.Execute(virtualHost, userName, password, useWorkaround));
            workerThread.Start();

            Thread.Sleep(TimeSpan.FromSeconds(2));

            try
            {
                Console.Out.WriteLine("\n\nUnloading child AppDomain - this may take a couple of seconds");

                AppDomain.Unload(childDomain);

                Console.Out.WriteLine("\n\nUnloaded child AppDomain without error\n");
            }
            catch (CannotUnloadAppDomainException ex)
            {
                Console.Error.WriteLine($"\n\nUnexpectedly received {nameof(CannotUnloadAppDomainException)}: {ex.Message}\n{ex.StackTrace}\n");
            }

            Console.Out.WriteLine("\nPress any key to finish.");
            Console.ReadLine();
        }
    }
}
@michaelklishin
Copy link
Member

What would you recommend as the desired behavior? How can autorecovering connections detect a condition where they should stop performing recoveries?

@ghost
Copy link
Author

ghost commented May 8, 2020

Well, in my opinion an object should not resist its parent AppDomain being unloaded - except for very specific reasons - which I don't see present here.

So I would recommend that AutorecoveringConnection instances should subscribe to the AppDomain.DomainUnload Event of their containing AppDomain and preferrably close and dispose their delegate Connection instance at the time it gets triggered.
At the very least they should cease their recovering effort and leave the rest to the Thread cleanup mechanics executed by the AppDomain.

Does that make sense to you?

@michaelklishin
Copy link
Member

Yes, it does. I just wanted this conversation to move to a specific suggestion. Other clients sometimes limit connection recovery attempts e.g. when they haven't succeeded for a certain number of times. Reacting to domain shutdown, if we can do it easily, makes perfect sense.

@lukebakken
Copy link
Contributor

As a note, AppDomain is not fully implemented on .NET core -

https://docs.microsoft.com/en-us/dotnet/core/porting/net-framework-tech-unavailable

@ghost
Copy link
Author

ghost commented May 8, 2020

Nothing fancy needed at our end. We have not tried to be clever with configuring AutorecoveringConnection behavior so far.
If only we could host RabbitMQ.Client in ASP.NET WebServices and be spared those annoying AppPool crashes we'd be perfectly happy.

Ah yes, and things could be entirely different on ASP.NET Core, so it would be very helpful if you could look into the corresponding situation on that platform, too.

@lukebakken lukebakken self-assigned this May 8, 2020
@lukebakken lukebakken modified the milestones: 6.0.1, 6.1.0 May 8, 2020
@lukebakken
Copy link
Contributor

@bording if you have a second, I would be very interested in your opinion here. It seems odd for this library to interact with an AppDomain event directly (DomainUnload), but maybe that's an acceptable practice?

@stebet
Copy link
Contributor

stebet commented May 14, 2020

Wouldn't it make more sense for the user to hook up closing the AutorecoveringConnection on AppDomain.Unload, since the event is never raised in the default AppDomain?

https://docs.microsoft.com/en-us/dotnet/api/system.appdomain.domainunload?view=netframework-4.8

@bording
Copy link
Collaborator

bording commented May 14, 2020

Doing some quick research, I do see some historical discussion on this topic, and I see the now-vestigial Connection.HandleDomainUnload method.

I agree that it's odd for this library to directly be interacting with AppDomain APIs, If someone was creating custom AppDomains themselves as the repro in the initial post demonstrates, then it would be reasonable to say that they need to also wire up the DomainUnload event.

However, you can often be inside a non-default AppDomain without even realizing it, so I'm not sure I'd want the answer here to be "it's up to the user to wire it up themselves."

If we take a step back, why do we need to worry about AppDomains unloading at all? If the unload shows up to the Connection class as a ThreadAbortException, why it is just swallowing it up in a way that makes AutoRecoveringConnection try and reconnect?

It seems like making a ThreadAbortException a case where the connection is shut down and no recovery is attempted would be enough to prevent the domain from unloading. it's hard to imagine that would have negative impact in any other scenario.

@ghost
Copy link
Author

ghost commented May 15, 2020

Good point.
I did not, by any means, intend to suggest that a solution should directly follow the approach of my repro snippet.
What I would like to emphasize is that being

inside a non-default AppDomain without even realizing it

is the typical situation any IIS-hosted ASP.NET WebService will find itself in, so it could be quite common in the Windows ecosystem.

@lukebakken lukebakken removed this from the 6.1.0 milestone May 19, 2020
@lukebakken
Copy link
Contributor

This does not feel like a "must have" for 6.1.0. Punting for now.

@lukebakken lukebakken added this to the 7.0.0 milestone Dec 29, 2023
lukebakken added a commit that referenced this issue May 30, 2024
lukebakken added a commit that referenced this issue May 30, 2024
lukebakken added a commit that referenced this issue May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants