Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BasicCancel hangs when using new async consumer #341

Closed
karoberts opened this issue Aug 3, 2017 · 19 comments · Fixed by #760
Closed

BasicCancel hangs when using new async consumer #341

karoberts opened this issue Aug 3, 2017 · 19 comments · Fixed by #760
Assignees
Labels
bug help wanted next-gen-todo If a rewrite happens, address this issue.
Milestone

Comments

@karoberts
Copy link

I have consumers which check if a certain condition has occurred and cancel. They was I do that is using IModel.BasicCancel(consumerTag). I've been using this successfully for quite some time.

Now using the new async consumer, the call to BasicCancel hangs.

It gets stuck here
2017-08-02_16h31_33

I know for certain this only affects the async consumer because the exact same code with a sync consumer takes about 3ms inside BasicCancel.

@michaelklishin
Copy link
Member

@karoberts how can we reproduce?

@michaelklishin
Copy link
Member

michaelklishin commented Aug 3, 2017

There is nothing that refers to consumer dispatch machinery in this call stack. IModel#BasicCancel shouldn't have changed at all with consumer dispatch service changes since it is a client-sent method, not a server-sent one.

Please take this to the mailing list and provide server logs and a way to reproduce.

@michaelklishin
Copy link
Member

Actually I can think of a way of BasicCancel to be affected, we've been there with a couple of clients before. We still need server logs and a code example to reproduce.

@karoberts
Copy link
Author

I'm working on reducing my code into a reproducible sample. Will report back soon.

@michaelklishin
Copy link
Member

Thank you!

@karoberts
Copy link
Author

karoberts commented Aug 3, 2017

ok, I think I've found it. I built up a sample using just rabbitmq client code, and of course it worked.

But then after much experimentation, I discovered the key. The issue occurs when the BasicCancel is issued before the async callback has entered a continuation.

For example, this works

AsyncEventHandler<BasicDeliverEventArgs> callbackAsync = async (o, ea) =>
{
    await Something();
    BasicCancel();
}

but this will cause BasicCancel() to hang and throw TimeoutException after 20 seconds

AsyncEventHandler<BasicDeliverEventArgs> callbackAsync = async (o, ea) =>
{
    BasicCancel();
    await Something();
}

If you still need the code, I can provide, but this seems pretty clear that BasicCancel() must be waiting for something that isn't unlocked until the async callback gives back the thread.

A workaround is an await Task.Yield() at the top of my async callback

@michaelklishin
Copy link
Member

Thanks, that saves us maintainers a lot of time!

BasicCancel waits for a continuation in the channel to complete (which happens when basic.cancel-ok is received from the server and dispatched). Looks like that part isn't handled in the async dispatcher. Perhaps @danielmarbach, @bording or @kjnilsson can spot it.

As a workaround you probably can use IModel#BasicCancel with nowait set to true: there will be no server response expected in that case.

@michaelklishin michaelklishin added this to the 5.0.2 milestone Aug 3, 2017
@danielmarbach
Copy link
Collaborator

@karoberts can you give a few indicators where the code is hosted? Do you have an environment which is context aware like WPF, WinForms or anything the like?

@danielmarbach
Copy link
Collaborator

@michaelklishin What we are seeing here is a consequence of the synchronous blocking nature of the model in combination with async. I think I highlighted this previously in conversations. Until the model is significantly refactored to be truly asynchronous clients using this library might always be subjected to deadlocks. By default the thread that enters the async code path executes all the synchronous parts of the callstack until the first await statement is reached. In our case this is the worker dispatcher thread. Since BasicCancel calls into Monitor.Wait this captures the dispatcher thread and we are locked for the wait time of the timeout of the blocking cell implementation since the value cannot be set (holding the lock by the same thread).

Since no context capturing is enabled here many async implementations should be fine because they will probably have awaited something first before they call methods on the model. All consumer that do not access the model are also fine. So the solution here is to do a Yield at the beginning of the event handler or for safety precaution reasons it would be possible to modify to Work class to do it uniformly

        public async Task Execute(ModelBase model)
        {
            try
            {
                await Task.Yield();
                await Execute(model, asyncConsumer).ConfigureAwait(false);
            }
            catch (Exception)
            {
               // intentionally caught
            }
        }

I guess any other work like moving the model to a true asynchronous one is probably out of question at this stage since the work on the new client should start soonish

@kjnilsson
Copy link
Contributor

Thanks for the analysis @danielmarbach. So the question here is whether we should always Yield in Execute or rely on users to handle it. If we Yield in Execute the cost of this is a potential additional context switch?

@karoberts
Copy link
Author

@danielmarbach
Here is the code
https://gist.github.com/karoberts/bc10a6aa57d9bcfeda10fca86b68ffca

For my particular use case, there won't be any context.

In my code, I'm doing this just in case to avoid any context issues, which I got from here
https://stackoverflow.com/questions/28309185/task-yield-in-library-needs-configurewaitfalse

var scheduler = TaskScheduler.Current;
return Task.Factory.StartNew(() => receiveAsyncImpl(sender, ea), CancellationToken.None, TaskCreationOptions.PreferFairness, scheduler);

@seabrookmx
Copy link

Any movement on this one?

I am running into this, as well as some other behavior that I think is caused by the same underlying problem.
If I register an AsyncEventingBasicConsumer like so:

var consumer = new AsyncEventingBasicConsumer(channel);
consumer.Received += async (o, ea) =>
{
    await doStuffAsync(ea);
};
channel.BasicConsume(consumer, queue, false);

calling basic cancel on the consumer hangs for almost exactly 10seconds before returning.
Not only that, but while the AsyncEventingBasicConsumer is registered, running a basic publish as follows:

using (var channel = connection.CreateModel())
{
    channel.BasicPublish(
        exchange,
        topic,
        null,
        data
    );
}

hangs for 10s before exiting the using block. This same behavior happens if I call channel.Close() and channel.Dispose() instead of the using block. If no AsyncEventingBasicConsumer has been registered, or I swap to a regular EventingBasicConsumer, the issue disappears.

So the question here is whether we should always Yield in Execute or rely on users to handle it.

I don't fully grasp this comment.. but if there's a workaround I can use on my side please let me know!

@michaelklishin michaelklishin removed this from the 5.0.2 milestone Apr 11, 2018
@seabrookmx
Copy link

I'm still curious about the status of this issue 😃

@lukebakken
Copy link
Contributor

@seabrookmx I am scheduling this bug to be investigated and hopefully fixed for 6.0. Could you please provide a full code sample I can run to reproduce?

@lukebakken lukebakken self-assigned this Jan 30, 2020
@lukebakken lukebakken added this to the 6.0.0 milestone Jan 30, 2020
@NickLydon
Copy link

@lukebakken maybe this helps you? https://github.com/NickLydon/RabbitMQAsyncDeadlock

I think this is the same issue...

@danielmarbach
Copy link
Collaborator

Another reason why the client should be async all the way and not a weird mixture of both.

Unfortunately the async consumer can have this behavior where yielding is needed depending on what API is called. Especially the one's that use the blocking cell are subjected to deadlocks.

@andrey-dengin
Copy link

As a workaround you probably can use IModel#BasicCancel with nowait set to true: there will be no server response expected in that case.

@michaelklishin can you give a simple example for suggested workaround, please.

@michaelklishin
Copy link
Member

@a-dengin set the nowait to true when calling IModel#BasicCancel (a few other methods support it as well). This sounds like a specific enough suggestion, please take this from here.

@andrey-dengin
Copy link

@michaelklishin thank you for reply. But i am i little bit confused. In fact I can not find any overloaded method IModel#BasicCancel that would accept the flag nowait as one of the parameters. Only string consumerTag can be passed as a parameter of BasicCancel method.

lukebakken added a commit that referenced this issue Mar 16, 2020
lukebakken added a commit that referenced this issue Mar 16, 2020
lukebakken added a commit that referenced this issue Mar 17, 2020
@lukebakken lukebakken added the next-gen-todo If a rewrite happens, address this issue. label Mar 17, 2020
@lukebakken lukebakken mentioned this issue May 31, 2020
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug help wanted next-gen-todo If a rewrite happens, address this issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants