Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

7.0.0-alpha.2 - Unexpected Exception: Unable to read data from the transport connection: Connection reset by peer #1464

Closed
lukebakken opened this issue Jan 8, 2024 Discussed in #1463 · 23 comments
Assignees
Labels

Comments

@lukebakken
Copy link
Contributor

Discussed in #1463

Originally posted by plewam January 7, 2024
I am currently testing the 7.0.0-alpha.2 release. I have set the AutomaticRecoveryEnabled to true. During the consume of messages from my remote server my VM somehow had it's network connection interrupted for a short period of time. I use all the new async methods.

During this short outage I receive these events from the library. Please not that these are my own logging texts.

  1. (Channel) ChannelShutdown
  2. (Consumer) ConsumerCancelled
  3. (Consumer) Shutdown

RabbitMQ model was shutdown. ReplyText was <"Unexpected Exception: Unable to read data from the transport connection: Connection reset by peer.">

RabbitMQ consumer was cancelled. Consumer tag <"71cd2fcf-0f86-477b-86d6-c9bf0518b51e">

RabbitMQ consumer was shut down. ReplyText <"Unexpected Exception: Unable to read data from the transport connection: Connection reset by peer.">. Initiator <Library>

However I would expect the channel and consumer to automatically recover. Can somebody maybe give me some tips? Or is this issue already known?

@lukebakken lukebakken self-assigned this Jan 8, 2024
@lukebakken lukebakken added the bug label Jan 8, 2024
@lukebakken
Copy link
Contributor Author

lukebakken commented Jan 8, 2024

@plewam thank you for continuing to test the next release of this library. Some questions:

  • After the network interruption, your consumer did not resume consuming messages?
  • What was logged by RabbitMQ at the same time as the interruption? Log messages from about 5 minutes before and after the interruption would be great. This information will be very valuable, if you have it.

I will work on a test to reproduce this scenario.

lukebakken added a commit that referenced this issue Jan 9, 2024
@lukebakken
Copy link
Contributor Author

@plewam - if you have time, please check out my test application in this PR:

https://github.com/rabbitmq/rabbitmq-dotnet-client/pull/1466/files

It sets up a connection for the consumer, and a different one for the publisher.

My test environment is all Windows 11 pro. I start up RabbitMQ 3.12.11 (Erlang 26.2.1) in one terminal window, then I start up a Toxiproxy server in another, and set up a proxy like this:

Start server in one console:

.\toxiproxy-server.exe

Set up proxy in another:

 .\toxiproxy-cli.exe create --listen 55672 --upstream 5672 rmq-localhost

Then, start up the test console application:

.\projects\gh-1464\bin\Debug\net6.0\gh-1464.exe

Let it start publishing and consuming, then abruptly kill and restore the consumer's connection this way:

.\toxiproxy-cli.exe toggle rmq-localhost; Start-Sleep -Seconds 1; .\toxiproxy-cli.exe toggle rmq-localhost

You can see the consumer is eventually restored and consumes the ready messages.

There must be something different in your code, can you share a runnable example?

@plewam
Copy link

plewam commented Jan 9, 2024

@lukebakken In regards to your questions:

  1. The consumer did not continue consuming messages
  2. I collected the logs from the rabbitmq server, however its not that much, see below (The last two entries should be the ones you are looking for. Generally the connections is relatively unstable)

BTW: The rmq server I connected to has version 3.11.8 and erlang 25.2.2. I can upgrade it if you want.

Logs
2024-01-07 01:05:22.263000-04:00 [warning] <0.11783.31> closing AMQP connection <0.11783.31> (10.40.1.240:55748 -> 172.20.178.47:47490, vhost: '/', user: 'logreader'):
2024-01-07 01:05:22.263000-04:00 [warning] <0.11783.31> client unexpectedly closed TCP connection
2024-01-07 01:15:09.120000-04:00 [info] <0.12842.31> accepting AMQP connection <0.12842.31> (10.40.1.240:45558 -> 172.20.178.47:47490)
2024-01-07 01:15:11.915000-04:00 [info] <0.12842.31> connection <0.12842.31> (10.40.1.240:45558 -> 172.20.178.47:47490): user 'logreader' authenticated and granted access to vhost '/'
2024-01-07 01:46:12.019000-04:00 [error] <0.12842.31> closing AMQP connection <0.12842.31> (10.40.1.240:45558 -> 172.20.178.47:47490):
2024-01-07 01:46:12.019000-04:00 [error] <0.12842.31> missed heartbeats from client, timeout: 60s
2024-01-07 01:46:24.632000-04:00 [info] <0.12969.31> accepting AMQP connection <0.12969.31> (10.40.1.240:35704 -> 172.20.178.47:47490)
2024-01-07 01:46:26.487000-04:00 [info] <0.12969.31> connection <0.12969.31> (10.40.1.240:35704 -> 172.20.178.47:47490): user 'logreader' authenticated and granted access to vhost '/'
2024-01-07 02:33:14.885000-04:00 [warning] <0.12969.31> closing AMQP connection <0.12969.31> (10.40.1.240:35704 -> 172.20.178.47:47490, vhost: '/', user: 'logreader'):
2024-01-07 02:33:14.885000-04:00 [warning] <0.12969.31> client unexpectedly closed TCP connection
2024-01-07 02:33:23.542000-04:00 [info] <0.13804.31> accepting AMQP connection <0.13804.31> (10.40.1.240:50840 -> 172.20.178.47:47490)
2024-01-07 02:33:24.849000-04:00 [info] <0.13804.31> connection <0.13804.31> (10.40.1.240:50840 -> 172.20.178.47:47490): user 'logreader' authenticated and granted access to vhost '/'
2024-01-07 03:06:24.898000-04:00 [error] <0.13804.31> closing AMQP connection <0.13804.31> (10.40.1.240:50840 -> 172.20.178.47:47490):
2024-01-07 03:06:24.898000-04:00 [error] <0.13804.31> missed heartbeats from client, timeout: 60s
2024-01-07 03:06:30.406000-04:00 [info] <0.14615.31> accepting AMQP connection <0.14615.31> (10.40.1.240:55446 -> 172.20.178.47:47490)
2024-01-07 03:06:30.919000-04:00 [info] <0.14615.31> connection <0.14615.31> (10.40.1.240:55446 -> 172.20.178.47:47490): user 'logreader' authenticated and granted access to vhost '/'
2024-01-07 05:09:31.107000-04:00 [error] <0.14615.31> closing AMQP connection <0.14615.31> (10.40.1.240:55446 -> 172.20.178.47:47490):
2024-01-07 05:09:31.107000-04:00 [error] <0.14615.31> missed heartbeats from client, timeout: 60s
2024-01-07 05:09:38.677000-04:00 [info] <0.15609.31> accepting AMQP connection <0.15609.31> (10.40.1.240:34510 -> 172.20.178.47:47490)
2024-01-07 05:09:43.350000-04:00 [info] <0.15609.31> connection <0.15609.31> (10.40.1.240:34510 -> 172.20.178.47:47490): user 'logreader' authenticated and granted access to vhost '/'
2024-01-07 07:05:43.455000-04:00 [warning] <0.15609.31> closing AMQP connection <0.15609.31> (10.40.1.240:34510 -> 172.20.178.47:47490, vhost: '/', user: 'logreader'):
2024-01-07 07:05:43.455000-04:00 [warning] <0.15609.31> client unexpectedly closed TCP connection
2024-01-07 07:14:55.131000-04:00 [info] <0.16609.31> accepting AMQP connection <0.16609.31> (10.40.1.240:34012 -> 172.20.178.47:47490)
2024-01-07 07:14:55.637000-04:00 [info] <0.16609.31> connection <0.16609.31> (10.40.1.240:34012 -> 172.20.178.47:47490): user 'logreader' authenticated and granted access to vhost '/'
2024-01-07 08:02:53.237000-04:00 [warning] <0.16609.31> closing AMQP connection <0.16609.31> (10.40.1.240:34012 -> 172.20.178.47:47490, vhost: '/', user: 'logreader'):
2024-01-07 08:02:53.237000-04:00 [warning] <0.16609.31> client unexpectedly closed TCP connection
2024-01-07 08:12:01.223000-04:00 [info] <0.17464.31> accepting AMQP connection <0.17464.31> (10.40.1.240:53490 -> 172.20.178.47:47490)
2024-01-07 08:12:01.729000-04:00 [info] <0.17464.31> connection <0.17464.31> (10.40.1.240:53490 -> 172.20.178.47:47490): user 'logreader' authenticated and granted access to vhost '/'
2024-01-07 08:40:52.457000-04:00 [warning] <0.17464.31> closing AMQP connection <0.17464.31> (10.40.1.240:53490 -> 172.20.178.47:47490, vhost: '/', user: 'logreader'):
2024-01-07 08:40:52.457000-04:00 [warning] <0.17464.31> client unexpectedly closed TCP connection
2024-01-07 08:40:57.962000-04:00 [info] <0.18248.31> accepting AMQP connection <0.18248.31> (10.40.1.240:44906 -> 172.20.178.47:47490)
2024-01-07 08:41:07.971000-04:00 [error] <0.18248.31> closing AMQP connection <0.18248.31> (10.40.1.240:44906 -> 172.20.178.47:47490):
2024-01-07 08:41:07.971000-04:00 [error] <0.18248.31> {handshake_timeout,frame_header}
2024-01-07 08:41:25.489000-04:00 [info] <0.18253.31> accepting AMQP connection <0.18253.31> (10.40.1.240:38782 -> 172.20.178.47:47490)
2024-01-07 08:41:33.155000-04:00 [info] <0.18253.31> connection <0.18253.31> (10.40.1.240:38782 -> 172.20.178.47:47490): user 'logreader' authenticated and granted access to vhost '/'
2024-01-07 08:45:20.791000-04:00 [warning] <0.18253.31> closing AMQP connection <0.18253.31> (10.40.1.240:38782 -> 172.20.178.47:47490, vhost: '/', user: 'logreader'):
2024-01-07 08:45:20.791000-04:00 [warning] <0.18253.31> client unexpectedly closed TCP connection

@plewam
Copy link

plewam commented Jan 9, 2024

@lukebakken Is there a way to pass in a logger into the library, e.g. the https://github.com/rabbitmq/rabbitmq-stream-dotnet-client is doing that quite good.

@lukebakken
Copy link
Contributor Author

Unfortunately, there isn't a way to pass in a logger at this point in time.

Thank you for the logs. Some more questions:

  • Do you know at what time your consuming application experienced the issue, so I can correlate it to the logs you provided?
  • Do you know what kind of network interruption happened? For instance, was it like pulling the plug on an Ethernet cord?

lukebakken added a commit that referenced this issue Feb 20, 2024
lukebakken added a commit that referenced this issue Feb 20, 2024
@lukebakken
Copy link
Contributor Author

@plewam this PR adds a test case that tests what happens when a TCP RESET is encountered - #1466

I have yet to reproduce what you report. Once the tests pass for that PR, I will merge it, close this issue, and publish a new alpha release for you to continue testing.

Thank you!

lukebakken added a commit that referenced this issue Feb 20, 2024
@lukebakken
Copy link
Contributor Author

lukebakken commented Feb 20, 2024

@plewam
Copy link

plewam commented Feb 21, 2024

@lukebakken Thank you very much. As of now I have not encountered this issue again. I will deploy alpha 3 soon onto my testing environment

@lukebakken
Copy link
Contributor Author

@plewam thank you very much for continuing to test these alpha releases 🏅 🌟

@plewam
Copy link

plewam commented Feb 22, 2024

@lukebakken I have to correct myself. I just today faced the issue again now (Still on alpha2). I have now upgraded to alpha3 and realised that Channel.Ack, Channel.Nack and Channel.BasicQos were still used mistakenly... So I was mixing sync and async calls. I have now switched these three to async as well. I continue to monitor.

@lukebakken
Copy link
Contributor Author

Thank you!

@plewam
Copy link

plewam commented Feb 26, 2024

Hi @lukebakken,

It continues to happen, however I only have it with one specific broker and the connection to it.
Broker version: 3.11.8
Erlang version: 25.2.2

On this broker I have set the consumer timeout globally to 3 days.

2024-02-24 22:25:52.102980 [8180002] [INF] [043_010] RabbitMQ model was shutdown. ReplyText was <"Unexpected Exception: Unable to read data from the transport connection: Connection reset by peer.">  
2024-02-24 22:25:52.114682 [8180003] [WRN] [043_010] RabbitMQ consumer was cancelled. Consumer tag <"dd5e53f2-44ae-4ff3-b861-d12dfa31e3dd">  
2024-02-24 22:25:52.115081 [8180004] [WRN] [043_010] RabbitMQ consumer was shut down. ReplyText <"Unexpected Exception: Unable to read data from the transport connection: Connection reset by peer.">. Initiator <Library> 

On the broker itself I see log entries like these. The last two log entries match the above error (The broker is located in Chile with a 5 hour time shift and slightly out of sync.... Thats the environment I have to work with):

2024-02-24 16:31:24.429000-04:00 [info] <0.23472.4> accepting AMQP connection <0.23472.4> (10.40.1.240:54966 -> 172.20.178.47:47490)
2024-02-24 16:31:34.432000-04:00 [error] <0.23472.4> closing AMQP connection <0.23472.4> (10.40.1.240:54966 -> 172.20.178.47:47490):
2024-02-24 16:31:34.432000-04:00 [error] <0.23472.4> {handshake_timeout,frame_header}
2024-02-24 16:32:19.022000-04:00 [info] <0.23479.4> accepting AMQP connection <0.23479.4> (10.40.1.240:32974 -> 172.20.178.47:47490)
2024-02-24 16:32:29.022000-04:00 [error] <0.23479.4> closing AMQP connection <0.23479.4> (10.40.1.240:32974 -> 172.20.178.47:47490):
2024-02-24 16:32:29.022000-04:00 [error] <0.23479.4> {handshake_timeout,frame_header}
2024-02-24 16:32:45.804000-04:00 [info] <0.23485.4> accepting AMQP connection <0.23485.4> (10.40.1.240:34382 -> 172.20.178.47:47490)
2024-02-24 16:32:55.805000-04:00 [error] <0.23485.4> closing AMQP connection <0.23485.4> (10.40.1.240:34382 -> 172.20.178.47:47490):
2024-02-24 16:32:55.805000-04:00 [error] <0.23485.4> {handshake_timeout,frame_header}
2024-02-24 16:33:03.371000-04:00 [info] <0.23491.4> accepting AMQP connection <0.23491.4> (10.40.1.240:59300 -> 172.20.178.47:47490)
2024-02-24 16:33:03.890000-04:00 [info] <0.23491.4> connection <0.23491.4> (10.40.1.240:59300 -> 172.20.178.47:47490): user 'logreader' authenticated and granted access to vhost '/'
2024-02-24 17:20:23.341000-04:00 [warning] <0.23491.4> closing AMQP connection <0.23491.4> (10.40.1.240:59300 -> 172.20.178.47:47490, vhost: '/', user: 'logreader'):
2024-02-24 17:20:23.341000-04:00 [warning] <0.23491.4> client unexpectedly closed TCP connection

Please let me know if there is any further logging I can do. If you want I can also upgrade the broker if you like to see if it goes away.

@lukebakken
Copy link
Contributor Author

Thanks for the information. Just to confirm, in this specific case, the connection is NOT recovered, correct?

For what it's worth, the error is due to a network issue between RabbitMQ and your application. There may be a firewall that is closing the connection abruptly.

@lukebakken lukebakken reopened this Feb 26, 2024
@plewam
Copy link

plewam commented Feb 26, 2024

@lukebakken Yes, the connection is NOT recovered. I have to restart my application in order to consume from the queues again. I lose no data as I have a message TTL of 2 days, but anyways I want to solve this.

Sadly I have no insight into the firewall architecture to see whether this is happening or not. As it is only happening with this broker (I connect to around 20 other brokers with the same application) I suspect a specific issue and not a general one.

I could do the recovering myself if you would expose the consumer tag during consumer shutdown. If I remember correctly I have requested this at some point in another issue.

@lukebakken
Copy link
Contributor Author

I could do the recovering myself if you would expose the consumer tag during consumer shutdown. If I remember correctly I have requested this at some point in another issue.

Ah yes if you can find that, it would be great. I can do some searching later on, too.

@plewam
Copy link

plewam commented Feb 26, 2024

I could do the recovering myself if you would expose the consumer tag during consumer shutdown. If I remember correctly I have requested this at some point in another issue.

Ah yes if you can find that, it would be great. I can do some searching later on, too.

There it is => #1433

@lukebakken
Copy link
Contributor Author

Thanks!

@lukebakken
Copy link
Contributor Author

@plewam is there any way you can tell me what the exact Exception type is when you get Unable to read data from the transport connection: Connection reset by peer? Is it System.IO.IOException?

@plewam
Copy link

plewam commented Feb 27, 2024

@lukebakken My bad, I have now added all available properties from the ShutdownEventArgs to my logging. The next time I encounter the issue we will see the type of exception.

@lukebakken
Copy link
Contributor Author

No worries! I appreciate the testing you are doing! I will have another alpha release soon.

@plewam
Copy link

plewam commented Feb 27, 2024

@lukebakken, the exception occured again, see log below. It indeed is an IOException.

Line  65597: 2024-02-27 19:53:54.015198 [1682197] [INF] [043_010] RabbitMQ model was shutdown. ReplyText was <"Unexpected Exception: Unable to read data from the transport connection: Connection reset by peer.">. Cause was <null>. Initiator was <Library>. MethodId was <0>. ClassId was <0>. ReplyCode was <541> System.IO.IOException: Unable to read data from the transport connection: Connection reset by peer.
Line  65608: 2024-02-27 19:53:54.017580 [1682198] [WRN] [043_010] RabbitMQ consumer was cancelled. Consumer tag <"d3977b60-d77c-4dfb-810b-43411505a351">  
Line  65609: 2024-02-27 19:53:54.018145 [1682199] [WRN] [043_010] RabbitMQ consumer was shut down. ReplyText <"Unexpected Exception: Unable to read data from the transport connection: Connection reset by peer.">. Initiator <Library>. Cause was <null> System.IO.IOException: Unable to read data from the transport connection: Connection reset by peer.

@lukebakken
Copy link
Contributor Author

Thanks for the follow-up!

@lukebakken
Copy link
Contributor Author

Please see this comment where I explain how to retrieve the consumer tags during the Shutdown event.

If this doesn't meet your requirements, @plewam, please let me know.

@lukebakken lukebakken closed this as not planned Won't fix, can't repro, duplicate, stale May 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants