Skip to content

Conversation

@mot256
Copy link

@mot256 mot256 commented Feb 18, 2022

Proposed Changes

While trying to reproduce RabbitMQ (re)connection issues we found in production, memory profiling showed us that instances of RabbitMQ.Client.Framing.Impl.Connection (and other related objects) are never released and are thus pinned for the life time of the application domain (even if RabbitMQ.Client.Framing.Impl.AutorecoveringConnection is disposed and released from memory)

We have found that this happened in 2 main scenarios:

  1. A new instance of RabbitMQ.Client.Framing.Impl.Connection is created and connection to the server failed
  2. A connection to the server was successful, but topology (or any other failure in the connection recovery process) failed

The second scenario above is of particular concern, as this not only causes memory leaks, it also keeps connections alive to the server (at least until there is a socket disconnect detected by the OS). Which in turn causes "dead lock" scenarios when exclusive consumers are being recovered, but some other error occurred there after while recovering the rest of the topology.

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask on the
mailing list. We're here to help! This is simply a reminder of what we are
going to look for before merging your code.

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

Further Comments

N/A

@pivotal-cla
Copy link

@mot256 Please sign the Contributor License Agreement!

Click here to manually synchronize the status of this Pull Request.

See the FAQ for frequently asked questions.

@pivotal-cla
Copy link

@mot256 Thank you for signing the Contributor License Agreement!

@lukebakken
Copy link
Collaborator

Hello @mot256 thank you for your contribution.

Could you please edit the description of this pull request and fill out the requested information? "Proposed Changes", "Types of Changes", "Checklist" and "Further Comments". We can't accept a pull request without any context.

We also need to know how to reproduce the issue that this pull request solves. How did you discover the issue?

@mot256
Copy link
Author

mot256 commented Feb 21, 2022

@lukebakken Thanks for the quick reply. I have updated the PR description as best I could without trying to clutter it with too much detail.
I'm still following your recommendations on running tests locally.
With regards to writing my own test, I'm a bit stuck. The tests involve causing network disconnects in a way that would not immediately drop the socket/connection on the RabbitMQ server (e.g. disconnecting physical network cable on the client machine). Could you maybe guide me to other similar tests related to network failures?

@michaelklishin
Copy link
Contributor

There are no tests that affect network sockets. @mot256 can you please rebase your PR and explain how we can reasonably reliably reproduce the issue?

@lukebakken lukebakken marked this pull request as ready for review February 23, 2022 20:09
@lukebakken lukebakken added this to the 6.2.4 milestone Feb 23, 2022
@lukebakken lukebakken self-assigned this Feb 23, 2022
@lukebakken
Copy link
Collaborator

lukebakken commented Feb 23, 2022

The tests involve causing network disconnects in a way that would not immediately drop the socket/connection on the RabbitMQ server (e.g. disconnecting physical network cable on the client machine)

So for instance, you have a publisher using an AutorecoveringConnection that is publishing messages and you pull the network cable? I can give that a try locally.

It would be helpful if you shared the program you're using to test just so I am more certain of how you're testing. Thanks.

@lukebakken lukebakken merged commit 1fac6fc into rabbitmq:6.x Feb 23, 2022
lukebakken added a commit that referenced this pull request Feb 23, 2022
lukebakken added a commit that referenced this pull request Feb 23, 2022
lukebakken added a commit that referenced this pull request Feb 23, 2022
#1145

1fac6fc

Finish porting #1145 to main
@mot256 mot256 deleted the fnel/6.x/fix-auto-recovery-connection-leaks branch February 24, 2022 06:26
@mot256
Copy link
Author

mot256 commented Feb 24, 2022

@michaelklishin @lukebakken apologies for the late reply. I've been hunting another issue. Thankyou for merging the PR.
The test app I wrote is very much interweaved with our own code. I will simplify it and share it for testing.

@mot256
Copy link
Author

mot256 commented Feb 24, 2022

@lukebakken just a question (from a first timer in this repo) what is the process to merging this to 7.x and/or main branch?

@michaelklishin
Copy link
Contributor

@mot256 we backport or forward port, and ask for your help with resolving conflicts if anything major pops up.

michaelklishin added a commit that referenced this pull request Feb 24, 2022
@mantasaudickas
Copy link

Hello,
is this also fixing this issue: #1061 ?

@mot256
Copy link
Author

mot256 commented Feb 25, 2022

@mantasaudickas scanning through both #1061 and #1067 and it does feel like the same issues I experienced, hence this PR. It would be interesting to know if it did. Could you maybe retest with 6.2.4 and see?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants