[Proxy] Fix port exhaustion and connection issues in Pulsar Proxy #14078
LGTM
I left some minor suggestions.
We should cherry-pick this patch to the 2.9 and 2.8 branches.
pulsar-proxy/src/main/java/org/apache/pulsar/proxy/server/DirectProxyHandler.java
pulsar-proxy/src/main/java/org/apache/pulsar/proxy/server/ProxyConnection.java
Branch updated from c51856a to e4b4081
pulsar-proxy/src/main/java/org/apache/pulsar/proxy/server/ProxyConnection.java
Branch updated from e4b4081 to d8c21cf
- Enable service discovery in the proxy (required by the apache/pulsar#14078 changes)
pulsar-proxy/src/main/java/org/apache/pulsar/proxy/server/ProxyConnection.java
Branch updated from d8c21cf to 01f4d47
@merlimat There are some test failures, which I'll address next week. Please review the high-level approach to see if it's fine.
Fixes apache#14075
Fixes apache#13923

- Optimize the proxy connection to fail fast if the target broker isn't active
  - This reduces the number of hanging connections, since unavailable brokers are no longer contacted unnecessarily.
  - The Pulsar client will retry connecting after a backoff timeout.
- Fix the race condition in the Pulsar Proxy when opening a connection, since it could lead to invalid states and hanging connections
- Add connect timeout handling to the proxy connection
  - Defaults to 10000 ms, which is also the default of the client's connect timeout.
- Add read timeout handling to the incoming connection and the proxied connection
  - The ping/pong keepalive messages should prevent the timeout from happening; however, the connection might be in a state where keepalives aren't happening.
  - Therefore, it's better to have a connection-level read timeout that prevents broken connections from being left hanging in the proxy.
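The proxy is built on Netty, and the actual timeout change is not reproduced in this thread (the usual Netty counterparts are `ChannelOption.CONNECT_TIMEOUT_MILLIS` and `io.netty.handler.timeout.ReadTimeoutHandler`). As a rough, JDK-only analogue of the connect and read timeouts described in the commit message, the sketch below uses `Socket.connect(SocketAddress, int)` and `Socket.setSoTimeout(int)`. The class `TimeoutSketch`, the helper `readWithTimeout`, its `-2` return convention, and the 200 ms demo read timeout are hypothetical; only the 10000 ms connect default comes from the commit message.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutSketch {
    // Mirrors the 10000 ms connect timeout default mentioned in the commit message.
    static final int CONNECT_TIMEOUT_MS = 10_000;

    /**
     * Hypothetical helper: connects to host:port and waits for one byte.
     * Returns the byte read, -1 on EOF, or -2 if the read timed out.
     */
    static int readWithTimeout(String host, int port, int readTimeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            // Connect timeout: give up instead of hanging on an unreachable peer.
            socket.connect(new InetSocketAddress(host, port), CONNECT_TIMEOUT_MS);
            // Read timeout: a connection that stays silent is treated as broken.
            socket.setSoTimeout(readTimeoutMs);
            InputStream in = socket.getInputStream();
            try {
                return in.read();
            } catch (SocketTimeoutException e) {
                // try-with-resources closes the socket, releasing its local port.
                return -2;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A local server that accepts the TCP handshake but never sends anything.
        try (ServerSocket silent = new ServerSocket(0)) {
            int result = readWithTimeout("127.0.0.1", silent.getLocalPort(), 200);
            System.out.println(result == -2 ? "read timed out" : "got " + result);
        }
    }
}
```

The point of the read timeout is that even when keepalives stop flowing, a dead connection is eventually closed and its resources freed instead of lingering indefinitely.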
- The test wasn't testing the proxy at all
pulsar-proxy/src/main/java/org/apache/pulsar/proxy/server/ProxyConfiguration.java
LGTM
LGTM. Left one comment about a test for ports.
```java
import org.apache.curator.shaded.com.google.common.net.InetAddresses;
import org.testng.annotations.Test;

public class BrokerProxyValidatorTest {
    // ... (remainder of the test class not shown in this review excerpt)
}
```
I think it might be appropriate to add a test named `shouldPreventInvalidPort`.
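As a hedged illustration of what a `shouldPreventInvalidPort`-style test could check, the sketch below uses a hypothetical `isValidPort` helper with an assumed allow-list of Pulsar's default broker ports (6650 and 6651). It does not reproduce `BrokerProxyValidator`'s real API or configuration, and it uses plain checks instead of TestNG so that it runs on its own.

```java
import java.util.Set;

public class PortValidationSketch {
    // Assumed allow-list; the real validator's configuration is not shown in this thread.
    static final Set<Integer> ALLOWED_PORTS = Set.of(6650, 6651);

    /** Hypothetical check: the port must be a valid TCP port and on the allow-list. */
    static boolean isValidPort(int port) {
        // The range check rejects bad values even if the allow-list were misconfigured.
        return port >= 1 && port <= 65535 && ALLOWED_PORTS.contains(port);
    }

    /** Sketch of a shouldPreventInvalidPort-style test without a test framework. */
    static void shouldPreventInvalidPort() {
        check(isValidPort(6650), "allowed port should pass");
        check(!isValidPort(0), "port 0 should be rejected");
        check(!isValidPort(65536), "out-of-range port should be rejected");
        check(!isValidPort(22), "port not on the allow-list should be rejected");
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        shouldPreventInvalidPort();
        System.out.println("shouldPreventInvalidPort passed");
    }
}
```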
pulsar-proxy/src/main/java/org/apache/pulsar/proxy/server/ProxyConnection.java
…ache#14078) (cherry picked from commit 640b4e6)
…ache#14078) (cherry picked from commit 640b4e6) (cherry picked from commit 3d2e6ce) (cherry picked from commit b3bac91)
…ache#14078) (cherry picked from commit 640b4e6) (cherry picked from commit 3d2e6ce)
Fixes #14075
Fixes #13923
Motivation
Pulsar Proxy can get into a state where it stops proxying Broker connections while Admin API proxying keeps working.
The proxy logs are filled with warnings of this type:
The "Cannot assign requested address" error message is a sign of port exhaustion: many connections are open, possibly hanging.
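A rough way to see how hanging connections lead to this error, assuming the proxy runs on a Linux host with the default `net.ipv4.ip_local_port_range` of 32768 to 60999: every outgoing connection from the proxy to the same broker address (IP and port) needs its own local ephemeral port, so the number of concurrent proxy-to-broker connections per broker address is bounded by

$$
N_{\text{ports}} = 60999 - 32768 + 1 = 28232
$$

If connections hang instead of being closed, each one keeps holding its port, and once the count approaches this bound, new `connect()` calls fail with `EADDRNOTAVAIL`, which is reported as "Cannot assign requested address".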
Additional context
One possible cause of the broken, hanging connections is a race condition that shows up in the logs like this:
This was reported as #13923 and is fixed as part of this PR.
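The actual fix for #13923 is not reproduced in this thread. As a generic, hedged illustration of making connection-state transitions atomic, the sketch below guards a hypothetical `State` enum with `AtomicReference.compareAndSet`, so that when two events race (for example, the broker connection completing while the client disconnects), exactly one transition wins and the losing event cannot push the connection into an invalid state. The class, states, and method names are all assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class ConnectionStateSketch {
    // Hypothetical lifecycle of a proxied connection.
    enum State { CONNECTING, PROXYING, CLOSED }

    private final AtomicReference<State> state = new AtomicReference<>(State.CONNECTING);

    /**
     * Called when the outbound broker connection completes; only valid from CONNECTING.
     * A false return means the connection was already closed, so the caller should
     * release the new broker channel instead of leaking it.
     */
    boolean onBrokerConnected() {
        return state.compareAndSet(State.CONNECTING, State.PROXYING);
    }

    /** Called on client disconnect or error; succeeds exactly once. */
    boolean close() {
        State current;
        do {
            current = state.get();
            if (current == State.CLOSED) {
                return false; // another thread already closed it
            }
        } while (!state.compareAndSet(current, State.CLOSED));
        return true;
    }

    State currentState() {
        return state.get();
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConnectionStateSketch conn = new ConnectionStateSketch();
        CountDownLatch start = new CountDownLatch(1);
        Thread connector = new Thread(() -> { awaitQuietly(start); conn.onBrokerConnected(); });
        Thread closer = new Thread(() -> { awaitQuietly(start); conn.close(); });
        connector.start();
        closer.start();
        start.countDown(); // release both threads at once to provoke the race
        connector.join();
        closer.join();
        // Whichever thread runs first, the connection ends up CLOSED, never stuck in PROXYING.
        System.out.println(conn.currentState());
    }
}
```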
Modifications
- Optimize the proxy connection to fail fast if the target broker isn't active
- Fix the race condition in the Pulsar Proxy when opening a connection, since it could lead to invalid states and hanging connections
- Add connect timeout handling to the proxy connection
- Add read timeout handling to the incoming connection and the proxied connection
  - The ping/pong keepalive messages should prevent the timeout from happening; however, the connection might be in a state where keepalives aren't happening.
  - Therefore, it's better to have a connection-level read timeout that prevents broken connections from being left hanging in the proxy.
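A minimal sketch of the fail-fast idea from the first item, assuming the proxy has some view of the currently active brokers (for example, through service discovery). `ActiveBrokerCheck`, its set of `host:port` strings, and `checkTarget` are hypothetical; how the real proxy determines broker availability is not shown here.

```java
import java.util.Optional;
import java.util.Set;

public class ActiveBrokerCheck {
    // Hypothetical "host:port" entries for brokers known to be active.
    private final Set<String> activeBrokers;

    ActiveBrokerCheck(Set<String> activeBrokers) {
        this.activeBrokers = Set.copyOf(activeBrokers);
    }

    /**
     * Returns an error message if the target should be rejected immediately,
     * or empty if the proxy may go ahead and open the broker connection.
     */
    Optional<String> checkTarget(String brokerAddress) {
        if (!activeBrokers.contains(brokerAddress)) {
            // Fail fast: reject now so the client backs off and retries, instead of
            // the proxy holding a connection (and a local port) for an unavailable broker.
            return Optional.of("Target broker isn't active: " + brokerAddress);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ActiveBrokerCheck check = new ActiveBrokerCheck(Set.of("broker-1:6650", "broker-2:6650"));
        System.out.println(check.checkTarget("broker-1:6650").isPresent()); // false
        System.out.println(check.checkTarget("broker-3:6650").orElse("ok")); // Target broker isn't active: broker-3:6650
    }
}
```

Rejecting early trades one immediate, cheap error for what would otherwise be a connection attempt that may hang until a timeout fires.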