Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][proxy] Fix refresh client auth #17831

Merged
merged 2 commits into from
Sep 29, 2022

Conversation

nodece
Copy link
Member

@nodece nodece commented Sep 23, 2022

Fixes #10816

Motivation

When using the proxy, the Pulsar cannot handle the authentication data on the proxy. The proxy has a client used for lookup operations, which passes the user's and proxy's authentication data to the broker. When the broker found the user's authentication data had expired, the broker sent the refresh request to the proxy, but the proxy could not handle refreshing this authentication data incorrectly.

image

When an error occurs, Pulsar disconnects all components, which will affect the Pulsar performance.

Fixed version:

2022-09-23T11:18:36,607 - INFO  - [pulsar-proxy-io-38-2:ProxyConnection@341] - [/127.0.0.1:51556] complete connection, init proxy handler. authenticated with token role client, hasProxyToBrokerUrl: false
2022-09-23T11:18:36,615 - INFO  - [pulsar-proxy-io-38-5:ConnectionPool@245] - [[id: 0x3e7cc63b, L:/127.0.0.1:51557 - R:localhost/127.0.0.1:51549]] Connected to server
2022-09-23T11:18:36,635 - INFO  - [pulsar-io-6-1:ServerCnx@299] - New connection from /127.0.0.1:51557
2022-09-23T11:18:36,691 - INFO  - [pulsar-client-io-40-1:ProducerStatsRecorderImpl@106] - Starting Pulsar producer perf with config: {"topicName":"persistent://my-tenant/my-ns/my-topic1","producerName":null,"sendTimeoutMs":30000,"blockIfQueueFull":false,"maxPendingMessages":0,"maxPendingMessagesAcrossPartitions":0,"messageRoutingMode":"RoundRobinPartition","hashingScheme":"JavaStringHash","cryptoFailureAction":"FAIL","batchingMaxPublishDelayMicros":1000,"batchingPartitionSwitchFrequencyByPublishDelay":10,"batchingMaxMessages":1000,"batchingMaxBytes":131072,"batchingEnabled":true,"chunkingEnabled":false,"chunkMaxMessageSize":-1,"compressionType":"NONE","initialSequenceId":null,"autoUpdatePartitions":true,"autoUpdatePartitionsIntervalSeconds":60,"multiSchema":true,"accessMode":"Shared","lazyStartPartitionedProducers":false,"properties":{},"initialSubscriptionName":null}
2022-09-23T11:18:36,693 - INFO  - [pulsar-client-io-40-1:ProducerStatsRecorderImpl@107] - Pulsar client config: {"serviceUrl":"pulsar://localhost:51554/","authPluginClassName":null,"authParams":null,"authParamMap":null,"operationTimeoutMs":3000,"lookupTimeoutMs":3000,"statsIntervalSeconds":60,"numIoThreads":1,"numListenerThreads":1,"connectionsPerBroker":1,"connectionMaxIdleSeconds":180,"useTcpNoDelay":true,"useTls":false,"tlsKeyFilePath":null,"tlsCertificateFilePath":null,"tlsTrustCertsFilePath":null,"tlsAllowInsecureConnection":false,"tlsHostnameVerificationEnable":false,"concurrentLookupRequest":5000,"maxLookupRequest":50000,"maxLookupRedirects":20,"maxNumberOfRejectedRequestPerConnection":50,"keepAliveIntervalSeconds":30,"connectionTimeoutMs":10000,"requestTimeoutMs":60000,"initialBackoffIntervalNanos":100000000,"maxBackoffIntervalNanos":60000000000,"enableBusyWait":false,"listenerName":null,"useKeyStoreTls":false,"sslProvider":null,"tlsKeyStoreType":"JKS","tlsKeyStorePath":null,"tlsKeyStorePassword":null,"tlsTrustStoreType":"JKS","tlsTrustStorePath":null,"tlsTrustStorePassword":null,"tlsCiphers":[],"tlsProtocols":[],"memoryLimitBytes":67108864,"proxyServiceUrl":null,"proxyProtocol":null,"enableTransaction":false,"dnsLookupBindAddress":null,"dnsLookupBindPort":0,"socks5ProxyAddress":null,"socks5ProxyUsername":null,"socks5ProxyPassword":null}
2022-09-23T11:18:36,710 - INFO  - [metadata-store-12-1:NamespaceBundleFactory@186] - Policy updated for namespace Optional[my-tenant/my-ns], refreshing the bundle cache.
2022-09-23T11:18:36,738 - INFO  - [pulsar-4-1:ModularLoadManagerImpl@857] - 1 brokers being considered for assignment of my-tenant/my-ns/0x00000000_0xffffffff
2022-09-23T11:18:36,740 - INFO  - [pulsar-4-1:OwnershipCache@199] - Trying to acquire ownership of my-tenant/my-ns/0x00000000_0xffffffff
2022-09-23T11:18:36,742 - INFO  - [metadata-store-12-1:ResourceLockImpl@166] - Acquired resource lock on /namespace/my-tenant/my-ns/0x00000000_0xffffffff
2022-09-23T11:18:36,742 - INFO  - [metadata-store-12-1:OwnershipCache@205] - Successfully acquired ownership of OwnedBundle(bundle=my-tenant/my-ns/0x00000000_0xffffffff, isActive=1)
2022-09-23T11:18:36,743 - INFO  - [pulsar-4-2:PulsarService@1187] - Loading all topics on bundle: my-tenant/my-ns/0x00000000_0xffffffff
2022-09-23T11:18:36,750 - INFO  - [pulsar-client-io-40-1:ConnectionPool@245] - [[id: 0x399d1371, L:/127.0.0.1:51558 - R:localhost/127.0.0.1:51554]] Connected to server
2022-09-23T11:18:36,750 - INFO  - [pulsar-client-io-40-1:ClientCnx@255] - [id: 0x399d1371, L:/127.0.0.1:51558 - R:localhost/127.0.0.1:51554] Connected through proxy to target broker at localhost:51549
2022-09-23T11:18:36,751 - INFO  - [pulsar-proxy-io-38-7:ProxyConnection@186] - [/127.0.0.1:51558] New connection opened
2022-09-23T11:18:36,762 - INFO  - [pulsar-proxy-io-38-7:ProxyConnection@341] - [/127.0.0.1:51558] complete connection, init proxy handler. authenticated with token role client, hasProxyToBrokerUrl: true
2022-09-23T11:18:36,770 - INFO  - [pulsar-io-6-2:ServerCnx@299] - New connection from /127.0.0.1:51559
2022-09-23T11:18:36,776 - INFO  - [pulsar-client-io-40-1:ProducerImpl@1639] - [persistent://my-tenant/my-ns/my-topic1] [null] Creating producer on cnx [id: 0x399d1371, L:/127.0.0.1:51558 - R:localhost/127.0.0.1:51554]
2022-09-23T11:18:36,836 - INFO  - [pulsar-io-6-2:ManagedLedgerImpl@356] - Opening managed ledger my-tenant/my-ns/persistent/my-topic1
2022-09-23T11:18:36,841 - INFO  - [bookkeeper-ml-scheduler-OrderedScheduler-4-0:MetaStoreImpl@113] - Creating '/managed-ledgers/my-tenant/my-ns/persistent/my-topic1'
2022-09-23T11:18:36,882 - INFO  - [mock-pulsar-bk-OrderedExecutor-0-0:PulsarMockBookKeeper@122] - Creating ledger 3
2022-09-23T11:18:36,914 - INFO  - [mock-pulsar-bk-OrderedExecutor-0-0:ManagedLedgerImpl@505] - [my-tenant/my-ns/persistent/my-topic1] Created ledger 3
2022-09-23T11:18:36,942 - INFO  - [bookkeeper-ml-scheduler-OrderedScheduler-4-0:ManagedLedgerFactoryImpl$2@380] - [my-tenant/my-ns/persistent/my-topic1] Successfully initialize managed ledger
2022-09-23T11:18:36,984 - INFO  - [bookkeeper-ml-scheduler-OrderedScheduler-4-0:BrokerService$2@1502] - Created topic persistent://my-tenant/my-ns/my-topic1 - dedup is disabled
2022-09-23T11:18:37,000 - INFO  - [bookkeeper-ml-scheduler-OrderedScheduler-4-0:ServerCnx@1474] - [/127.0.0.1:51559] Created new producer: Producer{topic=PersistentTopic{topic=persistent://my-tenant/my-ns/my-topic1}, client=/127.0.0.1:51559, producerName=proxy-authorization-0-0, producerId=0}
2022-09-23T11:18:37,006 - INFO  - [pulsar-client-io-40-1:ProducerImpl@1694] - [persistent://my-tenant/my-ns/my-topic1] [proxy-authorization-0-0] Created producer on cnx [id: 0x399d1371, L:/127.0.0.1:51558 - R:localhost/127.0.0.1:51554]
2022-09-23T11:18:47,999 - INFO  - [pulsar-io-6-2:ServerCnx@769] - [/127.0.0.1:51559] Refreshing authentication credentials for originalPrincipal client and authRole Proxy
2022-09-23T11:18:47,999 - INFO  - [pulsar-io-6-1:ServerCnx@769] - [/127.0.0.1:51557] Refreshing authentication credentials for originalPrincipal client and authRole Proxy
2022-09-23T11:18:48,008 - INFO  - [pulsar-io-6-2:ServerCnx@730] - [/127.0.0.1:51559] Refreshed authentication credentials for role client
2022-09-23T11:18:48,010 - INFO  - [pulsar-io-6-1:ServerCnx@730] - [/127.0.0.1:51557] Refreshed authentication credentials for role client

Old version:

2022-09-23T11:17:45,889 - INFO  - [pulsar-client-io-40-1:ConnectionPool@245] - [[id: 0x155081f5, L:/127.0.0.1:51513 - R:localhost/127.0.0.1:51508]] Connected to server
2022-09-23T11:17:45,890 - INFO  - [pulsar-client-io-40-1:ClientCnx@255] - [id: 0x155081f5, L:/127.0.0.1:51513 - R:localhost/127.0.0.1:51508] Connected through proxy to target broker at localhost:51503
2022-09-23T11:17:45,891 - INFO  - [pulsar-proxy-io-38-7:ProxyConnection@186] - [/127.0.0.1:51513] New connection opened
2022-09-23T11:17:45,902 - INFO  - [pulsar-proxy-io-38-7:ProxyConnection@341] - [/127.0.0.1:51513] complete connection, init proxy handler. authenticated with token role client, hasProxyToBrokerUrl: true
2022-09-23T11:17:45,911 - INFO  - [pulsar-io-6-2:ServerCnx@299] - New connection from /127.0.0.1:51514
2022-09-23T11:17:45,917 - INFO  - [pulsar-client-io-40-1:ProducerImpl@1639] - [persistent://my-tenant/my-ns/my-topic1] [null] Creating producer on cnx [id: 0x155081f5, L:/127.0.0.1:51513 - R:localhost/127.0.0.1:51508]
2022-09-23T11:17:45,993 - INFO  - [pulsar-io-6-2:ManagedLedgerImpl@356] - Opening managed ledger my-tenant/my-ns/persistent/my-topic1
2022-09-23T11:17:45,996 - INFO  - [bookkeeper-ml-scheduler-OrderedScheduler-4-0:MetaStoreImpl@113] - Creating '/managed-ledgers/my-tenant/my-ns/persistent/my-topic1'
2022-09-23T11:17:46,038 - INFO  - [mock-pulsar-bk-OrderedExecutor-0-0:PulsarMockBookKeeper@122] - Creating ledger 3
2022-09-23T11:17:46,077 - INFO  - [mock-pulsar-bk-OrderedExecutor-0-0:ManagedLedgerImpl@505] - [my-tenant/my-ns/persistent/my-topic1] Created ledger 3
2022-09-23T11:17:46,111 - INFO  - [bookkeeper-ml-scheduler-OrderedScheduler-4-0:ManagedLedgerFactoryImpl$2@380] - [my-tenant/my-ns/persistent/my-topic1] Successfully initialize managed ledger
2022-09-23T11:17:46,160 - INFO  - [bookkeeper-ml-scheduler-OrderedScheduler-4-0:BrokerService$2@1502] - Created topic persistent://my-tenant/my-ns/my-topic1 - dedup is disabled
2022-09-23T11:17:46,177 - INFO  - [bookkeeper-ml-scheduler-OrderedScheduler-4-0:ServerCnx@1474] - [/127.0.0.1:51514] Created new producer: Producer{topic=PersistentTopic{topic=persistent://my-tenant/my-ns/my-topic1}, client=/127.0.0.1:51514, producerName=proxy-authorization-0-0, producerId=0}
2022-09-23T11:17:46,183 - INFO  - [pulsar-client-io-40-1:ProducerImpl@1694] - [persistent://my-tenant/my-ns/my-topic1] [proxy-authorization-0-0] Created producer on cnx [id: 0x155081f5, L:/127.0.0.1:51513 - R:localhost/127.0.0.1:51508]
2022-09-23T11:17:56,653 - INFO  - [pulsar-io-6-1:ServerCnx@769] - [/127.0.0.1:51511] Refreshing authentication credentials for originalPrincipal client and authRole Proxy
2022-09-23T11:17:56,653 - INFO  - [pulsar-io-6-2:ServerCnx@769] - [/127.0.0.1:51514] Refreshing authentication credentials for originalPrincipal client and authRole Proxy
2022-09-23T11:17:56,660 - WARN  - [pulsar-io-6-1:ServerCnx@726] - [/127.0.0.1:51511] Principal cannot change during an authentication refresh expected=client got=Proxy
2022-09-23T11:17:56,661 - INFO  - [pulsar-io-6-2:ServerCnx@730] - [/127.0.0.1:51514] Refreshed authentication credentials for role client
2022-09-23T11:17:56,661 - INFO  - [pulsar-io-6-1:ServerCnx@311] - Closed connection from /127.0.0.1:51511
2022-09-23T11:17:56,661 - INFO  - [pulsar-proxy-io-38-5:ClientCnx@292] - [id: 0x23766f8b, L:/127.0.0.1:51511 ! R:localhost/127.0.0.1:51503] Disconnected
2022-09-23T11:18:02,580 - INFO  - [pulsar-proxy-io-38-10:ConnectionPool@245] - [[id: 0xe8cbf738, L:/127.0.0.1:51523 - R:localhost/127.0.0.1:51503]] Connected to server
2022-09-23T11:18:02,582 - INFO  - [pulsar-io-6-4:ServerCnx@299] - New connection from /127.0.0.1:51523
2022-09-23T11:18:02,588 - INFO  - [pulsar-io-6-4:ServerCnx@3082] - [/127.0.0.1:51523] Failed to authenticate: operation=connect, principal=Proxy, reason=Failed to authentication token: JWT expired at 2022-09-23T03:17:55Z. Current time: 2022-09-23T03:18:02Z, a difference of 7586 milliseconds.  Allowed clock skew: 0 milliseconds.
2022-09-23T11:18:02,590 - INFO  - [pulsar-io-6-4:ServerCnx@311] - Closed connection from /127.0.0.1:51523
2022-09-23T11:18:02,590 - WARN  - [pulsar-proxy-io-38-10:ClientCnx@732] - [id: 0xe8cbf738, L:/127.0.0.1:51523 - R:localhost/127.0.0.1:51503] Received error from server: Unable to authenticate
2022-09-23T11:18:02,590 - WARN  - [pulsar-proxy-io-38-10:ConnectionPool@282] - [[id: 0xe8cbf738, L:/127.0.0.1:51523 - R:localhost/127.0.0.1:51503]] Connection handshake failed: org.apache.pulsar.client.api.PulsarClientException$AuthenticationException: Unable to authenticate
2022-09-23T11:18:02,591 - ERROR - [pulsar-client-io-40-1:ClientCnx@1138] - [id: 0xb8d2d6d6, L:/127.0.0.1:51510 - R:localhost/127.0.0.1:51508] Close connection because received internal-server error {"errorMsg":"org.apache.pulsar.client.api.PulsarClientException$AuthenticationException: Unable to authenticate","reqId":676244056976135601, "remote":"localhost/127.0.0.1:51508", "local":"/127.0.0.1:51510"}
2022-09-23T11:18:02,591 - ERROR - [pulsar-proxy-io-38-10:ClientCnx@741] - [id: 0xe8cbf738, L:/127.0.0.1:51523 ! R:localhost/127.0.0.1:51503] Failed to authenticate the client
2022-09-23T11:18:02,591 - WARN  - [pulsar-proxy-io-38-10:ClientCnx@753] - [id: 0xe8cbf738, L:/127.0.0.1:51523 ! R:localhost/127.0.0.1:51503] Received unknown request id from server: -1
2022-09-23T11:18:02,592 - INFO  - [pulsar-proxy-io-38-10:ClientCnx@292] - [id: 0xe8cbf738, L:/127.0.0.1:51523 ! R:localhost/127.0.0.1:51503] Disconnected
2022-09-23T11:18:02,592 - INFO  - [pulsar-proxy-io-38-2:ProxyConnection@199] - [/127.0.0.1:51510] Connection closed
2022-09-23T11:18:02,592 - WARN  - [pulsar-client-io-40-1:BinaryProtoLookupService@198] - [persistent://my-tenant/my-ns/my-topic1] failed to get Partitioned metadata : {"errorMsg":"org.apache.pulsar.client.api.PulsarClientException$AuthenticationException: Unable to authenticate","reqId":676244056976135601, "remote":"localhost/127.0.0.1:51508", "local":"/127.0.0.1:51510"}
org.apache.pulsar.client.api.PulsarClientException$LookupException: {"errorMsg":"org.apache.pulsar.client.api.PulsarClientException$AuthenticationException: Unable to authenticate","reqId":676244056976135601, "remote":"localhost/127.0.0.1:51508", "local":"/127.0.0.1:51510"}
	at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:1224) ~[classes/:?]
	at org.apache.pulsar.client.impl.ClientCnx.handlePartitionResponse(ClientCnx.java:640) ~[classes/:?]
	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:134) ~[classes/:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327) ~[netty-codec-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:314) ~[netty-codec-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:435) ~[netty-codec-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279) ~[netty-codec-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
2022-09-23T11:18:02,593 - ERROR - [pulsar-client-io-40-1:ClientCnx@1138] - [id: 0xb8d2d6d6, L:/127.0.0.1:51510 ! R:localhost/127.0.0.1:51508] Close connection because received internal-server error {"errorMsg":"org.apache.pulsar.client.api.PulsarClientException$AuthenticationException: Unable to authenticate","reqId":676244056976135600, "remote":"localhost/127.0.0.1:51508", "local":"/127.0.0.1:51510"}
2022-09-23T11:18:02,593 - WARN  - [pulsar-client-io-40-1:BinaryProtoLookupService@198] - [persistent://my-tenant/my-ns/my-topic1] failed to get Partitioned metadata : {"errorMsg":"org.apache.pulsar.client.api.PulsarClientException$AuthenticationException: Unable to authenticate","reqId":676244056976135600, "remote":"localhost/127.0.0.1:51508", "local":"/127.0.0.1:51510"}
org.apache.pulsar.client.api.PulsarClientException$LookupException: {"errorMsg":"org.apache.pulsar.client.api.PulsarClientException$AuthenticationException: Unable to authenticate","reqId":676244056976135600, "remote":"localhost/127.0.0.1:51508", "local":"/127.0.0.1:51510"}
	at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:1224) ~[classes/:?]
	at org.apache.pulsar.client.impl.ClientCnx.handlePartitionResponse(ClientCnx.java:640) ~[classes/:?]
	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:134) ~[classes/:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327) ~[netty-codec-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299) ~[netty-codec-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
2022-09-23T11:18:02,594 - INFO  - [pulsar-client-io-40-1:ClientCnx@292] - [id: 0xb8d2d6d6, L:/127.0.0.1:51510 ! R:localhost/127.0.0.1:51508] Disconnected
2022-09-23T11:18:02,594 - WARN  - [pulsar-client-io-40-1:BinaryProtoLookupService@198] - [persistent://my-tenant/my-ns/my-topic1] failed to get Partitioned metadata : Disconnected from server at localhost/127.0.0.1:51508
org.apache.pulsar.client.api.PulsarClientException$ConnectException: Disconnected from server at localhost/127.0.0.1:51508
	at org.apache.pulsar.client.impl.ClientCnx.channelInactive(ClientCnx.java:298) ~[classes/:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:392) ~[netty-codec-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:357) ~[netty-codec-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]

Solution:

We need to do forwarding based on the proxy. When the proxy's client receives the refresh authentication data request from the broker, the proxy's client sends the same request to the user's client, the user's client responds new authentication to the proxy, and the proxy sends the new authentication to all client.

Documentation

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

Matching PR in forked repository

PR in forked repository: nodece#5

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Sep 23, 2022
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Sep 23, 2022
@nodece nodece requested a review from tuteng September 23, 2022 18:47
@tuteng tuteng requested a review from merlimat September 24, 2022 00:11
@nodece nodece force-pushed the fix-refresh-client-auth branch 3 times, most recently from de3ab59 to c2a83b8 Compare September 27, 2022 10:43
@nodece
Copy link
Member Author

nodece commented Sep 27, 2022

@lin-zhao @codelipenghui Thank you all! These changes have been fixed, please help to review.

@nodece nodece requested review from codelipenghui and lin-zhao and removed request for lin-zhao September 27, 2022 10:44
@nodece nodece force-pushed the fix-refresh-client-auth branch 2 times, most recently from eb1de4b to e5cb029 Compare September 28, 2022 10:55
@codelipenghui
Copy link
Contributor

@nodece Please check the check style issue nodece#5

Signed-off-by: Zixuan Liu <nodeces@gmail.com>
@nodece nodece force-pushed the fix-refresh-client-auth branch from 3ddeb0c to 72654b3 Compare September 29, 2022 05:49
Signed-off-by: Zixuan Liu <nodeces@gmail.com>
+ "the proxy client {}", proxyConnection.ctx().channel(), ctx.channel());
}

proxyConnection.ctx().writeAndFlush(Commands.newAuthChallenge(clientAuthMethod, AuthData.REFRESH_AUTH_DATA,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we can make some improvements to avoid sending auth challenge command to the client if the proxy just has done the auth challenge. Because all the brokers that interacted with client connection will send the auth challenge command

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it could be improved, but I remember there's only one connection here.

@nodece nodece merged commit c952f3c into apache:master Sep 29, 2022
nodece added a commit that referenced this pull request Sep 29, 2022
* [fix][proxy] Fix refresh client auth

Signed-off-by: Zixuan Liu <nodeces@gmail.com>

* Fix style

Signed-off-by: Zixuan Liu <nodeces@gmail.com>
(cherry picked from commit c952f3c)
nodece added a commit that referenced this pull request Sep 29, 2022
* [fix][proxy] Fix refresh client auth

Signed-off-by: Zixuan Liu <nodeces@gmail.com>

* Fix style

Signed-off-by: Zixuan Liu <nodeces@gmail.com>

(cherry picked from commit c952f3c)
Signed-off-by: Zixuan Liu <nodeces@gmail.com>
nodece added a commit that referenced this pull request Sep 29, 2022
* [fix][proxy] Fix refresh client auth

Signed-off-by: Zixuan Liu <nodeces@gmail.com>

* Fix style

Signed-off-by: Zixuan Liu <nodeces@gmail.com>

(cherry picked from commit c952f3c)
Signed-off-by: Zixuan Liu <nodeces@gmail.com>
@nodece nodece added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Sep 29, 2022
@lhotari
Copy link
Member

lhotari commented Dec 20, 2022

I think that this change should be reverted. This can cause corruption of the traffic between the client and the proxy.
I provided some explanation in this comment: #10816 (comment) . When the proxy is passing bytes back and forth, it is not safe to inject any protocol messages to the connection. That might corrupt the traffic.

@nodece
Copy link
Member Author

nodece commented Jan 12, 2023

I think that this change should be reverted. This can cause corruption of the traffic between the client and the proxy. I provided some explanation in this comment: #10816 (comment) . When the proxy is passing bytes back and forth, it is not safe to inject any protocol messages to the connection. That might corrupt the traffic.

I just fix the ProxyLookupRequests connection, so don't break other connections.

}

try {
AuthData clientData = AuthData.of(authResponse.getResponse().getAuthData());
doAuthentication(clientData);
if (service.getConfiguration().isForwardAuthorizationCredentials()
Copy link
Member

@michaeljmarshall michaeljmarshall Jan 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An interesting consequence of this change is that the ProxyConnection goes back into the Connecting state if the authentication takes multiple steps.

michaeljmarshall added a commit that referenced this pull request Jan 31, 2023
Relates to: #17831 (comment)

### Motivation

When the `ProxyConnection` handles a `Connect` command, that is the time to go to `Connecting` state. There is no other time that makes sense to switch to connecting. The current logic will go to connecting in certain re-authentication scenarios, but those are incorrect. By moving the state change to earlier in the logic, we make the state transition clearer and prevent corrupted state.

### Modifications

* Remove `state = State.Connecting` from the `doAuthentication` method, which is called multiple times for various reasons
* Add `state = State.Connecting` to the start of the `handleConnect` method.

### Verifying this change

The existing tests will verify this change, and reading through the code makes it clear this is a correct change.

### Does this pull request potentially affect one of the following parts:

Not a breaking change.

### Documentation

- [x] `doc-not-needed` 

It would be nice to map out the state transitions for our connection classes. That is our of the scope of this small improvement.

### Matching PR in forked repository

PR in forked repository: michaeljmarshall#21
michaeljmarshall added a commit that referenced this pull request Jan 31, 2023
Relates to: #17831 (comment)

### Motivation

When the `ProxyConnection` handles a `Connect` command, that is the time to go to `Connecting` state. There is no other time that makes sense to switch to connecting. The current logic will go to connecting in certain re-authentication scenarios, but those are incorrect. By moving the state change to earlier in the logic, we make the state transition clearer and prevent corrupted state.

### Modifications

* Remove `state = State.Connecting` from the `doAuthentication` method, which is called multiple times for various reasons
* Add `state = State.Connecting` to the start of the `handleConnect` method.

### Verifying this change

The existing tests will verify this change, and reading through the code makes it clear this is a correct change.

### Does this pull request potentially affect one of the following parts:

Not a breaking change.

### Documentation

- [x] `doc-not-needed`

It would be nice to map out the state transitions for our connection classes. That is our of the scope of this small improvement.

### Matching PR in forked repository

PR in forked repository: michaeljmarshall#21

(cherry picked from commit c8650ce)
michaeljmarshall added a commit that referenced this pull request Jan 31, 2023
Relates to: #17831 (comment)

### Motivation

When the `ProxyConnection` handles a `Connect` command, that is the time to go to `Connecting` state. There is no other time that makes sense to switch to connecting. The current logic will go to connecting in certain re-authentication scenarios, but those are incorrect. By moving the state change to earlier in the logic, we make the state transition clearer and prevent corrupted state.

### Modifications

* Remove `state = State.Connecting` from the `doAuthentication` method, which is called multiple times for various reasons
* Add `state = State.Connecting` to the start of the `handleConnect` method.

### Verifying this change

The existing tests will verify this change, and reading through the code makes it clear this is a correct change.

### Does this pull request potentially affect one of the following parts:

Not a breaking change.

### Documentation

- [x] `doc-not-needed`

It would be nice to map out the state transitions for our connection classes. That is our of the scope of this small improvement.

### Matching PR in forked repository

PR in forked repository: michaeljmarshall#21

(cherry picked from commit c8650ce)
michaeljmarshall added a commit that referenced this pull request Jan 31, 2023
Relates to: #17831 (comment)

### Motivation

When the `ProxyConnection` handles a `Connect` command, that is the time to go to `Connecting` state. There is no other time that makes sense to switch to connecting. The current logic will go to connecting in certain re-authentication scenarios, but those are incorrect. By moving the state change to earlier in the logic, we make the state transition clearer and prevent corrupted state.

### Modifications

* Remove `state = State.Connecting` from the `doAuthentication` method, which is called multiple times for various reasons
* Add `state = State.Connecting` to the start of the `handleConnect` method.

### Verifying this change

The existing tests will verify this change, and reading through the code makes it clear this is a correct change.

### Does this pull request potentially affect one of the following parts:

Not a breaking change.

### Documentation

- [x] `doc-not-needed`

It would be nice to map out the state transitions for our connection classes. That is our of the scope of this small improvement.

### Matching PR in forked repository

PR in forked repository: michaeljmarshall#21

(cherry picked from commit c8650ce)
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Feb 28, 2023
Relates to: apache#17831 (comment)

### Motivation

When the `ProxyConnection` handles a `Connect` command, that is the time to go to `Connecting` state. There is no other time that makes sense to switch to connecting. The current logic will go to connecting in certain re-authentication scenarios, but those are incorrect. By moving the state change to earlier in the logic, we make the state transition clearer and prevent corrupted state.

### Modifications

* Remove `state = State.Connecting` from the `doAuthentication` method, which is called multiple times for various reasons
* Add `state = State.Connecting` to the start of the `handleConnect` method.

### Verifying this change

The existing tests will verify this change, and reading through the code makes it clear this is a correct change.

### Does this pull request potentially affect one of the following parts:

Not a breaking change.

### Documentation

- [x] `doc-not-needed`

It would be nice to map out the state transitions for our connection classes. That is our of the scope of this small improvement.

### Matching PR in forked repository

PR in forked repository: michaeljmarshall#21

(cherry picked from commit c8650ce)
(cherry picked from commit f9e24f2)
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Feb 28, 2023
* [fix][proxy] Fix refresh client auth

Signed-off-by: Zixuan Liu <nodeces@gmail.com>

* Fix style

Signed-off-by: Zixuan Liu <nodeces@gmail.com>

(cherry picked from commit c952f3c)
Signed-off-by: Zixuan Liu <nodeces@gmail.com>
(cherry picked from commit 388938e)
michaeljmarshall added a commit that referenced this pull request Apr 11, 2023
Fixes: #10816
PIP: #19771
Supersedes: #19026
Depends on: #20062

### Motivation

The Pulsar Proxy does not properly handle authentication data refresh when in state `ProxyLookupRequests`. The consequence is described in #10816. Essentially, the problem is that the proxy caches stale authentication data and sends it to the broker leading to connection failures.

#17831 attempted to fix the underlying problem, but it missed an important edge cases. Specifically, it missed the case that the `ConnectionPool` will have multiple connections when a lookup gets redirected. As such, the following problem exists (and is fixed by this PR):

1. Client opens connection to perform lookups.
2. Proxy connects to broker 1 to get the topic ownership info.
3. Time passes.
4. Client does an additional lookup, and this topic is on a newly created broker 2. In this case, the proxy opens a new connection with the stale client auth data.
5. Broker 2 rejects the connection because it fails with expired authentication.

### Modifications

* Remove some of the implementation from #17831. This new implementation still allows a broker to challenge the client through the proxy, but notably, it limits the number of challenges sent to the client. Further, the proxy does not challenge the client when the auth data is not expired.
* Introduce authentication refresh in the proxy so that the proxy challenges the client any time the auth data is expired.
* Update the `ProxyClientCnx` to get the `clientAuthData` from the `ProxyConnection` to ensure that it gets new authentication data.
* Add clock skew to the `AuthenticationProviderToken`. This is necessary to make some of the testing not flaky and it will also be necessary for users to configure in their clusters.

### Verifying this change

The `ProxyRefreshAuthTest` covers the existing behavior and I expanded it to cover the edge case described above.

Additionally, testing this part of the code will be much easier to test once we implement #19624.

### Documentation

- [x] `doc-not-needed`

### Matching PR in forked repository

PR in forked repository: the relevant tests pass locally, so I am going to skip the forked tests.
michaeljmarshall added a commit to michaeljmarshall/pulsar that referenced this pull request Apr 20, 2023
Fixes: apache#10816
PIP: apache#19771
Supersedes: apache#19026
Depends on: apache#20062

The Pulsar Proxy does not properly handle authentication data refresh when in state `ProxyLookupRequests`. The consequence is described in apache#10816. Essentially, the problem is that the proxy caches stale authentication data and sends it to the broker leading to connection failures.

apache#17831 attempted to fix the underlying problem, but it missed an important edge cases. Specifically, it missed the case that the `ConnectionPool` will have multiple connections when a lookup gets redirected. As such, the following problem exists (and is fixed by this PR):

1. Client opens connection to perform lookups.
2. Proxy connects to broker 1 to get the topic ownership info.
3. Time passes.
4. Client does an additional lookup, and this topic is on a newly created broker 2. In this case, the proxy opens a new connection with the stale client auth data.
5. Broker 2 rejects the connection because it fails with expired authentication.

* Remove some of the implementation from apache#17831. This new implementation still allows a broker to challenge the client through the proxy, but notably, it limits the number of challenges sent to the client. Further, the proxy does not challenge the client when the auth data is not expired.
* Introduce authentication refresh in the proxy so that the proxy challenges the client any time the auth data is expired.
* Update the `ProxyClientCnx` to get the `clientAuthData` from the `ProxyConnection` to ensure that it gets new authentication data.
* Add clock skew to the `AuthenticationProviderToken`. This is necessary to make some of the testing not flaky and it will also be necessary for users to configure in their clusters.

The `ProxyRefreshAuthTest` covers the existing behavior and I expanded it to cover the edge case described above.

Additionally, testing this part of the code will be much easier to test once we implement apache#19624.

- [x] `doc-not-needed`

PR in forked repository: the relevant tests pass locally, so I am going to skip the forked tests.

(cherry picked from commit 075b625)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PulsarProxy does not always refresh authentication tokens received from client
6 participants