Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER #932

Open
nflaig opened this issue Dec 14, 2021 · 4 comments
Open
Labels
type/bug Something isn't working

Comments

@nflaig
Copy link

nflaig commented Dec 14, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

We are running Kafka and Zookeeper inside the Consul service mesh and sometimes the connection from Kafka to Zookeeper seems to fail. This issue seems to be specific to stateful sets as I also noticed a similar issue when connection to Redis. Also another strange behavior maybe related to this is that the headless service has to be used as the host instead of the normal service. Using the normal service works sometimes but fails more often then the headless service.

Kafka is dialing Zookeeper directly through the headless service so I have configured ServiceDefaults to allow direct connections.

Another maybe interesting fact is that in the Consul UI Topology view, Zookeeper is not shown as an upstream for Kafka. This always seems to be the case if the connection also does not work so it could potentially be related.

Configuration

Consul helm values
global:
  domain: consul
  name: consul

  gossipEncryption:
    autoGenerate: true

  tls:
    enabled: true
    enableAutoEncrypt: true

  acls:
    manageSystemACLs: true
  
server:
  replicas: 3
  bootstrapExpect: 3
  extraConfig: |
    {
      "ui_config": {
        "content_path": "/consul"
      }
    }

ui:
  enabled: true
  service:
    type: "ClusterIP"

client:
  tolerations: |
    - operator: Exists
  extraConfig: |
    {"advertise_reconnect_timeout": "15m"}

connectInject:
  enabled: true
  transparentProxy:
    defaultEnabled: true

controller:
  enabled: true
Kafka config related to Consul
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: kafka
  labels:
    app: kafka
spec:
  protocol: tcp
  transparentProxy:
    dialedDirectly: true
---
podAnnotations:
  consul.hashicorp.com/connect-inject: "true"
  consul.hashicorp.com/connect-service: "kafka"
  consul.hashicorp.com/connect-service-port: "9092"
  consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "5556,9093"
  consul.hashicorp.com/transparent-proxy-exclude-outbound-ports: "9093"
  consul.hashicorp.com/service-tags: "kafka"
Zookeeper config related to Consul
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: zookeeper
  labels:
    app: zookeeper
spec:
  protocol: tcp
  transparentProxy:
    dialedDirectly: true
---
podAnnotations:
  consul.hashicorp.com/connect-inject: "true"
  consul.hashicorp.com/connect-service: "zookeeper"
  consul.hashicorp.com/connect-service-port: "2181"
  consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "2888,3888,9141"
  consul.hashicorp.com/transparent-proxy-exclude-outbound-ports: "2888,3888"
  consul.hashicorp.com/service-tags: "zookeeper"

Logs

Zookeeper Envoy logs
[2021-12-14 06:53:08.770][13][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:249] [C787] new tcp proxy session
[2021-12-14 06:53:08.770][13][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:397] [C787] Creating connection to cluster local_app
[2021-12-14 06:53:08.770][13][debug][pool] [source/common/tcp/original_conn_pool.cc:98] creating a new connection
[2021-12-14 06:53:08.770][13][debug][pool] [source/common/tcp/original_conn_pool.cc:383] [C788] connecting
[2021-12-14 06:53:08.770][13][debug][connection] [source/common/network/connection_impl.cc:860] [C788] connecting to 127.0.0.1:2181
[2021-12-14 06:53:08.770][13][debug][connection] [source/common/network/connection_impl.cc:876] [C788] connection in progress
[2021-12-14 06:53:08.770][13][debug][pool] [source/common/tcp/original_conn_pool.cc:125] queueing request due to no available connections
[2021-12-14 06:53:08.770][13][debug][conn_handler] [source/server/active_tcp_listener.cc:328] [C787] new connection
[2021-12-14 06:53:08.770][13][debug][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:219] [C787] TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2021-12-14 06:53:08.770][13][debug][connection] [source/common/network/connection_impl.cc:243] [C787] closing socket: 0
[2021-12-14 06:53:08.770][13][debug][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:219] [C787] TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2021-12-14 06:53:08.771][13][debug][pool] [source/common/tcp/original_conn_pool.cc:223] canceling pending request
[2021-12-14 06:53:08.771][13][debug][pool] [source/common/tcp/original_conn_pool.cc:231] canceling pending connection
[2021-12-14 06:53:08.771][13][debug][connection] [source/common/network/connection_impl.cc:133] [C788] closing data_to_write=0 type=1
[2021-12-14 06:53:08.771][13][debug][connection] [source/common/network/connection_impl.cc:243] [C788] closing socket: 1
[2021-12-14 06:53:08.771][13][debug][pool] [source/common/tcp/original_conn_pool.cc:140] [C788] client disconnected
[2021-12-14 06:53:08.771][13][debug][conn_handler] [source/server/active_tcp_listener.cc:76] [C787] adding to cleanup list
[2021-12-14 06:53:08.771][13][debug][pool] [source/common/tcp/original_conn_pool.cc:255] [C788] connection destroyed
Zookeeper logs
2021-12-14 07:10:30,079 [myid:1] - INFO  [nioEventLoopGroup-4-1:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55294
2021-12-14 07:10:39,204 [myid:1] - INFO  [nioEventLoopGroup-4-1:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55404
2021-12-14 07:10:40,076 [myid:1] - INFO  [nioEventLoopGroup-4-2:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55420
2021-12-14 07:10:49,196 [myid:1] - INFO  [nioEventLoopGroup-4-2:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55546
2021-12-14 07:10:50,083 [myid:1] - INFO  [nioEventLoopGroup-4-1:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55564
2021-12-14 07:10:59,191 [myid:1] - INFO  [nioEventLoopGroup-4-1:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55686
2021-12-14 07:11:00,136 [myid:1] - INFO  [nioEventLoopGroup-4-2:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55706
2021-12-14 07:11:09,189 [myid:1] - INFO  [nioEventLoopGroup-4-2:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55828
2021-12-14 07:11:10,085 [myid:1] - INFO  [nioEventLoopGroup-4-1:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55844
2021-12-14 07:11:19,203 [myid:1] - INFO  [nioEventLoopGroup-4-1:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55960
2021-12-14 07:11:20,083 [myid:1] - INFO  [nioEventLoopGroup-4-2:NettyServerCnxn@294] - Processing ruok command from /127.0.0.1:55978
Kafka Envoy logs
[2021-12-14 07:36:48.106][18][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:249] [C1003] new tcp proxy session
[2021-12-14 07:36:48.106][18][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:397] [C1003] Creating connection to cluster local_app
[2021-12-14 07:36:48.106][18][debug][pool] [source/common/tcp/original_conn_pool.cc:98] creating a new connection
[2021-12-14 07:36:48.106][18][debug][pool] [source/common/tcp/original_conn_pool.cc:383] [C1004] connecting
[2021-12-14 07:36:48.106][18][debug][connection] [source/common/network/connection_impl.cc:860] [C1004] connecting to 127.0.0.1:9092
[2021-12-14 07:36:48.106][18][debug][connection] [source/common/network/connection_impl.cc:876] [C1004] connection in progress
[2021-12-14 07:36:48.106][18][debug][pool] [source/common/tcp/original_conn_pool.cc:125] queueing request due to no available connections
[2021-12-14 07:36:48.106][18][debug][conn_handler] [source/server/active_tcp_listener.cc:328] [C1003] new connection
[2021-12-14 07:36:48.106][18][debug][connection] [source/common/network/connection_impl.cc:243] [C1003] closing socket: 0
[2021-12-14 07:36:48.106][18][debug][pool] [source/common/tcp/original_conn_pool.cc:223] canceling pending request
[2021-12-14 07:36:48.106][18][debug][pool] [source/common/tcp/original_conn_pool.cc:231] canceling pending connection
[2021-12-14 07:36:48.106][18][debug][connection] [source/common/network/connection_impl.cc:133] [C1004] closing data_to_write=0 type=1
[2021-12-14 07:36:48.106][18][debug][connection] [source/common/network/connection_impl.cc:243] [C1004] closing socket: 1
[2021-12-14 07:36:48.106][18][debug][pool] [source/common/tcp/original_conn_pool.cc:140] [C1004] client disconnected
[2021-12-14 07:36:48.106][18][debug][conn_handler] [source/server/active_tcp_listener.cc:76] [C1003] adding to cleanup list
[2021-12-14 07:36:48.106][18][debug][pool] [source/common/tcp/original_conn_pool.cc:255] [C1004] connection destroyed
[2021-12-14 07:36:52.253][1][debug][main] [source/server/server.cc:209] flushing stats
[2021-12-14 07:36:57.254][1][debug][main] [source/server/server.cc:209] flushing stats
[2021-12-14 07:36:58.107][16][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:249] [C1005] new tcp proxy session
[2021-12-14 07:36:58.107][16][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:397] [C1005] Creating connection to cluster local_app
[2021-12-14 07:36:58.107][16][debug][pool] [source/common/tcp/original_conn_pool.cc:98] creating a new connection
[2021-12-14 07:36:58.107][16][debug][pool] [source/common/tcp/original_conn_pool.cc:383] [C1006] connecting
[2021-12-14 07:36:58.107][16][debug][connection] [source/common/network/connection_impl.cc:860] [C1006] connecting to 127.0.0.1:9092
[2021-12-14 07:36:58.107][16][debug][connection] [source/common/network/connection_impl.cc:876] [C1006] connection in progress
[2021-12-14 07:36:58.107][16][debug][pool] [source/common/tcp/original_conn_pool.cc:125] queueing request due to no available connections
[2021-12-14 07:36:58.107][16][debug][conn_handler] [source/server/active_tcp_listener.cc:328] [C1005] new connection
[2021-12-14 07:36:58.107][16][debug][connection] [source/common/network/connection_impl.cc:243] [C1005] closing socket: 0
[2021-12-14 07:36:58.108][16][debug][pool] [source/common/tcp/original_conn_pool.cc:223] canceling pending request
[2021-12-14 07:36:58.108][16][debug][pool] [source/common/tcp/original_conn_pool.cc:231] canceling pending connection
[2021-12-14 07:36:58.108][16][debug][connection] [source/common/network/connection_impl.cc:133] [C1006] closing data_to_write=0 type=1
[2021-12-14 07:36:58.108][16][debug][connection] [source/common/network/connection_impl.cc:243] [C1006] closing socket: 1
[2021-12-14 07:36:58.108][16][debug][pool] [source/common/tcp/original_conn_pool.cc:140] [C1006] client disconnected
[2021-12-14 07:36:58.108][16][debug][conn_handler] [source/server/active_tcp_listener.cc:76] [C1005] adding to cleanup list
[2021-12-14 07:36:58.108][16][debug][pool] [source/common/tcp/original_conn_pool.cc:255] [C1006] connection destroyed
Kafka logs
[2021-12-14 07:08:59,224] INFO Initiating client connection, connectString=zookeeper-headless:2181 sessionTimeout=18000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@2d1dee39 (org.apache.zookeeper.ZooKeeper)
[2021-12-14 07:08:59,228] INFO jute.maxbuffer value is 4194304 Bytes (org.apache.zookeeper.ClientCnxnSocket)
[2021-12-14 07:08:59,233] INFO zookeeper.request.timeout value is 0. feature enabled= (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:08:59,235] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2021-12-14 07:08:59,250] INFO Opening socket connection to server zookeeper-headless/172.23.4.112:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:08:59,255] INFO Socket error occurred: zookeeper-headless/172.23.4.112:2181: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:00,363] INFO Opening socket connection to server zookeeper-headless/172.23.5.25:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:00,364] INFO Socket error occurred: zookeeper-headless/172.23.5.25:2181: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:01,464] INFO Opening socket connection to server zookeeper-headless/172.23.4.112:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:01,465] INFO Socket error occurred: zookeeper-headless/172.23.4.112:2181: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:02,566] INFO Opening socket connection to server zookeeper-headless/172.23.5.25:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:02,567] INFO Socket error occurred: zookeeper-headless/172.23.5.25:2181: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:03,668] INFO Opening socket connection to server zookeeper-headless/172.23.5.25:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:03,669] INFO Socket error occurred: zookeeper-headless/172.23.5.25:2181: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:04,769] INFO Opening socket connection to server zookeeper-headless/172.23.5.25:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:04,770] INFO Socket error occurred: zookeeper-headless/172.23.5.25:2181: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:05,236] INFO [ZooKeeperClient Kafka server] Closing. (kafka.zookeeper.ZooKeeperClient)
[2021-12-14 07:09:05,871] INFO Opening socket connection to server zookeeper-headless/172.23.4.112:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:05,976] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
[2021-12-14 07:09:05,977] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
[2021-12-14 07:09:05,978] INFO [ZooKeeperClient Kafka server] Closed. (kafka.zookeeper.ZooKeeperClient)
[2021-12-14 07:09:05,981] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
        at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:271)
        at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:267)
        at kafka.zookeeper.ZooKeeperClient.(ZooKeeperClient.scala:125)
        at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1948)
        at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:431)
        at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:456)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:191)
        at kafka.Kafka$.main(Kafka.scala:109)
        at kafka.Kafka.main(Kafka.scala)
[2021-12-14 07:09:05,983] INFO shutting down (kafka.server.KafkaServer)
[2021-12-14 07:09:05,991] INFO App info kafka.server for 0 unregistered (org.apache.kafka.common.utils.AppInfoParser)
[2021-12-14 07:09:05,992] INFO shut down completed (kafka.server.KafkaServer)
[2021-12-14 07:09:05,992] ERROR Exiting Kafka. (kafka.Kafka$)
[2021-12-14 07:09:05,998] INFO shutting down (kafka.server.KafkaServer)

Environment details

  • consul-k8s-control-plane v0.37.0
  • consul-helm v0.37.0
  • consul v1.10.4
  • envoy v1.18.4
  • kubernetes v1.20.7

Related

@nflaig nflaig added the type/bug Something isn't working label Dec 14, 2021
rrondeau pushed a commit to rrondeau/consul-k8s that referenced this issue Dec 21, 2021
@david-yu
Copy link
Contributor

david-yu commented Mar 7, 2022

Hi @nflaig this seems to be more involved than it seems to deploy Kafka on top of a Service Mesh. Have you seen this pattern deployed successfully elsewhere?

@nflaig
Copy link
Author

nflaig commented Mar 7, 2022

Hi @david-yu, I've seen a similar issue when scraping services with Prometheus within the service mesh. The similarity here is that in both cases the services are dialed directly so maybe the issue is related to that.

Have you seen this pattern deployed successfully elsewhere?

I would expect that to be a common thing to be honest but I think it is not really about kafka it seems to be a general issue with dialed directly and stateful sets.

@david-yu
Copy link
Contributor

Related issue with Kafka on Consul K8s: hashicorp/consul#14125 also it is recommended to set MaxInboundConnections to a higher number than defaults which should be enabled by Consul 1.13.2 and #1437 when it is released.

@bradydean
Copy link

I think I'm running into the same issue with services deployed by Nomad. I can't get a simple tcp echo server to work.

envoyproxy/envoy#23155

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants