Issue with running kafka (without zookeeper) #80

Closed
shavo007 opened this issue Sep 27, 2021 · 11 comments

@shavo007

Description
Error connecting to the broker when running in KRaft mode

cp-all-in-one/cp-all-in-one-kraft
https://github.com/confluentinc/cp-all-in-one/tree/6.2.0-post/cp-all-in-one-kraft

Troubleshooting
When I run the sample producer, I get an exception:

2021-09-22 12:39:35 WARN  NetworkClient:1060 - [Producer clientId=producer-1] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected

I checked the logs and the broker seems to be up, but I can't connect to it.

The other examples work fine with zookeeper but not this one.

Environment

  • GitHub branch: 6.2.0-post
  • Operating System: macOS
  • Version of Docker: Version: 20.10.8
  • Version of Docker Compose: docker-compose version 1.29.2
@pmeister

Same here! I'm trying to connect with kafka-topics:

$ kafka-topics --bootstrap-server localhost:9092 --list
Error while executing topic command : Timed out waiting for a node assignment. Call: listTopics
[2021-10-18 15:38:56,506] ERROR org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listTopics

@bjrke

bjrke commented Nov 4, 2021

Same issue here, tested with the v7.0.0 container.
I am able to run kafka-topics via docker exec, but my Kafka client on the host is not able to connect and produces the same error.
Interestingly, the console consumer (kafka-console-consumer.sh) running on my host machine is able to connect; no idea why. Maybe the error is just not logged.

@pmeister

pmeister commented Nov 4, 2021

@bjrke Can you please clarify what you mean by "the console consume"?

@bjrke

bjrke commented Nov 4, 2021

the consoleConsumer provided with kafka itself, sorry for the typo, I will edit my comment

@mbreevoort

Does this fix help?
#84

@pmeister

@mbreevoort Not for me. Made no difference, unfortunately.

@aesteve
Contributor

aesteve commented Jan 20, 2022

I'm facing the same issue. I first discovered it using librdkafka, but the same happens with a Java producer, too.
What happens is that the API version (v3) request gets "cut" before it actually receives a response.

From the broker container: netstat -ano -p

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State	PID/Program name     Timer
tcp        0	  0 0.0.0.0:9101            0.0.0.0:*               LISTEN	-                    off (0.00/0/0)
tcp        0	  0 0.0.0.0:41777           0.0.0.0:*               LISTEN	-                    off (0.00/0/0)
tcp        0	  0 127.0.0.11:46301        0.0.0.0:*               LISTEN	-                    off (0.00/0/0)
tcp        0	  0 127.0.0.1:9092          0.0.0.0:*               LISTEN	-                    off (0.00/0/0)
tcp        0	  0 192.168.48.2:29092      0.0.0.0:*               LISTEN	-                    off (0.00/0/0)
tcp        0	  0 192.168.48.2:29093      0.0.0.0:*               LISTEN	-                    off (0.00/0/0)
tcp        0	  0 192.168.48.2:29092      192.168.48.3:45354      ESTABLISHED -                    keepalive (6592.04/0/0)
tcp        0	  0 192.168.48.2:29092      192.168.48.3:45348      ESTABLISHED -                    keepalive (6591.66/0/0)
tcp        0	  0 192.168.48.2:29092      192.168.48.3:45360      ESTABLISHED -                    keepalive (6592.52/0/0)
tcp        0	  0 192.168.48.2:29093      192.168.48.2:37916      TIME_WAIT   -                    timewait (52.04/0/0)
tcp        0	  0 192.168.48.2:29093      192.168.48.2:37902      ESTABLISHED -                    keepalive (6587.84/0/0)
tcp        0	  0 192.168.48.2:29092      192.168.48.3:45350      ESTABLISHED -                    keepalive (6591.73/0/0)
tcp        0	  0 192.168.48.2:37902      192.168.48.2:29093      ESTABLISHED -                    keepalive (6587.83/0/0)
udp        0	  0 127.0.0.11:32893        0.0.0.0:*                           -                    off (0.00/0/0)

Whereas when running the standard image with ZooKeeper, I get this result:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State	PID/Program name     Timer
tcp        0	  0 0.0.0.0:9092            0.0.0.0:*               LISTEN	-                    off (0.00/0/0)
tcp        0	  0 0.0.0.0:29092           0.0.0.0:*               LISTEN	-                    off (0.00/0/0)
tcp        0	  0 0.0.0.0:34405           0.0.0.0:*               LISTEN	-                    off (0.00/0/0)
tcp        0	  0 0.0.0.0:9101            0.0.0.0:*               LISTEN	-                    off (0.00/0/0)
tcp        0	  0 127.0.0.11:37105        0.0.0.0:*               LISTEN	-                    off (0.00/0/0)
tcp        0	  0 0.0.0.0:8090            0.0.0.0:*               LISTEN	-                    off (0.00/0/0)
tcp        0	  0 192.168.80.3:41014      52.85.187.45:443        TIME_WAIT   -                    timewait (27.25/0/0)
tcp        0	  0 192.168.80.3:50126      192.168.80.3:29092      ESTABLISHED -                    keepalive (7159.45/0/0)
tcp        0	  0 192.168.80.3:50106      192.168.80.3:29092      TIME_WAIT   -                    timewait (18.20/0/0)
tcp        0	  0 192.168.80.3:50138      192.168.80.3:29092      TIME_WAIT   -                    timewait (19.78/0/0)
tcp        0	  0 192.168.80.3:50096      192.168.80.3:29092      ESTABLISHED -                    keepalive (7155.10/0/0)
tcp        0	  0 192.168.80.3:50160      192.168.80.3:29092      TIME_WAIT   -                    timewait (20.39/0/0)
tcp        0	  0 192.168.80.3:29092      192.168.80.4:35756      ESTABLISHED -                    keepalive (7159.82/0/0)
tcp        0	  0 192.168.80.3:50116      192.168.80.3:29092      TIME_WAIT   -                    timewait (18.42/0/0)
tcp        0	  0 192.168.80.3:29092      192.168.80.3:50090      ESTABLISHED -                    keepalive (7156.31/0/0)
tcp        0	  0 192.168.80.3:29092      192.168.80.3:50204      ESTABLISHED -                    keepalive (7170.32/0/0)
tcp        0	  0 192.168.80.3:50098      192.168.80.3:29092      TIME_WAIT   -                    timewait (17.51/0/0)
tcp        0	  0 192.168.80.3:41016      52.85.187.45:443        TIME_WAIT   -                    timewait (27.37/0/0)
tcp        0	  0 192.168.80.3:29092      192.168.80.3:50096      ESTABLISHED -                    keepalive (7156.32/0/0)
tcp        0	  0 192.168.80.3:29092      192.168.80.3:50126      ESTABLISHED -                    keepalive (7159.45/0/0)
tcp        0	  0 192.168.80.3:50164      192.168.80.3:29092      TIME_WAIT   -                    timewait (20.53/0/0)
tcp        0	  0 192.168.80.3:50112      192.168.80.3:29092      TIME_WAIT   -                    timewait (18.34/0/0)
tcp        0	  0 192.168.80.3:50158      192.168.80.3:29092      TIME_WAIT   -                    timewait (20.39/0/0)

What gets my attention is:

tcp        0	  0 127.0.0.1:9092          0.0.0.0:*               LISTEN	-                    off (0.00/0/0)

vs.

tcp        0	  0 0.0.0.0:9092            0.0.0.0:*               LISTEN	-                    off (0.00/0/0)

I'm guessing that could be part of the problem (I remember having trouble running an HTTP server in Docker, for instance, and having to use 0.0.0.0 as the listen host).
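
To make the comparison between the two netstat dumps less error-prone, here is a minimal Python sketch that lists the TCP ports listening on a loopback address only. It assumes the column layout shown above (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and skips tcp6 lines; the helper names and the shortened sample inputs are illustrative, not part of any Kafka or Docker tooling. The working assumption is that traffic arriving through a published Docker port reaches the container on its network interface rather than on loopback, so a loopback-only listener would not serve it.

```python
import ipaddress


def listening_sockets(netstat_output):
    """Yield (ip, port) for every IPv4 TCP socket in LISTEN state."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State ...
        if len(fields) < 6 or fields[0] != "tcp" or fields[5] != "LISTEN":
            continue
        host, _, port = fields[3].rpartition(":")
        yield ipaddress.ip_address(host), int(port)


def loopback_only_ports(netstat_output):
    """Return the sorted ports whose listener is bound to a loopback address."""
    return sorted(
        port for ip, port in listening_sockets(netstat_output) if ip.is_loopback
    )


# Shortened samples taken from the two dumps above.
kraft_sample = """\
tcp 0 0 0.0.0.0:9101 0.0.0.0:* LISTEN - off (0.00/0/0)
tcp 0 0 127.0.0.1:9092 0.0.0.0:* LISTEN - off (0.00/0/0)
tcp 0 0 192.168.48.2:29092 0.0.0.0:* LISTEN - off (0.00/0/0)
"""
zookeeper_sample = """\
tcp 0 0 0.0.0.0:9092 0.0.0.0:* LISTEN - off (0.00/0/0)
tcp 0 0 0.0.0.0:29092 0.0.0.0:* LISTEN - off (0.00/0/0)
"""

print(loopback_only_ports(kraft_sample))      # [9092]
print(loopback_only_ports(zookeeper_sample))  # []
```

Run against the full dumps, this would also report port 46301 (bound to 127.0.0.11, which is Docker's embedded DNS resolver and is expected to be loopback-only), so only 9092 stands out.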

So I tried and changed my config to:

  broker:
    image: confluentinc/cp-kafka:7.0.1
    hostname: broker
    container_name: broker
    ports:
      - "9092:9092"
      - "9101:9101"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092'
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_JMX_PORT: 9101
      KAFKA_JMX_HOSTNAME: localhost
      KAFKA_PROCESS_ROLES: 'broker,controller'
      KAFKA_NODE_ID: 1
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@broker:29093'
      KAFKA_LISTENERS: 'PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092'
      KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'
      KAFKA_LOG4J_LOGGERS: "kafka.controller=TRACE,kafka.server=TRACE,kafka.broker=TRACE,kafka.server.IncrementalFetchContext=WARN"
    volumes:
      - ./update_run.sh:/tmp/update_run.sh
    command: "bash -c 'if [ ! -f /tmp/update_run.sh ]; then echo \"ERROR: Did you forget the update_run.sh file that came with this docker-compose.yml file?\" && exit 1 ; else /tmp/update_run.sh && /etc/confluent/docker/run ; fi'"

the change being:

      KAFKA_LISTENERS: 'PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092'

It looks OK so far, but I need to experiment more to make sure everything still works both from inside Docker Compose and from the outside.
For now:

  • topic creation is OK (from a kafka-setup container inside docker-compose)
  • a producer (from the outside, local machine) can connect using localhost:9092 and produce messages
  • control center shows everything properly, including the messages sent by the producer
  • connect instance looks ok so far
  • ksqldb still to test

Hopefully this helps. Can file a PR if this is indeed the appropriate solution.
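
For anyone wanting to check their own compose file, the following is a rough Python sketch that parses a KAFKA_LISTENERS value and flags listeners bound to loopback. The "before" value is an assumption inferred from the 127.0.0.1:9092 line in the netstat output, not a quote from the original file, and the function names are hypothetical. Hostnames other than localhost are not resolved, so they are never flagged.

```python
import ipaddress


def parse_listeners(value):
    """Split 'NAME://host:port,...' into {name: (host, port)}."""
    listeners = {}
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        host, _, port = address.rpartition(":")
        listeners[name] = (host, int(port))
    return listeners


def is_loopback_bind(host):
    """True for 'localhost' or a loopback IP; other hostnames are not resolved."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # empty host or a name such as 'broker'


def loopback_listeners(value):
    """Return the names of listeners that bind to loopback only."""
    return [
        name
        for name, (host, _) in parse_listeners(value).items()
        if is_loopback_bind(host)
    ]


# ASSUMPTION: the pre-fix value, inferred from netstat, not copied from the repo.
assumed_before = (
    "PLAINTEXT://broker:29092,CONTROLLER://broker:29093,"
    "PLAINTEXT_HOST://localhost:9092"
)
# The value from the changed config above.
after = (
    "PLAINTEXT://broker:29092,CONTROLLER://broker:29093,"
    "PLAINTEXT_HOST://0.0.0.0:9092"
)

print(loopback_listeners(assumed_before))  # ['PLAINTEXT_HOST']
print(loopback_listeners(after))           # []
```

Note that only KAFKA_LISTENERS (the bind addresses) is checked here; KAFKA_ADVERTISED_LISTENERS can keep localhost:9092, since that is the address clients on the host are told to connect to.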

@ybyzek
Contributor

ybyzek commented Jan 20, 2022

@aesteve thank you for tracking down the issue. I have verified in my environment that kafka-topics --bootstrap-server localhost:9092 --list fails with the current config and works with the proposed change. If you could please file a PR, that would be excellent! Note: please base/merge on 6.2.0-post (not the latest 7.0.1-post), since the problem exists there. Once the PR is merged, I'll propagate the fix to all recent branches.

@ybyzek
Contributor

ybyzek commented Jan 20, 2022

@pmeister

Thanks much for your fix, @aesteve, and for the quick incorporation of that fix, @ybyzek.

@bjrke

bjrke commented Jan 24, 2022

It's working! Thanks, @aesteve and @ybyzek.
