
rdkafka#consumer Invalid response size -2147483555 (0..1000000000) #100

Closed

sheeysong opened this issue Oct 2, 2017 · 15 comments

@sheeysong

Description

I have a Kafka 0.10.2.0 server with message.max.bytes=1500012 (1.5 MB), and a Go client using librdkafka-dev 0.11.0 on Ubuntu 14.04 (Debian package), Go version 1.9.0.
I got two kinds of weird errors on the Go client side:
%3|1506735207.752|FAIL|rdkafka#consumer-6| [thrd:ps6655.prn.parsec.abc.com:9092/2]: ps6655.prn.parsec.abc.com:9092/2: Receive failed: Invalid response size 1000000040 (0..1000000000): increase receive.message.max.bytes

ERROR|rdkafka#consumer-6| Receive failed: Invalid response size -2147483555 (0..1000000000):

Why is the response size over my server's allowed 1.5 MB receive threshold?
Why is the response size a negative number?
Thanks,
~Jing

How to reproduce

Run consumer_channel_example.go with multiple goroutines.
Checklist

Please provide the following information:

  • confluent-kafka-go and librdkafka version (LibraryVersion()):
    librdkafka-dev_0.11.0
  • Apache Kafka broker version:
    0.10.2.0
  • Client configuration: ConfigMap{...}
    "consumers": {
    "consumerProfiles": [
    {
    "bootStrapServers": "kafka-zkb0001.lab.parsec.abc.com:9092,kafka-zkb0002.lab.parsec.abc.com:9092,kafka-zkb0003.lab.parsec.abc.com:9092",
    "enableAutoCommit": false,
    "autoOffsetReset": "earliest",
    "topic": "activity",
    "consumerGroupID": "JetTestConsumergroup",
    "go.application.rebalance.enable": true,
    "go.events.channel.enable": true,
    "go.events.channel.size": 100000
    }
    ]
    }
  • Operating system:
    Ubuntu 14.04
  • Provide client logs (with "debug": ".." as necessary)
  • Provide broker log excerpts
  • Critical issue
@sheeysong
Author

Here is the complete ConfigMap; the comments show our override value where we set one, otherwise we use the default:
const ( // common kafka config across Pub & Sub
    BOOTSTRAP_SERVERS_CONFIG              string = "bootstrap.servers"
    CLIENT_ID_CONFIG                      string = "client.id"
    RECEIVE_BUFFER_CONFIG                 string = "socket.receive.buffer.bytes"
    SEND_BUFFER_CONFIG                    string = "socket.send.buffer.bytes"
    SOCKET_TIMEOUT_MS                     string = "socket.timeout.ms"
    SOCKET_KEEPALIVE_ENABLE               string = "socket.keepalive.enable"
    SESSION_TIMEOUT_MS                    string = "session.timeout.ms"     // ===> 30,000
    MAX_TRANSMIT_MESSAGE_SIZE             string = "message.max.bytes"      // ===> 100,000,000
    HEARTBEAT_INTERVAL_MS                 string = "heartbeat.interval.ms"  // ===> 300
    MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION string = "max.in.flight"          // default: 1,000,000
)
const ( // consumer kafka config
    FETCH_MIN_BYTES_CONFIG    string = "fetch.min.bytes"                  // ===> 1
    FETCH_MAX_BYTES_CONFIG    string = "fetch.message.max.bytes"          // ===> 1,000,000,000
    RECEIVE_MAX_BYTE_CONFIG   string = "receive.message.max.bytes"        // ===> 1,000,000,000
    ENABLE_AUTO_COMMIT_CONFIG string = "enable.auto.commit"               // default to false
    AUTO_OFFSET_RESET_CONFIG  string = "auto.offset.reset"                // default to latest
    GROUP_ID_CONFIG           string = "group.id"
    GoEventChannelSize        string = "go.events.channel.size"           // ===> 100,000
    GoEventChannelEnable      string = "go.events.channel.enable"         // true
    GoRebalanceEnable         string = "go.application.rebalance.enable"  // true
)

@edenhill
Contributor

edenhill commented Oct 3, 2017

Do note that message.max.bytes sets the maximum size of an individual message, but messages are typically produced and consumed in batches.
On the consumer side you set the maximum batch size to fetch with fetch.message.max.bytes, which you have set to 1 GB. Since there is some overhead (40 bytes, as seen in your log), you will need to set receive.message.max.bytes somewhat higher than fetch.message.max.bytes. This is what the log message tells you to do:

%3|1506735207.752|FAIL|rdkafka#consumer-6| [thrd:ps6655.prn.parsec.abc.com:9092/2]: ps6655.prn.parsec.abc.com:9092/2: Receive failed: Invalid response size 1000000040 (0..1000000000): increase receive.message.max.bytes

Response sizes are sent as a signed 32-bit integer; when the value becomes negative it means the size overflowed (which I would say is a broker issue):

ERROR|rdkafka#consumer-6| Receive failed: Invalid response size -2147483555 (0..1000000000):

There is not really any reason to have such a high fetch.message.max.bytes (1 GB); try setting it to at most 10-100 MB or so.
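
For illustration only (not code from this thread), a minimal confluent-kafka-go sketch of the sizing advice above; the broker address, group id, and byte limits below are assumed placeholder values:

package main

import (
    "fmt"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
    // Hypothetical values, chosen only to show the relationship between the settings.
    c, err := kafka.NewConsumer(&kafka.ConfigMap{
        "bootstrap.servers": "broker1:9092", // placeholder
        "group.id":          "example-group",
        // Maximum batch size fetched per request: keep this in the 10-100 MB range, not 1 GB.
        "fetch.message.max.bytes": 20 * 1000 * 1000,
        // Must be somewhat larger than fetch.message.max.bytes to cover the
        // protocol framing overhead (the extra 40 bytes seen in the log above).
        "receive.message.max.bytes": 25 * 1000 * 1000,
    })
    if err != nil {
        panic(err)
    }
    defer c.Close()
    fmt.Println("consumer created with receive headroom above the fetch limit")
}

The exact numbers are not the point; what matters is that the receive limit leaves headroom above the fetch limit.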

@sheeysong
Author

Thank you for the prompt response!
It's quite strange how we got into this weird error; after we cleaned up the logs generated by load testing and restarted ZooKeeper and the brokers, it ran smoothly. I will keep an eye on this.
Thanks,
~Jing

@sheeysong
Author

Hi @edenhill
I am back on this issue again; it resurfaced while we were running load tests.
Per confluentinc/librdkafka#1616, which seems to have fixed the problem for that user, we set fetch.message.max.bytes=20000000, receive.message.max.bytes=40000000, and message.max.bytes=10000000. However, the error came back:
%3|1518481856.563|FAIL|rdkafka#consumer-7| [thrd:ps6576.prn.parsec.apple.com:9092/1]: ps6576.prn.parsec.apple.com:9092/1: Receive failed: Invalid response size 40003026 (0..40000000): increase receive.message.max.bytes

It seems that whatever receive.message.max.bytes we set, the response always carries some overhead beyond what we configured. For one of our topics we use the key to carry the message header, since the current version doesn't support headers yet.
Your thoughts?

My Java client didn't throw any exceptions when consuming from the same cluster. (Maybe my messages had all hit their TTL.)

@sheeysong
Author

Hi @edenhill -- Any updates?
Thanks

@edenhill
Contributor

I've tried to reproduce this to no avail.
Do you know the typical message size for that topic?
Can you share the producer config?

@sheeysong
Author

This is our producer config; we use the defaults for all other fields with no override:
go.delivery.reports:true
log.connection.close:false
max.in.flight:20
linger.ms:0
heartbeat.interval.ms:300
compression.codec:none
request.timeout.ms:10000
reconnect.backoff.jitter.ms:500
socket.keepalive.enable:true
queue.buffering.max.kbytes:204800
client.id:FedProducer
retries:1
go.batch.producer:false
go.produce.channel.size:1000
acks:1
batch.num.messages:10000
socket.timeout.ms:60000

@edenhill
Contributor

Do you know the typical message size for that topic?

@sheeysong
Author

from 1KB up to 2-3MB (various Topics)

@edenhill
Contributor

Okay, so I think I know what the issue is.
fetch.message.max.bytes changed from meaning the maximum number of bytes to fetch in total per request to the maximum number of bytes to fetch per partition per request.
So if you are fetching 10 partitions, with a fetch.message.max.bytes of 1M, and your receive.message.max.bytes is 5M, it is possible that the returned response is 1M*10, thus > 5M.

The workaround for now is to set fetch.message.max.bytes to a reasonable value given the number of partitions you consume, and then to make sure that receive.message.max.bytes is at least fetch.message.max.bytes * numPartitions + 5%.
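
As a rough, hypothetical illustration of that rule (the fetch size and partition count below are made-up numbers, not values from this thread):

package main

import "fmt"

func main() {
    const fetchMessageMaxBytes = 1 * 1000 * 1000 // max bytes fetched per partition per request
    const numPartitions = 10                     // partitions this consumer is fetching

    // Worst case, every partition returns a full batch in a single response,
    // so size the receive limit for all of them plus ~5% headroom for framing.
    receiveMessageMaxBytes := int(float64(fetchMessageMaxBytes*numPartitions) * 1.05)
    fmt.Printf("set receive.message.max.bytes >= %d\n", receiveMessageMaxBytes)
}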

@sheeysong
Author

I see, thank you so much for the prompt reply, let me try it!

@sheeysong
Author

@edenhill -- we are testing the config changes per your suggestion.

  1. Eventually we will upgrade librdkafka-dev to librdkafka1. We did "go get -u github.com/confluentinc/confluent-kafka-go/kafka" half a year back;
    I am wondering whether the confluent-kafka-go client from half a year ago is still backward compatible with librdkafka1?

  2. To consume faster, we chose the channel-based consumer over the function-based one. If we switch to the function-based consumer instead (see the Poll() sketch below), will we still encounter this problem? On the Java client side we use consumer.poll() with at most 100 messages per poll() and didn't have this issue.

Thanks!
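
For comparison with the channel-based consumer, here is a minimal, hypothetical sketch of the function-based (Poll()) approach referred to in point 2; the broker, group, and topic values are placeholders, not taken from this thread:

package main

import (
    "fmt"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
    c, err := kafka.NewConsumer(&kafka.ConfigMap{
        "bootstrap.servers": "broker1:9092", // placeholder
        "group.id":          "example-group",
        "auto.offset.reset": "earliest",
    })
    if err != nil {
        panic(err)
    }
    defer c.Close()

    if err := c.Subscribe("activity", nil); err != nil {
        panic(err)
    }

    for {
        // Poll serves events on the calling goroutine instead of pushing them onto a channel.
        switch e := c.Poll(100).(type) {
        case *kafka.Message:
            fmt.Printf("message on %s (%d bytes)\n", e.TopicPartition, len(e.Value))
        case kafka.Error:
            fmt.Printf("consumer error: %v\n", e)
        case nil:
            // Poll timed out; nothing to do.
        }
    }
}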

@sheeysong
Author

[screenshot: screen shot 2018-02-22 at 3 53 55 pm]

Hi @edenhill,
We had a better run after resetting receive.message.max.bytes and fetch.message.max.bytes. Thanks for the help! After a day of load on the cluster, the Kafka broker host shows about 800 MB outbound to the consumer side on our 10 Gb/s network. I have attached the network traffic and memory charts for you. Why does the broker host have such a high memory footprint? There is no free memory on the host:
run: free -g
                   total   used   free   shared   buffers   cached
Mem:                 377    376      0        0         0      323
-/+ buffers/cache:            53    324

@edenhill
Contributor

Fixed on librdkafka master

@mikesneider

(quoting @sheeysong's comment and charts above)

great charts, how did you create them?
