
rdkafka#consumer Invalid response size -2147483555 (0..1000000000) #100

Closed

sheeysong opened this issue Oct 2, 2017 · 15 comments

@sheeysong

Description

I have a Kafka 0.10.2.0 server with message.max.bytes=1500012 (1.5 MB), and a Go client using librdkafka-dev 0.11.0 on Ubuntu 14.04 (Debian package), Go version 1.9.0.
I got two kinds of weird errors on the Go client side:
%3|1506735207.752|FAIL|rdkafka#consumer-6| [thrd:ps6655.prn.parsec.abc.com:9092/2]: ps6655.prn.parsec.abc.com:9092/2: Receive failed: Invalid response size 1000000040 (0..1000000000): increase receive.message.max.bytes

ERROR|rdkafka#consumer-6| Receive failed: Invalid response size -2147483555 (0..1000000000):

Why is the response size over my server's allowed 1.5 MB receive threshold?
Why is the response size a negative number?
Thanks,
~Jing

How to reproduce

Run consumer_channel_example.go with multiple goroutines.
Checklist

Please provide the following information:

  • confluent-kafka-go and librdkafka version (LibraryVersion()):
    librdkafka-dev_0.11.0
  • Apache Kafka broker version:
    0.10.2.0
  • Client configuration: ConfigMap{...}
    "consumers": {
    "consumerProfiles": [
    {
    "bootStrapServers": "kafka-zkb0001.lab.parsec.abc.com:9092,kafka-zkb0002.lab.parsec.abc.com:9092,kafka-zkb0003.lab.parsec.abc.com:9092",
    "enableAutoCommit": false,
    "autoOffsetReset": "earliest",
    "topic": "activity",
    "consumerGroupID": "JetTestConsumergroup",
    "go.application.rebalance.enable": true,
    "go.events.channel.enable": true,
    "go.events.channel.size": 100000
    }
    ]
    }
  • Operating system:
    Ubuntu 14.04
  • Provide client logs (with "debug": ".." as necessary)
  • Provide broker log excerpts
  • Critical issue
@sheeysong
Author

Here is the complete ConfigMap; the comments show our override value where we set one, otherwise we use the default:
const ( // common kafka config across Pub & Sub
    BOOTSTRAP_SERVERS_CONFIG              string = "bootstrap.servers"
    CLIENT_ID_CONFIG                      string = "client.id"
    RECEIVE_BUFFER_CONFIG                 string = "socket.receive.buffer.bytes"
    SEND_BUFFER_CONFIG                    string = "socket.send.buffer.bytes"
    SOCKET_TIMEOUT_MS                     string = "socket.timeout.ms"
    SOCKET_KEEPALIVE_ENABLE               string = "socket.keepalive.enable"
    SESSION_TIMEOUT_MS                    string = "session.timeout.ms"     // ===> 30,000
    MAX_TRANSMIT_MESSAGE_SIZE             string = "message.max.bytes"      // ===> 100,000,000
    HEARTBEAT_INTERVAL_MS                 string = "heartbeat.interval.ms"  // ===> 300
    MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION string = "max.in.flight"          // default: 1,000,000
)
const ( // consumer kafka config
    FETCH_MIN_BYTES_CONFIG    string = "fetch.min.bytes"                  // ===> 1
    FETCH_MAX_BYTES_CONFIG    string = "fetch.message.max.bytes"          // ===> 1,000,000,000
    RECEIVE_MAX_BYTE_CONFIG   string = "receive.message.max.bytes"        // ===> 1,000,000,000
    ENABLE_AUTO_COMMIT_CONFIG string = "enable.auto.commit"               // default to false
    AUTO_OFFSET_RESET_CONFIG  string = "auto.offset.reset"                // default to latest
    GROUP_ID_CONFIG           string = "group.id"
    GoEventChannelSize        string = "go.events.channel.size"           // ===> 100,000
    GoEventChannelEnable      string = "go.events.channel.enable"         // true
    GoRebalanceEnable         string = "go.application.rebalance.enable"  // true
)

@edenhill
Contributor

edenhill commented Oct 3, 2017

Do note that message.max.bytes sets the maximum size of an individual message, but messages are typically produced and consumed in batches.
On the consumer side you set the maximum batch size to fetch with fetch.message.max.bytes, which you have set to 1 GB. Since there is some overhead (40 bytes, as seen in your log), you will need to set receive.message.max.bytes somewhat higher than fetch.message.max.bytes. This is what the log message tells you to do:

%3|1506735207.752|FAIL|rdkafka#consumer-6| [thrd:ps6655.prn.parsec.abc.com:9092/2]: ps6655.prn.parsec.abc.com:9092/2: Receive failed: Invalid response size 1000000040 (0..1000000000): increase receive.message.max.bytes

Response sizes are sent as a signed 32-bit integer; when the value becomes negative it means the size overflowed (which I would say is a broker issue):

ERROR|rdkafka#consumer-6| Receive failed: Invalid response size -2147483555 (0..1000000000):

There is not really any reason to have such a high fetch.message.max.bytes (1 GB); try setting it to at most 10-100 MB or so.
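
For illustration only (not code from this thread), a minimal confluent-kafka-go sketch of the sizing advice above; the broker address, group id, and byte limits below are assumed placeholder values:

package main

import (
    "fmt"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
    // Hypothetical values, chosen only to show the relationship between the settings.
    c, err := kafka.NewConsumer(&kafka.ConfigMap{
        "bootstrap.servers": "broker1:9092", // placeholder
        "group.id":          "example-group",
        // Maximum batch size fetched per request: keep this in the 10-100 MB range, not 1 GB.
        "fetch.message.max.bytes": 20 * 1000 * 1000,
        // Must be somewhat larger than fetch.message.max.bytes to cover the
        // protocol framing overhead (the extra 40 bytes seen in the log above).
        "receive.message.max.bytes": 25 * 1000 * 1000,
    })
    if err != nil {
        panic(err)
    }
    defer c.Close()
    fmt.Println("consumer created with receive headroom above the fetch limit")
}

The exact numbers are not the point; what matters is that the receive limit leaves headroom above the fetch limit.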

@sheeysong
Author

Thank you for the prompt response!
It's quite strange how we got into this weird error; after we cleaned up the logs generated by load testing and restarted ZooKeeper and the brokers, it ran smoothly. I will keep an eye on this.
Thanks,
~Jing

@sheeysong
Author

Hi @edenhill
I am back on this issue again; it resurfaced while we were running load tests.
Per confluentinc/librdkafka#1616, which seems to have fixed the problem for that user, we set fetch.message.max.bytes=20000000, receive.message.max.bytes=40000000, and message.max.bytes=10000000. However, the error came back:
%3|1518481856.563|FAIL|rdkafka#consumer-7| [thrd:ps6576.prn.parsec.apple.com:9092/1]: ps6576.prn.parsec.apple.com:9092/1: Receive failed: Invalid response size 40003026 (0..40000000): increase receive.message.max.bytes

It seems that whatever receive.message.max.bytes we set, the response always carries some overhead beyond what we configured. For one of our topics we use the key to carry the message header, since the current version doesn't support headers yet.
Your thoughts?

My Java client didn't throw any exceptions when consuming from the same cluster. (Maybe my messages had all hit their TTL.)

@sheeysong
Author

Hi @edenhill -- Any updates?
Thanks

@edenhill
Contributor

I've tried to reproduce this to no avail.
Do you know the typical message size for that topic?
Can you share the producer config?

@sheeysong
Author

This is our producer config; we use the defaults for all other fields with no override:
go.delivery.reports:true
log.connection.close:false
max.in.flight:20
linger.ms:0
heartbeat.interval.ms:300
compression.codec:none
request.timeout.ms:10000
reconnect.backoff.jitter.ms:500
socket.keepalive.enable:true
queue.buffering.max.kbytes:204800
client.id:FedProducer
retries:1
go.batch.producer:false
go.produce.channel.size:1000
acks:1
batch.num.messages:10000
socket.timeout.ms:60000

@edenhill
Contributor

Do you know the typical message size for that topic?

@sheeysong
Author

from 1KB up to 2-3MB (various Topics)

@edenhill
Contributor

Okay, so I think I know what the issue is.
fetch.message.max.bytes changed from meaning the maximum number of bytes to fetch in total per request to the maximum number of bytes to fetch per partition per request.
So if you are fetching 10 partitions, with a fetch.message.max.bytes of 1M, and your receive.message.max.bytes is 5M, it is possible that the returned response is 1M*10, thus > 5M.

The workaround for now is to set fetch.message.max.bytes to a reasonable value given the number of partitions you consume, and then to make sure that receive.message.max.bytes is at least fetch.message.max.bytes * numPartitions + 5%.
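
As a rough, hypothetical illustration of that rule (the fetch size and partition count below are made-up numbers, not values from this thread):

package main

import "fmt"

func main() {
    const fetchMessageMaxBytes = 1 * 1000 * 1000 // max bytes fetched per partition per request
    const numPartitions = 10                     // partitions this consumer is fetching

    // Worst case, every partition returns a full batch in a single response,
    // so size the receive limit for all of them plus ~5% headroom for framing.
    receiveMessageMaxBytes := int(float64(fetchMessageMaxBytes*numPartitions) * 1.05)
    fmt.Printf("set receive.message.max.bytes >= %d\n", receiveMessageMaxBytes)
}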

@sheeysong
Author

I see, thank you so much for the prompt reply, let me try it!

@sheeysong
Author

@edenhill -- we are testing the config changes per your suggestion.

  1. Eventually we will upgrade librdkafka-dev to librdkafka1. We did "go get -u github.com/confluentinc/confluent-kafka-go/kafka" half a year back;
    I am wondering whether the confluent-kafka-go client from half a year ago is still backward compatible with librdkafka1?

  2. To consume faster, we chose the channel-based consumer over the function-based one. If we switch to the function-based consumer instead (see the Poll() sketch below), will we still encounter this problem? On the Java client side we use consumer.poll() with at most 100 messages per poll() and didn't have this issue.

Thanks!
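
For comparison with the channel-based consumer, here is a minimal, hypothetical sketch of the function-based (Poll()) approach referred to in point 2; the broker, group, and topic values are placeholders, not taken from this thread:

package main

import (
    "fmt"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
    c, err := kafka.NewConsumer(&kafka.ConfigMap{
        "bootstrap.servers": "broker1:9092", // placeholder
        "group.id":          "example-group",
        "auto.offset.reset": "earliest",
    })
    if err != nil {
        panic(err)
    }
    defer c.Close()

    if err := c.Subscribe("activity", nil); err != nil {
        panic(err)
    }

    for {
        // Poll serves events on the calling goroutine instead of pushing them onto a channel.
        switch e := c.Poll(100).(type) {
        case *kafka.Message:
            fmt.Printf("message on %s (%d bytes)\n", e.TopicPartition, len(e.Value))
        case kafka.Error:
            fmt.Printf("consumer error: %v\n", e)
        case nil:
            // Poll timed out; nothing to do.
        }
    }
}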

@sheeysong
Author

[screenshot: screen shot 2018-02-22 at 3 53 55 pm]

Hi @edenhill,
We had a better run after resetting receive.message.max.bytes and fetch.message.max.bytes. Thanks for the help! After a day of load on the cluster, the Kafka broker host shows about 800 MB outbound to the consumer side on our 10 Gb/s network. I have attached the network traffic and memory charts for you. Why does the broker host have such a high memory footprint? There is no free memory on the host:
run: free -g
                   total   used   free   shared   buffers   cached
Mem:                 377    376      0        0         0      323
-/+ buffers/cache:            53    324

@edenhill
Contributor

Fixed on librdkafka master

@mikesneider

(quoting @sheeysong's comment and charts above)

great charts, how did you create them?
