kafka sink is silently failing #1842

Closed
binarylogic opened this issue Feb 18, 2020 · 9 comments
Labels
domain: observability - Anything related to monitoring/observing Vector
sink: kafka - Anything `kafka` sink related
type: bug - A code related bug.
type: task - Generic non-code related tasks

Comments

binarylogic (Contributor) commented Feb 18, 2020

This was posted in our chat:

Hello! I use Vector as a TCP > Kafka transport.
A short time after start, the sink gets stuck: nothing happens, and there are no suspicious messages in the broker or Vector logs (even in debug mode).
For a little while before it gets stuck, the sink messages disappear from the Vector log (only source messages remain).
Has anyone encountered this problem?

This lacks a lot of detail (I'll update if we get more), but it's worth doing an audit of the kafka sink to see if we're missing opportunities to log.

Related #1818

binarylogic added the sink: kafka, domain: observability, and type: task labels on Feb 18, 2020
binarylogic mentioned this issue on Feb 18, 2020
ghost self-assigned this on Feb 19, 2020
ghost added the type: bug label on Feb 19, 2020
ghost removed their assignment on Feb 19, 2020
ghost commented Feb 19, 2020

If the rdkafka buffer is full, Vector doesn't send the data and only prints a debug message instead of an error:

https://github.com/timberio/vector/blob/841a8f8666ead89443dd7c98e8aa4b241537bac6/src/sinks/kafka.rs#L172-L184

So this might be the cause, although I'm not sure how much more verbose the output would become if we changed debug! to error!. Other errors are already logged using the error! macro.

binarylogic (Contributor, Author) commented

Yeah, we should definitely be logging this at the error level. If it has a chance of firing rapidly, we should consider rate-limiting it with the rate_limit attribute.
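
A minimal sketch of what that could look like, assuming the queue-full branch in kafka.rs keeps its current retry behavior. The helper name, its signature, and the rate_limit_secs field (the same field that shows up in the file source log output further down) are assumptions for illustration, not the actual implementation:

use tracing::error;

// Hypothetical sketch: surface the rdkafka "queue full" condition at the
// error level instead of debug!, with a rate_limit_secs field so a hot
// retry loop does not flood the log. Names and signature are illustrative.
fn log_queue_full(error: &impl std::fmt::Display) {
    error!(
        message = "rdkafka queue full; waiting before retrying.",
        error = %error,
        rate_limit_secs = 10,
    );
}

The idea is that the first occurrence becomes visible at the error level while rapid repeats inside the window are suppressed, assuming Vector's log rate limiting honors that field the way it appears to for the file source.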

linshaoyong commented Feb 20, 2020

I have encountered a similar problem. I use the file source:

[sources.log_source]
  type = "file"
  include = ["/var/log/a.log"]
  ignore_older = 86400 # seconds
  start_at_beginning = false
  max_line_bytes = 204800 # default value 102400

If I set max_line_bytes to the default, everything is OK. If I change max_line_bytes to 204800, the kafka sink gets stuck.

Feb 20 19:57:36.989  INFO vector: Log level "debug" is enabled.
Feb 20 19:57:36.989  INFO vector: Loading configs. path=["vector.toml"]
Feb 20 19:57:36.991  INFO vector: Vector is starting. version="0.8.0" git_version="v0.7.0-168-g841a8f8" released="Wed, 19 Feb 2020 11:14:46 +0000" arch="x86_64"
Feb 20 19:57:36.994  INFO vector::topology: Running healthchecks.
Feb 20 19:57:36.995  INFO vector::topology: Starting source "log_source"
Feb 20 19:57:36.995  INFO vector::topology: Starting sink "log_sink"
Feb 20 19:57:36.995  INFO source{name=log_source type=file}: vector::sources::file: Starting file server. include=["/var/log/a.log"] exclude=[]
Feb 20 19:57:36.996  INFO source{name=log_source type=file}:file_server: file_source::file_server: Found file to watch. path="/var/log/a.log" file_position=0 rate_limit_secs=30
Feb 20 19:57:37.006 DEBUG sink{name=log_sink type=kafka}: vector::sinks::kafka: sending event. count=1
Feb 20 19:57:37.006 DEBUG sink{name=log_sink type=kafka}: vector::sinks::kafka: rdkafka queue full: Message production error: MessageSizeTooLarge (Broker: Message size too large)
Feb 20 19:57:37.006 DEBUG sink{name=log_sink type=kafka}: vector::sinks::kafka: rdkafka queue still full: Message production error: MessageSizeTooLarge (Broker: Message size too large)
Feb 20 19:57:39.996 ERROR vector::topology::builder: Healthcheck: Failed Reason: Meta data fetch error: BrokerTransportFailure (Local: Broker transport failure)

/var/log/a.log is about 3 MB, with 50-200 bytes per line.
I'm sure this message (Broker: Message size too large) is triggered by rdkafka, not by Kafka's broker.

I suspect it is related to these two librdkafka configuration options:

message.max.bytes = 1000000
batch.num.messages = 10000

When max_line_bytes * batch.num.messages > message.max.bytes, the kafka sink gets stuck.

ghost commented Feb 20, 2020

@linshaoyong

I suspect it is related to these two librdkafka configuration options:

message.max.bytes = 1000000
batch.num.messages = 10000

When max_line_bytes * batch.num.messages > message.max.bytes, the kafka sink gets stuck.

Have you tried increasing message.max.bytes or reducing batch.num.messages using the librdkafka_options configuration option? Like this:

[sinks.my_kafka_sink]
  type = "kafka"
  inputs = ["my-source-id"]
  bootstrap_servers = "10.14.22.123:9092,10.14.23.332:9092"
  key_field = "user_id"
  topic = "topic-1234"
  encoding = "json"
  [sinks.my_kafka_sink.librdkafka_options]
    "batch.num.messages" = "1000000000"
    "batch.num.messages" = "100"

linshaoyong commented Feb 20, 2020

@a-rodin Reducing batch.num.messages has no effect, even when it is reduced to 1. Increasing message.max.bytes is effective.

ghost self-assigned this on Feb 24, 2020
ghost commented Feb 25, 2020

@linshaoyong Could you please try to run Vector with the environment variable LOG set to vector=info,rdkafka=trace? It should produce detailed error messages from librdkafka, which I hope will make it possible to identify the cause of the issue.

linshaoyong commented Feb 26, 2020

@a-rodin Due to performance issues, it cannot run continuously with trace logging enabled. Does the log from the start of the program help? It's about 70 messages per minute.

  [sinks.system_log_sink.librdkafka_options]
    "reconnect.backoff.ms" = "2000"
    "message.max.bytes" = "102400000"
    "queue.buffering.max.ms" = "3000"

vector.log (attached)

ghost commented Feb 27, 2020

@linshaoyong The log from the file seems to contain no errors; all messages are delivered. Also, in the recently released Vector 0.8.0 the default logging level for rdkafka is set to error, so even with default settings all errors happening inside librdkafka should be visible.

binarylogic added this to the Improve observability milestone on Mar 14, 2020
binarylogic unassigned ghost on Apr 15, 2020
binarylogic removed this from the Vector Observability milestone on Jul 20, 2020
binarylogic (Contributor, Author) commented

Closing since we haven't made progress on this issue. I hope that #1903 provides the information needed to resolve it.
