SERVICE AVAILABILITY: vector validate hangs indefinitely when using kafka source on v0.34.0 and beyond - Denial of Service for systemd managed instances #20687
Comments
I tried setting:
Still hung. I added:
Still hung. I tried enabling the "Legacy OpenSSL" mode as well. Worked once, then hung indefinitely. The difference in the trace logs for the first run:
re-ran and it tried to rebalance again:
I don't think it's the TLS changes; I suspect something is causing #17497 to run during a
Just to confirm: it's the I agree this probably isn't related to TLS settings; however I am not able to observe this behavior with a simple
Correct, it only happens on validate. Here's the config that's hanging most of the time. If I remove this from my
[sources.kafka_ecs]
type = "kafka"
topics = ["ecs"]
group_id = "ecs-to-es"
bootstrap_servers = "host1.example.com:9093,host2.example.com:9093"
tls.enabled = true
[transforms.ecs_jsonify]
type = "remap"
inputs = [ "kafka_ecs" ]
source = """
. = parse_json!(.message)
"""
[sinks.elasticsearch_ecs]
type = "elasticsearch"
inputs = [ "ecs_jsonify" ]
endpoints = ["http://localhost:9200"]
bulk.index = "{{ @metadata.beat }}-{{ @metadata.version }}-{{ labels.location }}-%Y.%m.%d"
pipeline = "ecs-geoip-info"
healthcheck.enabled = false
With my simple test setup of Typically the source created for the It seems to be easier to reproduce on 0.39.0 fwiw, maybe because of the bias change in that version, but I don't see it happening as reliably with 0.34.2, though it does still happen there.
I can confirm it tends to happen more reliably on 0.39.0 than 0.34.*. That mirrors my experience. It is SO comforting to know that this is not just a "me" thing. Thank you for looking into this. I guess, what I'm curious about, is I have: healthchecks.enabled = false So why are connections happening in the I guess, I'm expecting that to just be a syntax check, which is the purpose of the Is the "right" way to fix this, add
Looks like the healthcheck option is specific to sinks, and unrelated to One solution for kafka sources would be to use a different/random consumer group name than the real config to avoid these spurious rebalances. For the vector pipelines I manage we do not run the validate command at all in our production environment, specifically to avoid the rebalancing issue. We run unit tests ahead of time in CI and it's just not necessary to validate at that point in our case. You can also use the
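For anyone who wants to try the "different/random consumer group name" idea, here is a rough sketch of how a validation-only copy of the config could be produced. The helper name `with_random_group_id` and the `-validate-` suffix are made up for illustration, and the line-based regex is an assumption that only fits simple `group_id = "..."` lines like the one in the config above. It is not a TOML parser and is not anything Vector provides.

```python
import re
import uuid

# Matches a simple `group_id = "value"` line, as in the config posted above.
# Assumption: the key sits on its own line with a double-quoted value.
_GROUP_ID_LINE = re.compile(r'^(\s*group_id\s*=\s*")([^"]*)(")', re.MULTILINE)


def with_random_group_id(config_text: str) -> str:
    """Return a copy of config_text whose group_id values get a unique suffix.

    The real consumer group name is left untouched in the original file;
    only the returned copy (meant for validation) uses the throwaway name.
    """
    suffix = uuid.uuid4().hex[:8]  # short random token, hypothetical naming scheme

    def _rename(match: re.Match) -> str:
        return f"{match.group(1)}{match.group(2)}-validate-{suffix}{match.group(3)}"

    return _GROUP_ID_LINE.sub(_rename, config_text)


if __name__ == "__main__":
    sample = '[sources.kafka_ecs]\ntype = "kafka"\ngroup_id = "ecs-to-es"\n'
    print(with_random_group_id(sample))
```

The output could be written to a temporary file and that path passed to vector validate instead of the production config, so the throwaway consumer never joins the real "ecs-to-es" group. Stale throwaway groups may still accumulate on the broker side, so this is a workaround rather than a fix.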
With the fix in #20698 the consumer that's set up for
A note for the community
Problem
vector validate hangs indefinitely when using kafka sources with tls.enabled. I verified this happens by bisecting versions from v0.32.1 (where it worked) through v0.39.0. The regression was introduced in v0.34.0.
I'm not really sure what's happening here, but this is the state:
vector validate is fine
vector validate hangs once v0.34.0 starts the first time, cannot restart.
I ran through this a few times opening this ticket. However, I'm now at the point where, anything v0.34.0+ does not validate (it will start just fine, but the systemd service has the vector validate step in it which causes it to timeout and so I'm DEAD in the water).
I had to forcefully downgrade vector to v0.33.1 everywhere because the validate action is broken.
Configuration
Version
vector 0.32.1 (x86_64-unknown-linux-gnu 9965884 2023-08-21 14:52:38.330227446)
Debug Output
Example Data
Not needed
Additional Context
That TRACE rdkafka::consumer: Running pre-rebalance with Assign is the last thing I see. When I strace the only output for a full 6 minutes is:
References
No response