Component telemetry inaccurate for some components in 0.32.0 #18265
Comments
Hi @jgournet , Could you share some graphs? Also your configuration? Did you change anything this morning? I see you are running 0.32.0 which would say to me that maybe you upgraded Vector. Is that the case? If so, is the behavior different with 0.31.0? Note that I would expect to see that: events increasing compared to events out if:
|
Hi @jszwedko , In the meantime, here's some info: the Prometheus query for this graph is:
A few notes:
For the "version" question: we actually use the image timberio/vector:latest-alpine, so our guess is that agents that started up recently got auto-upgraded to 0.32 and began showing this issue. Also: in one of our test environments, we currently have 5 Vector agents: 3 are running 0.32, and 2 are still on the old 0.31. |
Thanks for the additional details @jgournet . That was enough for me to try to reproduce this. I think I was able to. I'm bisecting down to find the commit that introduced the bug now. I'll flag this to be fixed in 0.32.1. For now, I'd use the 0.31.0 docker image. |
Bisected down to 0bf6abd. This is the configuration I was testing with:
[sources.source0]
type = "demo_logs"
interval = 0
format = "json"
decoding.codec = "json"
[sinks.sink0]
type = "aws_s3"
inputs = ["source0"]
bucket = "timberio-jesse-test"
key_prefix = "18265/date=%F/"
encoding.codec = "json"
framing.method = "newline_delimited"
[sources.source1]
type = "internal_metrics"
[sinks.sink1]
type = "datadog_metrics"
inputs = ["source1"]
default_api_key = "${DD_API_KEY}"
It appears that |
Thank you @jszwedko! Quite impressive how you managed to track this down with so little information! |
@jszwedko : |
Definitely not intended 🙂 Can you share the configuration you are using? I tested again just now and the |
sorry ... ignore that: seems like we had some old nodes that did not pull "latest" properly ... will try again with "Always" as PullPolicy, but it seems ok after all |
Thanks for confirming and for the initial report! |
A note for the community
Problem
Sorry, this is a very light ticket - just describing the issue we're facing:
Since this morning (16/08), we've been getting alerts because the gap between the events-in and events-out metrics is increasing.
However, logs are still being pushed to our S3 sink, so it seems the metric itself is not being reported correctly.
Is anyone else affected who could help provide more info?
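For anyone who wants to check this without a dashboard, here is a minimal sketch of the comparison, assuming the internal metrics are exposed in Prometheus text format and that the counters of interest are `vector_component_received_events_total` and `vector_component_sent_events_total` with a `component_id` label. The function names and the sample numbers are made up for illustration; this is not the query behind our alert.

```python
import re
from collections import defaultdict

# Assumption: every series of interest carries a component_id label.
COMPONENT_ID = re.compile(r'component_id="([^"]*)"')


def event_totals(exposition_text, metric_name):
    """Sum samples of one counter per component_id from Prometheus text format."""
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            series, rest = line.rsplit("}", 1)
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields or series.split("{", 1)[0] != metric_name:
            continue
        match = COMPONENT_ID.search(series)
        if match:
            # fields[0] is the value; an optional timestamp may follow it
            totals[match.group(1)] += float(fields[0])
    return dict(totals)


def in_out_gap(exposition_text,
               received="vector_component_received_events_total",
               sent="vector_component_sent_events_total"):
    """Return received minus sent per component that reports received events."""
    rec = event_totals(exposition_text, received)
    snt = event_totals(exposition_text, sent)
    return {cid: rec[cid] - snt.get(cid, 0.0) for cid in rec}


# Hypothetical sample data, not taken from a real deployment.
SAMPLE = """
# TYPE vector_component_received_events_total counter
vector_component_received_events_total{component_id="sink0",component_kind="sink"} 1000
vector_component_sent_events_total{component_id="sink0",component_kind="sink"} 400
vector_component_received_events_total{component_id="source0",component_kind="source"} 1000
vector_component_sent_events_total{component_id="source0",component_kind="source"} 1000
"""

if __name__ == "__main__":
    for component, gap in sorted(in_out_gap(SAMPLE).items()):
        print(component, gap)
```

A gap that keeps growing on a sink while data still arrives at the destination would point at the sent-events counter rather than at actual data loss, which matches what we are seeing.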
Configuration
No response
Version
vector 0.32.0 (x86_64-unknown-linux-musl 1b403e1 2023-08-15 14:56:36.089460954)
Debug Output
No response
Example Data
No response
Additional Context
Reverting to 0.31.0-alpine seems to help with the metric
References
No response