Otel receivers spawn on need #687
Conversation
Spawning a receiver for every signal for every receiver component was not needed and created errors with the Kafka receivers. Now the otelcol receiver components check which signals are needed and spawn only the corresponding receivers.
I'm not sure about this one, especially because checkOutputSignals isn't handling updates. I find this to be a little bit complicated and potentially fixing a symptom of a larger underlying issue.
If this is a bug with upstream, we should help them fix that bug upstream. If this isn't a bug with upstream, I'd like to understand what upstream does differently than we do.
The Otel collector has pipelines. For Kafka, this looks like this:
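A minimal sketch of such an upstream collector config, with the Kafka receiver wired only into the traces pipeline (the topic name, broker defaults, and exporter endpoint are placeholders, not taken from the original discussion):

```yaml
receivers:
  kafka:
    protocol_version: 2.0.0
    topic: some_traces_topic

exporters:
  otlp:
    endpoint: otlp-server:4317

service:
  pipelines:
    traces:
      receivers: [kafka]
      exporters: [otlp]
```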
Because the kafka receiver is set only in the traces pipeline, the Otel collector will only create a TraceReceiver to claim the provided Kafka topic. In Alloy we currently don't have a way to specify which signal the receiver should handle, so by default we spawn all three receivers (metrics/logs/traces) for every receiver component.
At best, the unneeded receivers that we spawn don't have any impact (e.g. the metricsReceiver and the logsReceiver that we spawn for the Jaeger receiver don't do anything). But for Kafka, all active receivers try to claim the given topic. The goal of this PR is to use the output block as the equivalent of Otel's pipeline feature and only spawn the receivers that are needed. In this case, because it only outputs traces, the receiver should only create a TraceReceiver.
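To illustrate the Alloy side, a rough sketch of the intended behavior with this change (component labels, broker address, topic name, and endpoint are placeholders): because only `traces` is set in the `output` block, only a trace receiver would be spawned.

```alloy
otelcol.receiver.kafka "default" {
  brokers          = ["localhost:9092"]
  protocol_version = "2.0.0"
  topic            = "some_traces_topic"

  // Only traces are listed, so with this PR only a trace receiver is spawned
  // and no metrics or logs receiver tries to claim the topic.
  output {
    traces = [otelcol.exporter.otlp.default.input]
  }
}

otelcol.exporter.otlp "default" {
  client {
    endpoint = "otlp-server:4317"
  }
}
```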
It's a bit confusing to call everything a receiver, but that's how it's named in the code. An Otel receiver contains up to three "signal" receivers: metrics, logs, and traces.
Good point. If we go with this solution, I could try to move it or make it an error in the validate func if we want to prevent this behavior for the kafka receiver. Or we just keep the warning in the docs, and if users configure it wrong they will get a decoding error message.
But what happens upstream if you configure the kafka receiver in the logs and traces pipelines? Does it reject it?
It depends on whether you explicitly configured a Kafka topic. If you didn't, then it'll default to a different topic for every signal. If you did, it'll try to use that same topic for all signal types. (I have never tried this with this receiver btw :) I'm just stating my expectations)
The behavior is the same as with Alloy: it starts the logsReceiver and the tracesReceiver, and one of them will claim the topic. If it's the tracesReceiver, then it works. If it's the logsReceiver, then it will fail to decode (it gets stuck in a loop: consume a message, fail to decode because the schema is wrong, try to consume the next message, and so on).
Sorry, I did not see your answer @ptodev, but you are indeed correct.
So I think my statement stands here; I would prefer reporting and helping fix that behavior upstream, since I think this is a symptom of a larger underlying issue rather than an issue with the framework we built out for creating OTel component wrappers.
These are two different things. For Alloy this is a bug because we cannot specify the pipeline. At the moment, the only way to make the Kafka receiver work consistently is to explicitly set the topic argument to an empty string; if you set the topic argument to any other value, it will not work. And this workaround assumes that you named your topics after the correct default values. In other words, our otelcol.receiver.kafka is completely broken and only usable via a hacky workaround.
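For reference, a sketch of that workaround (component labels and broker address are placeholders, and an otelcol.exporter.otlp "default" component as in the earlier sketch is assumed): the explicit empty topic makes the upstream receiver fall back to its per-signal default topic names, so the actual Kafka topics have to be named after those defaults.

```alloy
otelcol.receiver.kafka "default" {
  brokers          = ["localhost:9092"]
  protocol_version = "2.0.0"
  // Workaround: an empty topic lets each signal receiver fall back to its
  // own default topic name, so the Kafka topics must use those names.
  topic = ""

  output {
    metrics = [otelcol.exporter.otlp.default.input]
    logs    = [otelcol.exporter.otlp.default.input]
    traces  = [otelcol.exporter.otlp.default.input]
  }
}
```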
I don't understand, how can a user put the Kafka receiver in a "wrong" pipeline in OpenTelemetry? Given that OpenTelemetry intends for a receiver to be re-used in any number of pipelines with any mix of telemetry types, I don't see how an instance of the Kafka receiver component could be placed in a "wrong" pipeline. For example, based on what I understand about OpenTelemetry, this should be valid:

```yaml
receivers:
  kafka: # ...

exporters:
  otlp: # ...

pipelines:
  metrics:
    receivers: [kafka]
    exporters: [otlp]
  logs:
    receivers: [kafka]
    exporters: [otlp]
  traces:
    receivers: [kafka]
    exporters: [otlp]
```

I don't see anything in the upstream documentation for the Kafka receiver that would imply this is invalid just for the Kafka receiver. Am I misunderstanding what you meant here?
@rfratto The config you suggested is going to work fine, but only if you don't explicitly set a Kafka topic in the OTel config. If you do set a topic explicitly, then that topic will be used in every pipeline. This causes the kafka receiver to throw errors because all of the logs/metrics/traces receivers will be trying to interpret the incoming data as logs/metrics/traces respectively. I'd have preferred if we could explicitly set a "logs topic", "traces topic", and a "metrics topic", similarly to how the otlp receiver can have…
The config is valid as long as you don't specify a topic: in that case each signal's receiver falls back to its own default topic. If you specify one topic (let's say "test_topic") in your kafka receiver config, they will all try to claim it, but only one will get it. This is not good: it works if you are lucky, otherwise you get decoding errors and no data is forwarded. One thing you can do if you need to specify different topic names is something like this:
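A sketch of the kind of upstream config meant here, with one named Kafka receiver per signal (the receiver names, the logs topic, and the exporter endpoint are placeholders; only "tracing_custom_topic" appears in the discussion):

```yaml
receivers:
  kafka/traces:
    protocol_version: 2.0.0
    topic: tracing_custom_topic
  kafka/logs:
    protocol_version: 2.0.0
    topic: logs_custom_topic

exporters:
  otlp:
    endpoint: otlp-server:4317

service:
  pipelines:
    traces:
      receivers: [kafka/traces]
      exporters: [otlp]
    logs:
      receivers: [kafka/logs]
      exporters: [otlp]
```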
With this I can configure different kafka receivers with different topics and assign them to the correct pipelines. By correct pipeline I mean that the topic "tracing_custom_topic", which contains traces, should only be consumed by the traces pipeline. This use case is currently not possible in Alloy.
Thanks, I see. Yes, in that case there would be a collision. However, I'd rather we not stray too far from the upstream behavior, as we aim for config compatibility, so I don't think we should reject someone configuring the component this way. However, I do think it's ok for us to mark the component unhealthy if we detect this scenario (the component outputting to multiple telemetry types while an explicit topic is shared across all of them).

I've also noticed that we don't set the default topic the same way as upstream. Upstream sets the default topic at runtime based on the telemetry type the receiver is being instantiated for. We always set it to otlp_spans.

Additionally, I understand now why we're changing all receivers to conditionally enable pipelines depending on where data is being routed. I think this needs a few tests though.
I also still think we should report this behavior upstream and ensure it's intended on their side. I also think we should help them modify their documentation to warn of this behaviour if it is intended. Otherwise there may be a way to fix it (such as finding a way to allow multiple subscribers to the same topic as long as they're all subscribing for different telemetry types).
Although this is a breaking change in theory, in practice it cannot break anyone, because the component doesn't work if you use the default value of the topic argument. The only way to get our otelcol.receiver.kafka component working is to set it to an empty string. We only found out now because, I guess, no one was using this component.
I can follow up on this :)
I will add the tests
Yeah, that's a good point. Though my understanding is that it's not that it never worked at all, just that it's a race condition so there may be some missing data, right?
Neither our usage stats for Flow nor those for Alloy show any usage of this component. That doesn't mean there isn't someone using it who just isn't reporting usage stats, but it does seem reasonable to say that this component has little to no usage. I definitely think it's OK for us to fix the default value now without it breaking people 👍
If I remember correctly, the Kafka receiver upstream used to also set the topic to "otlp_spans" by default, for metrics/logs/traces pipelines. They later changed it to have a different topic depending on the type of telemetry. The Alloy defaults are consistent with the old Otel Collector defaults. We could change our default to an empty string.
Yeah, saying that it "never worked" was a bit of a shortcut; you do have a chance to hit the right receiver, although you need to roll the dice every time the session ends and the receivers need to re-claim the topic.
I'm ok with the plan. Should we explicitly say that if you leave it as "" then you can use all 3 signals for one receiver, but if you set a topic then you should only use one signal? Just as documentation, no log in the code. And then hopefully we can change the behaviour in Otel.
Yeah, I think if the default is an empty string then it should use different topics internally, and so it should be valid to use all three types for emitting telemetry. If you manually set a topic, then it's an invalid config and we should note that on the component health by marking it as unhealthy with a descriptive message. (We can mention this in the documentation as well.) Marking it in a log entry could work, but I think it's less persistent than the health status would be. And then yeah, hopefully we can work with upstream to see if there are any possible improvements 👍
Actually, I don't think we can change the default to an empty string, as that would be a breaking change.
Have you seen the discussion about this above? We know that the component can't work properly today regardless of how you configure it, and that upstream recently changed the defaults, so fixing this could fall under one of the exceptions to the rule.
We've also observed that the component has little to no usage (likely due to the bug), which makes it easier for us to justify an exception to the rule and fix the behavior, as it's unlikely anyone would actually be broken.
Yes, as you say, I think we'd be ok with such a change in this case 👍
LGTM, thanks William!
* Spawn otelcol receivers on need. Spawning a receiver for every signal for every receiver component was not needed and created errors with the kafka receivers. Now the otelcol receiver checks which signals are needed and spawns the corresponding receivers.
* changelog entry
* add warning when more than one signal is set for the output of the kafka receiver
* cleanup
* use validate instead of logging
* update doc
* add test to check that the receiver spawns the correct receivers on update
* doc typo
* remove default value in code
* update tests
* Update docs/sources/reference/components/otelcol.receiver.kafka.md (Co-authored-by: Paulin Todev <paulin.todev@gmail.com>)
* Update internal/component/otelcol/receiver/kafka/kafka.go (Co-authored-by: Paulin Todev <paulin.todev@gmail.com>)
* Update docs/sources/reference/components/otelcol.receiver.kafka.md (Co-authored-by: Paulin Todev <paulin.todev@gmail.com>)
* adapt tests
* make test more readable

Co-authored-by: Paulin Todev <paulin.todev@gmail.com>
PR Description
Modify all otelcol receiver components to only start a receiver for a signal when the signal is needed downstream (when it's set in the output block).
This fixes an issue with the otelcol.receiver.kafka component where the wrong receiver would claim the topic, leading to decoding errors.
Which issue(s) this PR fixes
Fixes #644
Notes to the Reviewer
PR Checklist