Implement native OpenTelemetry infrastructure #1461
This commit adds OTLP exports to NativeLink and extends the `nativelink` deployments in the operator with OpenTelemetryCollector sidecars. The exposed traces, metrics and logs are published through Kafka to NATS JetStream.
cc @allada @SchahinRohani You might want to play around with this while it's in preview.

@allada One thing that we'll need to figure out is where to put the

@SchahinRohani You might want to look into OTLP, Kafka topics and NATS JetStream. This initial implementation doesn't add structure, but at least it provides a central point to aggregate the logs, traces and metrics of nativelink deployments.

I'll polish this a bit, but for now we have the following (assuming the pod for the nativelink-cas is e.g. `kubectl port-forward nativelink-cas-ff6544bb8-v4w86`)
(also I still have a small bug in the kustomization. You'll need to apply it twice. The one I use is this variant of the
Reviewable status: 0 of 1 LGTMs obtained, and 0 of 23 files reviewed, and pending CI: Remote / large-ubuntu-22.04, and 1 discussions need to be resolved
nativelink-config/src/cas_server.rs
line 184 at r1 (raw file):
#[derive(Deserialize, Debug, Default)]
#[serde(deny_unknown_fields)]
pub struct OtlpConfig {
This should also hold the timeouts that are currently hardcoded. I'd bet that those will need to be adjusted based on needs.
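To make the suggestion concrete, here is a minimal sketch of what a configurable timeout on `OtlpConfig` could look like. The field name `export_timeout_ms`, the constant `DEFAULT_EXPORT_TIMEOUT_MS`, and its value are hypothetical, since the actual fields and hardcoded values aren't shown in this thread. The serde derives from the real struct are left out so the sketch compiles with the standard library alone.

```rust
use std::time::Duration;

/// Hypothetical fallback value; the timeouts hardcoded in the PR are not
/// quoted in this thread, so this number is only a placeholder.
const DEFAULT_EXPORT_TIMEOUT_MS: u64 = 10_000;

/// Sketch of an `OtlpConfig` that carries its own timeout.
/// The real struct also derives `Deserialize` with `deny_unknown_fields`;
/// both are omitted here to avoid external crates.
#[derive(Debug, Default)]
pub struct OtlpConfig {
    /// Hypothetical field: export timeout in milliseconds.
    /// `None` means "use the built-in default".
    pub export_timeout_ms: Option<u64>,
}

impl OtlpConfig {
    /// Resolves the configured timeout, falling back to the default
    /// when the field is unset.
    pub fn export_timeout(&self) -> Duration {
        Duration::from_millis(
            self.export_timeout_ms.unwrap_or(DEFAULT_EXPORT_TIMEOUT_MS),
        )
    }
}
```

Keeping the field optional would let existing configs keep working unchanged, while deployments that need a different value can override it.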