Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Namespace Source instrumentation #2105

Merged
merged 22 commits into from
Jan 8, 2025

Conversation

damemi
Copy link
Contributor

@damemi damemi commented Dec 30, 2024

This adds instrumentation+uninstrumentation for Namespace-wide resources with Source objects. A namespace Source looks like so:

apiVersion: odigos.io/v1alpha1
kind: Source
metadata:
  name: default
  namespace: default
  labels:
    odigos.io/e2e: source
spec:
  workload:
    name: default
    namespace: default
    kind: Namespace

With name and namespace matching the name of the Namespace, and kind set to Namespace. These 3 fields are required because we're have the entire PodWorkload field marked required, so this can probably be simplified.

What this does:

  • When calling GetSourceListForWorkload, check for inherited instrumentation from Namespace.
  • In startlangdetection, if the Source is for a Namespace, loop through all potential workloads in that Namespace (similar to how we do label based namespace instrumentation today)
    • Exclude any workloads that are explicitly excluded (by setting odigos.io/excluded on a Source for that workload -- TODO add a test for this)
  • In deleteinstrumentedapplication, do something similar to what we do today by looping through the workloads again, re-using most of the check functions.
    • TODO some of these functions still rely on the label, like isInheritingInstrumentationFromNs in syncGenericWorkloadListToNs
  • Add e2e for Namespace instrumentation and test steps for uninstrumentation

@damemi
Copy link
Contributor Author

damemi commented Jan 2, 2025

@tamirdavid1 looks like the tests are failing the 2nd time I instrument the apps (by namespace) with this diff:

    | 18:42:17 | source | Assert Runtime Detected                          | ASSERT    | ERROR | odigos.io/v1alpha1/InstrumentationConfig @ default/deployment-inventory
        === ERROR
        ---------------------------------------------------------------------
        odigos.io/v1alpha1/InstrumentationConfig/default/deployment-inventory
        ---------------------------------------------------------------------
        * status.runtimeDetailsByContainer[0].envVars[0].value: Invalid value: "/bar:/var/odigos/python:/var/odigos/python/opentelemetry/instrumentation/auto_instrumentation": Expected value: "/bar"
        
        --- expected
        +++ actual
        @@ -9,12 +9,14 @@
             controller: true
             kind: Deployment
             name: inventory
        +    uid: df8406f5-5dd0-4d7d-bf09-44516da27ed3
         status:
           runtimeDetailsByContainer:
           - containerName: inventory
             envVars:
             - name: PYTHONPATH
        -      value: /bar
        +      value: /bar:/var/odigos/python:/var/odigos/python/opentelemetry/instrumentation/auto_instrumentation
             language: python
        +    runtimeUpdateState: Skipped
             runtimeVersion: 3.11.9

I'm just curious if you think this could be related to #2082? If so is there anything I should update to react to that?

For reference, this test is trying to:

  • Instrument individual deployments
    • Assert the instrumentationconfigs/instrumentedapplications match this file
  • Un-instrument the deployments
    • Verify that the instrumentationconfigs and instrumentedapplications are deleted
  • Instrument the namespace
    • Assert the same IC/IA check from earlier <--- fails here with the above diff

@damemi damemi force-pushed the source-namespace-inst branch from 5d3f265 to 40c0205 Compare January 2, 2025 13:29
@damemi
Copy link
Contributor Author

damemi commented Jan 2, 2025

@tamirdavid1 doing more trials it looks like this doesn't happen the first time the workload is instrumented, only on subsequent attempts

@damemi damemi force-pushed the source-namespace-inst branch from 4880f52 to aae3f77 Compare January 2, 2025 19:08
@damemi damemi force-pushed the source-namespace-inst branch from d9acb7d to c228058 Compare January 2, 2025 21:52
api/odigos/v1alpha1/source_types.go Outdated Show resolved Hide resolved
common/consts/consts.go Outdated Show resolved Hide resolved
api/odigos/v1alpha1/source_types.go Outdated Show resolved Hide resolved
if err != nil {
return ctrl.Result{}, err
}
if len(sourceList.Items) == 0 {
return ctrl.Result{}, nil
}
// if this is explicitly excluded (ie, namespace instrumentation), skip
for _, s := range sourceList.Items {
if _, exists := s.Labels[consts.OdigosWorkloadExcludedLabel]; exists {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s in this case can also be a source of kind Namespace, this gives a specific semantic to this check (excluded namespaces will also trigger the workload to be considered as not instrumented. Not sure I covered all cases in my head, but if the Namespace is excluded, and the workload source is not, I think it might not get instrumented when it should.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware that we could have an excluded namespace with an included workload inside of it. I thought it was only the other way around (ie, you can instrument a namespace and specify certain workloads to exclude). If both are the case I'll update

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not officially something that we suggest people to do, but no one will stop users from populating the label on a namespace. In this case it might be worth making sure we still instrument correctly :)

@damemi
Copy link
Contributor Author

damemi commented Jan 5, 2025

note: blocked on #2107 for tests to pass

@damemi
Copy link
Contributor Author

damemi commented Jan 7, 2025

@blumamir @RonFed I think I got to all the feedback, ptal.

One thing I realized this was missing was the case where someone creates or deletes and excluded Source in a namespace that's already instrumented. That's added along with tests in 7295ac9.

There are basically 4 cases:

  • If a normal Source is created, the startlangdetection should instrument
  • If a normal Source is deleted, the deleteinstrumentationconfig should uninstrument
  • If an excluded Source is created, the deleteinstrumentationconfig should check if it needs to be uninstrumented (ie, if it's in an instrumented namespace)
  • If an excluded Source is deleted, the startlangdetection should check if it needs to be instrumented (ie, if it's in an instrumented namespace)

This requires 2 finalizers. I'm planning to move the finalizer creation to the same mutating webhook we'll use for label defaulting. But for now, this will work.

If it's helpful, I had to draw it out to understand:

finalizer relation

@damemi
Copy link
Contributor Author

damemi commented Jan 7, 2025

Also, I'm pulling the commits from #2122 into this branch to get the tests to pass. Will resolve with #2107 when we merge into main

@BenElferink BenElferink added the enhancement New feature or request label Jan 7, 2025
@BenElferink BenElferink requested review from blumamir and RonFed January 7, 2025 18:30
api/odigos/v1alpha1/source_types.go Outdated Show resolved Hide resolved
api/odigos/v1alpha1/source_types.go Show resolved Hide resolved
Comment on lines +178 to +180
if sourceList.Namespace != nil && sourceList.Namespace.DeletionTimestamp.IsZero() {
return true, nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the workload has a source CR, then it's not inheriting it's instrumentation from the ns, but this function can return true here instead of false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like we talked about offline, this function and its use will be refactored in the backend label clean-up. Keeping this open as a reminder to do that

@@ -38,34 +41,138 @@ type SourceReconciler struct {

func (r *SourceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx)
logger.Info("Reconciling Deleted Source object", "name", req.Name, "namespace", req.Namespace)
logger.Info("Reconciling Source object", "name", req.Name, "namespace", req.Namespace)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to also write the kind here? so it's not ambiguous

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log line just refers to the Source object itself, so the kind is always Source. A Source can be named differently from the workload, for example:

apiVersion: odigos.io/v1alpha1
kind: Source
metadata:
  name: my-source # <--- source name
  namespace: default
spec:
  workload:
    name: coupone # <--- workload name
    namespace: default
    kind: Deployment

@@ -151,11 +151,15 @@ func syncGenericWorkloadListToNs(ctx context.Context, c client.Client, kind work
}
}

if !isInheritingInstrumentationFromNs(freshWorkloadCopy) {
var err error
inheriting, err := isInheritingInstrumentationFromNs(ctx, c, freshWorkloadCopy)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check only says if a workload instrumentation is inherited from ns. not if it's effectively instrumented or not.

It used to be ok because of this which guaranteed that this function is called after we already know the namespace is not labeled.

I think that the changes in this PR renders this assumption invalid, and we can now deleteWorkloadInstrumentationConfig and removeReportedNameAnnotation below even for instrumented workloads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as #2105 (comment)

@damemi damemi merged commit 4969d67 into odigos-io:feature/source-crd Jan 8, 2025
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants