
eBPF instrumentation manager #1776

Merged: 40 commits merged on Dec 4, 2024


RonFed (Collaborator) commented on Nov 18, 2024

This PR is a follow-up to #1645.
It adds the new Manager, which will eventually replace the Director.
The new design has the following key features and improvements:

  1. Use the new runtime-detector module to trigger instrument/un-instrument events. This replaces the current approach, which relies on a Pod reconciler. The main disadvantage of the Pod reconciler approach shows up when a Pod has multiple containers or a container runs multiple processes. Triggering on process creation instead allows us to guarantee that we won't miss a requested instrumentation.
    note: the runtime-detector is configured to filter process events and only passes on the events that match its configuration.
  2. Event loop design. The current Director has a number of fixed and potential race conditions caused by the concurrent nature of processes being created/exiting and Pod events arriving from the reconciler. The new Manager uses no locks; it handles everything in an internal event loop.
  3. Configuration updates are triggered by the InstrumentationConfig reconciler (same as before); those updates are handled in the event loop.
  4. The Factory interface is refactored, and a Settings option is added that can be extended in the future with more initial configuration options.
  5. The Instrumentation interface is introduced and will replace OtelEbpfSdk.
  6. Update go.opentelemetry.io/auto to v0.18.0-alpha.

For now, this change applies only to OSS Go instrumentation.

RonFed requested review from tamirdavid1, damemi, edeNFed, blumamir and david336362 and removed the request for tamirdavid1 on November 18, 2024 08:51
RonFed marked this pull request as ready for review on November 18, 2024 09:17
RonFed changed the title from "[WIP] eBPF instrumentation manager" to "eBPF instrumentation manager" on Nov 18, 2024
odiglet/go.mod (outdated, resolved)
blumamir (Collaborator) left a comment

Great job!!!

Looks good. I added many comments, but most are nits and style.

Resolved review threads (outdated):
odiglet/cmd/main.go (2 threads)
odiglet/pkg/ebpf/manager.go (6 threads)
odiglet/pkg/kube/instrumentation_ebpf/pods.go (1 thread)
m.stop = stop

// main event loop for handling instrumentations
for {
Contributor

I am not sure how, or if, we should handle it, but I think this design has a noisy neighbor issue.
Imagine a Pod with the following bash script:

while true; do
    echo "test"
done

This will trigger tons of ProcEvents and will cause starvation for all other pods / configuration changes.
What do you think? cc @blumamir

Contributor

aka Odiglet DDoS
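
To make the starvation concern concrete, here is a small sketch that assumes (not confirmed by the PR) process events and configuration updates arrive on separate channels read by a single `select`. Per the Go spec, when several `select` cases are ready, one is chosen by uniform pseudo-random selection, so a flood of process events delays a pending configuration update rather than blocking it forever. Process events from other Pods that share the flooded channel still queue behind the noisy ones, which this does not address. `drainUntilConfig` and `flood` are hypothetical helpers.

```go
package main

// drainUntilConfig mimics one event loop reading two channels. It returns how
// many process events were handled before the pending config update was seen.
func drainUntilConfig(procEvents <-chan int, configUpdates <-chan string) (int, string) {
	handled := 0
	for {
		select {
		case <-procEvents:
			handled++ // the noisy neighbor's events
		case cfg := <-configUpdates:
			return handled, cfg
		}
	}
}

// flood keeps procEvents ready (like a tight fork/exec loop) until stop is closed.
func flood(procEvents chan<- int, stop <-chan struct{}) {
	for pid := 1; ; pid++ {
		select {
		case procEvents <- pid:
		case <-stop:
			return
		}
	}
}
```

The returned count varies from run to run; each handled process event is extra latency for the pending update, and in the real loop handling one may mean loading and attaching eBPF probes.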


m.Logger.Info("cleaning instrumentation resources", "pid", pid)

err := details.Inst.Close(ctx)
Contributor

Same nil check here. Maybe we should move the Close function to the details instance instead of closing the individual fields?
This will probably never be nil, but I'm still afraid of panics.

Collaborator Author

I created startTrackInstrumentation, which is the only place where instrumentationDetails is created. Since this is logic internal to this package and does not depend on user input, I think a nil inst could only be the result of a logical error on our side, which should never happen.

edeNFed (Contributor) left a comment

LGTM


// active instrumentations by workload, and aggregated by pid
// this map is not concurrent safe, so it should be accessed only from the main event loop
detailsByWorkload map[types.NamespacedName]map[int]*instrumentationDetails
Collaborator

add workload kind and language here and simplify the usage

Collaborator Author

Done, PTAL
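
One possible shape for the reviewer's suggestion, assuming kind and language become part of the map key (the PR may have placed them elsewhere). `workloadKey`, `WorkloadKind`, `Language`, and the `add`/`remove` helpers are hypothetical. With kind in the key, a Deployment and a StatefulSet that share a namespace and name no longer collide, and deleting the inner map when its last pid exits keeps the outer map from accumulating empty entries.

```go
package main

// WorkloadKind and Language are hypothetical stand-ins for the project's own types.
type WorkloadKind string
type Language string

// workloadKey extends a namespace/name pair with kind and language.
type workloadKey struct {
	Namespace string
	Name      string
	Kind      WorkloadKind
	Language  Language
}

type instrumentationDetails struct {
	PID int
}

// detailsByWorkload maps workload -> pid -> details; use it only from the event loop.
type detailsByWorkload map[workloadKey]map[int]*instrumentationDetails

// add records details for pid, creating the per-workload map on first use.
func (d detailsByWorkload) add(key workloadKey, pid int, det *instrumentationDetails) {
	byPID, ok := d[key]
	if !ok {
		byPID = make(map[int]*instrumentationDetails)
		d[key] = byPID
	}
	byPID[pid] = det
}

// remove drops pid and deletes the workload entry once no pids remain.
func (d detailsByWorkload) remove(key workloadKey, pid int) {
	byPID, ok := d[key]
	if !ok {
		return
	}
	delete(byPID, pid)
	if len(byPID) == 0 {
		delete(d, key)
	}
}
```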

odiglet/pkg/ebpf/manager.go (resolved)
// In case of a failure, an error will be returned and all the resources will be cleaned up.
Load(ctx context.Context) error

// Run will attach the probes to the relevant process, and will start the instrumentation.
Collaborator

Suggested change:
- // Run will attach the probes to the relevant process, and will start the instrumentation.
+ // Run will start the instrumentation.

RonFed merged commit 5d612b2 into odigos-io:main on Dec 4, 2024
27 of 28 checks passed
RonFed deleted the instrumentation_manager branch on December 4, 2024 10:46
RonFed added a commit that referenced this pull request Dec 15, 2024
A follow up to #1776:
* Make the instrumentation manager generic, and move it to a new module.
* Add a k8s-oriented implementation of the manager's different
interfaces under `odiglet/pkg/ebpf`
3 participants