[Donation Proposal]: Beyla, eBPF auto-instrumentation tool for metrics and traces #2406
Comments
I’m looking forward to Beyla's potential donation to the OpenTelemetry project, as it helps cover important gaps in auto-instrumentation for unsupported languages and environments. That said, this donation comes with some challenges since a lot of Beyla’s work overlaps with existing OpenTelemetry projects like Go auto-instrumentation, eBPF Profiler, eBPF networking and OpenTelemetry Operator. The community already has efforts addressing these areas, so it’s important to understand how Beyla will fit in and integrate with these projects. As part of the donation, it’s crucial to ensure the current core OpenTelemetry repositories remain the main source of truth, and that we avoid duplicating code or functionality. It would be helpful to see how Beyla and existing projects can come together without redundancy. I’m also interested in how Beyla will eventually be integrated as a collector receiver in the OpenTelemetry architecture. To make this work smoothly, Beyla should be able to use existing components as dependencies rather than duplicating what’s already there. |
Thanks for the comments Eden. The main overlap in functionality is related to Go Auto Instrumentation, for which we propose to merge our functionality there and vendor it in the new project. The main challenge I see is the multi-process support, which we need for fleet-wide monitoring; however, I'm sure we can overcome these challenges. For eBPF Networking, I think we can use this as an opportunity to bring the functionality to the same level as Go Auto, using a similar development stack and a libbpf CO-RE based approach. I don't think the donation overlaps in any way with the OpenTelemetry Operator or the OpenTelemetry eBPF Profiler. I think providing a generic way to extract trace/span information for the eBPF Profiler would be a great way to correlate traces with profiles. |
I'm not sure there's much duplication there, except with the eBPF networking component, which we addressed in relationships to existing OpenTelemetry Projects. There's a recent request to add Beyla as a component in the OpenTelemetry Collector, which this would help a lot. open-telemetry/opentelemetry-collector-contrib#34321 |
Thanks for the detailed proposal @grcevski! I think this is great for building progress on OpenTelemetry/eBPF and covering existing gaps. To mirror what @edeNFed said, avoiding confusion and duplication is important. But I think you have explained that the idea is to vendor the existing Go Auto-Instrumentation as a dependency into the Beyla donation. That makes sense to me, as it fits with the goals we've been working on together in Go Auto (i.e., to make that repo a library/API/SDK that can be imported by other implementations). To that end, it makes sense that OpenTelemetry would provide both (a) an open-source library/framework for eBPF instrumentation with a "raw" agent as the default artifact and (b) an open-source component consuming that framework to provide second-level functionality and usability. @jsuereth and I were actually talking about this, and he compared this situation roughly to how the collector works. I think the potential overlap with the OpenTelemetry Operator is in the fact that the Operator does deploy that default agent from Go Auto-Instrumentation, but that's about it. To draw back to the collector comparison, I would say that the Operator is to the Collector as Beyla is to Collector-Contrib: built on a stable, minimal core with added functionality. Both exist to give users options based on their needs. All that said, we should make sure to apply the same standards for donation that we are also applying to the Compile-time Go Instrumentation donation. Specifically:
All in all, I wouldn't be surprised to see these 3 projects collaborate and converge more often as time goes on. Thanks for your work on this @grcevski! |
I am by no means an expert on ebpf but one thing I'd like to ask: would it be possible to work towards one ebpf solution that combines what beyla does (auto instrumentation with traces, metrics I suppose + networking) + the profiler? Because at the end what people want (see this discussion for example: open-telemetry/opentelemetry-specification#4255) is a combination of all four signals, but if those 2 projects are separate we either need a way to install them side-by-side or people have to choose. |
I think that one ebpf solution would be something like Beyla. But, I don't think that idea means all of the code for every signal+language lives in one monorepo with the higher-level component. That's what I mean by separate repos at least |
I agree with @edeNFed and @damemi's comments. Having projects that handle auto-instrumentation, with higher-level implementations (like the Operator or Beyla) built on top of them that use multiple other projects, is a good structure in my opinion. As a maintainer in the go-auto-instrumentation project, I'd be happy to accept donations from Beyla to the current project. |
I'm excited to see this donation proposal! I have made a few contributions to Beyla in the past, and have found the maintainers knowledgeable, kind, and helpful. I also think Beyla fills an important gap by providing language-agnostic telemetry. There are definitely details to work out, but I'm very supportive of this proposal. |
This looks great, and thanks @grcevski for calling out how this relates to and can merge or interoperate with Go auto-instrumentation, network monitoring (@yonch FYI), and the profiling agent (FYI @christos68k, @petethepig, @felixge, @fabled)! These were going to be the first questions that I asked, and it looks like we already have good notions about how things can proceed with each. Now that we have several projects in flight that use eBPF, it seems sensible to have them inherit from a common base, if possible. @alolita and I will be on point for this process for the Governance Committee. We'll circle back in a few days with next steps once more community members have time to comment. |
Don't miss out on insights from OpenTelemetry Network traces! There has always been demand for deeper eBPF integration within the OpenTelemetry Collector 🎉 |
Hi @grcevski! Haven't seen you at the eBPF SIG meeting.. Let's meet at Kubecon to discuss more about the network aspects? I'm |
Yes! Apologies - I was slammed this week! We'll start the process after Kubecon. Will you / any other members of the Beyla team be there? If so, I like @yonch's suggestion: we can meet and work out more parts of the story with the other OTel eBPF projects and Go instrumentation projects. |
Yes, I'll be there. I connected with @yonch and we arranged to meet there, so this sounds like a great idea, let's all meet there and we can discuss all parts of the donation. Thanks and see you next week! |
Awesome! I think that most maintainers will be hanging out at the OpenTelemetry Observatory for most of the conference, let’s meet up there!
@grcevski and I met yesterday and talked about the networking aspects. We agreed to work next week on a short statement on the tradeoffs each project offers to make it clearer for users wanting to decide which is a better fit. We also discussed porting capabilities which seems like a good avenue to pursue after donation. |
Catching up after conversations that I had with @grcevski and @MrAlias at Kubecon. We want to end up with one way for OTel end users to instrument their Go, C++, Rust, etc. applications with an agent. Put another way, we don't want them to have to choose between two different OTel eBPF options for the same use case. From our conversations, I think that we're all aligned on this. @MrAlias raised the possibility of having Beyla (the name will change as part of the donation) depend on OTel Go instrumentation, and that end users would use Beyla. Does this make sense to everyone else? There are likely other options as well. |
@mtwo speaking from Odigos's perspective, that's what we'd like to see too. Keeping the current OTel Go repo as a library/dependency allows vendors like us to implement custom instrumentation controllers for our users, while offering a stock OSS component built on that library provides a useful story for the default open source end user. I don't think that's uncommon either, as we see it in other areas of OTel too (like custom SDKs and Collector builds). These underscore the fact that OpenTelemetry is a standard, not just a set of off-the-shelf tools. And enabling custom implementations of the standard promotes the overall health of the project. Like I mentioned to @grcevski and @MrAlias last week, I'm very interested in contributing to this as well, as I think it is a big benefit to the OTel ecosystem, which benefits us as well. |
Awesome, by the thumbs ups and @damemi's response, I think that we're all aligned. I think that the remaining things that we need to close on are:
Does that make sense to everyone else? Did I miss anything? |
I didn’t quite extract the full question, but how will the integration, beyond just the code, fit into the architecture of the collector? Specifically, how will the integration of Beyla (if any) and the OpenTelemetry Collector (including contrib modules) take shape? Managing the behavior of collection, processing, and exporting within a single executable, while leveraging the same configuration schema and patterns, offers tremendous advantages. Additionally, using ecosystem components like OpAMP or a supervisor for one-stop remote management adds significant value, and it would be a loss to go without it if this integration doesn't happen. |
Thanks @mtwo, I'll work on getting answers on these decisions this week. I'm already in the process of doing this for opentelemetry-network, and I'll join the profiler SIG and try to get an answer. |
Hi @cforce, I think we can make Beyla (that is, the new project) a component of the OpenTelemetry Collector. We've done this already for Grafana Alloy and we have some experience there. Actually, we use the Collector SDK already for traces, so integration should be even less of a problem. One major challenge I see, which we've faced with our Alloy component, is that like all eBPF agents, we do require some elevated permissions. Beyla doesn't need "privileged containers" or CAP_SYS_ADMIN, but depending on what functionality is enabled we might ask for CAP_NET_ADMIN, CAP_NET_RAW, CAP_SYS_PTRACE etc. Having these permissions on a locked-down daemonset which only does outgoing network requests in a standalone mode is one thing; requiring the OpenTelemetry Collector to run with these privileges is another. A CVE in any enabled OpenTelemetry Collector module/component could potentially be a lot more serious with a collector that is running with elevated permissions. |
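To make the permissions point above concrete, here is a minimal Go sketch (not taken from Beyla) that reports which of the capabilities mentioned in this comment a process currently holds. It assumes a Linux host, reads the `CapEff` bitmask from `/proc/self/status`, and uses the bit numbers from `linux/capability.h` (where the ptrace capability is named `CAP_SYS_PTRACE`). The helper names `parseCapEff` and `hasCap` are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Capability bit numbers as defined in linux/capability.h.
const (
	capNetAdmin  = 12 // CAP_NET_ADMIN
	capNetRaw    = 13 // CAP_NET_RAW
	capSysPtrace = 19 // CAP_SYS_PTRACE
	capSysAdmin  = 21 // CAP_SYS_ADMIN
)

// parseCapEff extracts the effective capability bitmask from the text of
// a /proc/<pid>/status file. (Hypothetical helper for this sketch.)
func parseCapEff(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapEff:") {
			hexMask := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
			return strconv.ParseUint(hexMask, 16, 64)
		}
	}
	return 0, fmt.Errorf("CapEff line not found")
}

// hasCap reports whether the given capability bit is set in mask.
func hasCap(mask uint64, bit uint) bool {
	return mask&(uint64(1)<<bit) != 0
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		// Not on Linux, or /proc is unavailable.
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	mask, err := parseCapEff(string(data))
	if err != nil {
		fmt.Println("cannot parse capabilities:", err)
		return
	}
	fmt.Println("CAP_NET_ADMIN: ", hasCap(mask, capNetAdmin))
	fmt.Println("CAP_NET_RAW:   ", hasCap(mask, capNetRaw))
	fmt.Println("CAP_SYS_PTRACE:", hasCap(mask, capSysPtrace))
	fmt.Println("CAP_SYS_ADMIN: ", hasCap(mask, capSysAdmin))
}
```

Running something like this inside both a standalone agent and a full collector build would show whether the whole collector process, and therefore every component compiled into it, holds the elevated capabilities.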
We met at KubeCon with @yonch and discussed the overlap and differences between Beyla and what OpenTelemetry eBPF networking provides. While there is quite a bit of overlap, there's certain missing product functionality in Beyla at the moment which the OpenTelemetry eBPF networking project provides:
We also discussed that, from the eBPF agent perspective, Beyla's use of CO-RE, libbpf and eBPF-Go greatly simplifies the deployment and development process of the agent, and that if OpenTelemetry eBPF Networking started today as a project, it would likely adopt the same approach. Beyla is also able to use finer-grained permissions, which makes deployment easier to pass through security risk assessments. There are two possible approaches to continue forward:
From the discussion with @yonch, we both prefer if there's no action for |
Hi @grcevski, I share your concerns, and from a security perspective, it's indeed better to dedicate a separate executable for sensitive operations. In my opinion, any solution within the ecosystem for an eBPF agent should ensure that OpAMP management is implemented as a mandatory component. Building on existing concepts and capabilities of the collector, a specialized collector with elevated rights could integrate the eBPF collection receiver, processors, OTLP exporter, and OpAMP and auth extensions. This collector would be minimalistic and Additionally, a separate side-by-side collector could run without eBPF components, connecting inbound to the eBPF collector as a process or gateway. This approach allows the hardening of the eBPF collector components to benefit the overall collector builds without reinventing the wheel. Let me know your thoughts! |
FWIW we received this type of feedback from users for opentelemetry-network as well. One (very) large company asked that we separate the collector even further:
This is not to say this architecture is necessary -- neither for donation nor afterwards. Just that if a project wants to appeal to more security-conscious organizations, keeping privileged code small and separate from more complex handling appears prudent. |
I agree, I'm not opposed to implementing this :), I just wanted to point out the potential drawbacks. I also think that having the tool that collects the data be in the same executable as the tool that processes and sends the data, without an extra network hop, is extremely efficient. So there's definitely a lot of merit in this approach. |
I reached out to the OpenTelemetry profiler group on the CNCF Slack, and based on the discussion I think the best way forward would be to approach the collaboration in the same manner as collaborating with the opentelemetry-go-autoinstrumentation project. Our instrumentation can greatly benefit from the stack walking capability of the opentelemetry-ebpf-profiler, in the sense that we can attach stack traces when an error happens in a transaction. We also share the need to parse headers and Go data structures to be able to extract the trace context. There is already work ongoing by other community members along the same lines related to opentelemetry-ebpf-profiling, open-telemetry/opentelemetry-ebpf-profiler#192. We'd like to leverage this as well. I think the best way forward is to do nothing around the time of the donation, but then start vendoring the capabilities of the opentelemetry-ebpf-profiler project to expand and share the common functionality of the two projects. |
After a bit of back and forth I'd like to propose the name of
Please let me know your thoughts. |
Apologies for the delay, I've been pretty ill for the past two weeks and am just getting to this now. I think that we're good to proceed with the next steps! I'll check with the rest of the GC on Thursday and will report back. |
@grcevski just waiting on the TC to assign a reviewer, and then we should be good to go |
Amazing, thanks so much @mtwo and I hope you are feeling better! |
Description
Grafana Labs would like to offer the donation of Beyla to the OpenTelemetry project.
Beyla is a mature eBPF-based auto-instrumentation tool for OpenTelemetry metrics and traces, for multiple languages and protocols. It enables cluster-wide/system-wide auto-instrumentation of applications without the need for application code/configuration changes or application restarts. To achieve this, Beyla uses a combination of protocol-level instrumentation based on network events and language/runtime-level instrumentation where needed. While Beyla works on bare metal installations, virtual machines, etc., the tool is also fully Kubernetes-aware and can be deployed as a daemonset or as a sidecar. Beyla is used by a number of customers in production, including Grafana Labs itself for the Grafana Cloud hosted offering.
Some of the main uses of Beyla are:
Some of the core features of Beyla include:
Minimal performance/memory overhead. We share all probes and maps among all processes, and since the userspace side of the application is built with Go, it often has much lower overhead for metric and trace generation compared to the OpenTelemetry support for certain programming languages (e.g. interpreted languages).
Benefits to the OpenTelemetry community
Donating Beyla will fill a gap in the overall OpenTelemetry application level instrumentation ecosystem, for applications which use programming languages which are not supported by the OpenTelemetry SDKs, which use proprietary frameworks or use older technologies. We also believe that it will fill in a gap with network level monitoring for the purpose of building solutions for service graphs and connectivity tracking.
This donation has a lot of synergy with the OpenTelemetry Profiling Agent, and we believe that in the future we can create a non-intrusive, generic profiling to TraceID correlation by leveraging the two projects.
Reasons for donation
We at Grafana Labs prefer that customers use the upstream OpenTelemetry SDKs for application level instrumentation, however we often find that certain customers are unable to use the recommended approach because of their current technology use. We built Beyla as an easy way for our customers to get started with OpenTelemetry, while they are in their transition process of upgrading their software, which sometimes takes years. Oftentimes, customers also use binary distributions of software, and are unable to instrument these applications depending on the technology the binaries are built with.
We believe that we are not alone in this need to move customers to OpenTelemetry quicker, where they can’t currently leverage the existing OpenTelemetry ecosystem. This is why we’d like to make this project a community project, where multiple companies can be stakeholders and we can build a better community around it, compared to what Grafana Labs can do alone.
Relation with Other OpenTelemetry Projects
We also see this donation as an opportunity to combine the eBPF-based auto-instrumentation efforts in OpenTelemetry. Our project borrows parts of the OpenTelemetry Go Auto-Instrumentation project and some of our Beyla maintainers participate in that project too. We’d like to fully merge all of our work on Go with the OpenTelemetry Go Auto-Instrumentation project, which would avoid the double contribution we do at the moment, and vendor it in Beyla as an import once the merge is complete. Beyla’s support for auto-instrumentation goes way beyond Go auto-instrumentation, which is why we are proposing a new project donation. We are also open to combining the Go Auto-Instrumentation project into a new project for out-of-process auto-instrumentation with our donation.
We also see this donation as an opportunity to re-invigorate the OpenTelemetry eBPF Networking project. Beyla includes support for the majority of the functionality of that project, but it’s built with eBPF-Go (libbpf), which means it uses CO-RE and it can be deployed on any kernel without specific kernel builds or deploying compilation toolchain on the target system.
Our development stack is identical to what’s used by OpenTelemetry Go Auto-Instrumentation and the OpenTelemetry eBPF Profiler. Developers on those projects will easily be able to contribute to this project and it will bring all of the OpenTelemetry eBPF tooling to the same level.
Repository
https://github.com/grafana/beyla
Existing usage
Beyla is used by hundreds of users in production, including Grafana Cloud itself. We have strong open-source community usage: the number of pulls of our Docker image is around 100,000 a month and has been growing steadily since the inception of the project. For example, our Docker image pulls in April of 2024 were around 30,000 a month.
Maintenance
We have 4 full-time maintainers on the project who will move to work full-time on the OpenTelemetry project if accepted. We have over 40 contributors on the project, most of whom are not Grafana Labs employees or affiliated in any way with Grafana Labs.
Licenses
Apache 2.0 License
Our eBPF probe source is dual licensed with GPL/MIT as per the requirements of the Linux Kernel. This is identical to the approach used by OpenTelemetry Go Auto-Instrumentation and OpenTelemetry Profiler.
Trademarks
The name Beyla currently appears in a number of places in the codebase and is a Grafana Labs Trademark. We are happy to donate the name too, however we understand that it’s not compatible with how OpenTelemetry projects are typically named. We are happy to remove any of these name references when the project is donated, if the name donation is not acceptable.
Other notes
This proposal has been socialized with @MrAlias (maintainer of OpenTelemetry Go Auto Instrumentation) and @atoulme (maintainer of OpenTelemetry eBPF Networking)