feat!: Architect Llama Stack Telemetry Around Automatic Open Telemetry Instrumentation #4127
Conversation
This pull request has merge conflicts that must be resolved before it can be merged. @iamemilio please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
This change creates a standardized way to handle telemetry internally. All custom names that are not semantic conventions are maintained in constants.py, and helper functions that capture custom telemetry data not covered by automatic instrumentation live in helpers.py.
Calls to the custom span capture tooling are replaced 1:1 with calls to the OpenTelemetry library. No other modifications were made; formatting changes can be addressed in follow-up PRs.
This change removes all the hand-written telemetry machinery that was replaced with OpenTelemetry library calls in the prior changes.
Looks good to me.
I am noticing that the responses test suite fails often on this PR, and I can't tell whether it's related to my changes. I tried not to change the logical outcome of any of the modified code, but I would appreciate it if someone more knowledgeable about the async logic could take a look and help me with this one. The root cause escapes me, and the AIs are clueless.
What does this PR do?
Fixes: #3806
Test Plan
This tracks the telemetry data we currently care about in Llama Stack (no new data), to make sure nothing important was lost in the migration. I run a traffic driver to generate telemetry for targeted use cases, then verify it in Jaeger, Prometheus, and Grafana using the tools in our /scripts/telemetry directory.
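As one way to spot-check the Jaeger side of this verification, traces can be pulled from Jaeger's HTTP query API. This is a hypothetical helper, not part of the PR; the service name and the default Jaeger query port (16686) are assumptions about the local setup.

```python
import json
import urllib.request

def trace_query_url(service: str, jaeger_url: str = "http://localhost:16686") -> str:
    """Build the Jaeger HTTP query API URL for a service's recent traces."""
    return f"{jaeger_url}/api/traces?service={service}&limit=20"

def fetch_traces(service: str) -> list:
    """Fetch recent traces for a service; requires a running Jaeger instance."""
    with urllib.request.urlopen(trace_query_url(service)) as resp:
        return json.load(resp)["data"]
```

A quick check that the expected span names appear in `fetch_traces("llama-stack")` output complements eyeballing the Jaeger UI.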
Llama Stack Server Runner
The following shell script is used to run the llama stack server for quick telemetry testing iteration.
Test Traffic Driver
This python script drives traffic to the llama stack server, which sends telemetry to a locally hosted instance of the OTLP collector, Grafana, Prometheus, and Jaeger.
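A minimal sketch of what such a traffic driver can look like. It assumes the server listens on localhost:8321 and exposes an OpenAI-compatible chat completions route; the port, path, model name, and payload shape are assumptions, not the actual script from this PR.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8321"  # assumed local server address

def build_chat_payload(model: str, prompt: str) -> bytes:
    """Serialize a single-turn chat completion request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()

def drive_inference(model: str, prompt: str) -> dict:
    """POST one chat completion; each call should yield one server-side trace."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/openai/v1/chat/completions",  # assumed route
        data=build_chat_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Running a handful of calls like this against the server gives the OTLP collector, Jaeger, Prometheus, and Grafana something concrete to display.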
Span Data
Inference
Safety
Remote Tool Listing & Execution
Metrics
Observations
Updated Grafana Dashboard
Status
✅ Everything appears to be working, and the data we expect is being captured in the format we expect.
Follow Ups