test(telemetry): Telemetry Tests #3805
Conversation
# Verify token metrics in response
# Note: Llama Stack emits token metrics in the response JSON, not via the OTel Metrics API
These shouldn't be in telemetry tests? Should they be responses tests instead?
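For context, a minimal sketch of the kind of response-side check the quoted comment describes, assuming an OpenAI-compatible client fixture and a `text_model_id` fixture (both names are assumptions, not the PR's actual code):

```python
def test_usage_in_response(client, text_model_id):
    # Hypothetical check: token usage is read from the response body,
    # not from the OTel Metrics API.
    response = client.chat.completions.create(
        model=text_model_id,
        messages=[{"role": "user", "content": "Hello"}],
    )
    assert response.usage.prompt_tokens > 0
    assert response.usage.completion_tokens > 0
```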
pytestmark = pytest.mark.skipif(
    os.environ.get("LLAMA_STACK_TEST_STACK_CONFIG_TYPE") == "server",
    reason="In-memory telemetry tests only work in library_client mode (server mode runs in separate process)",
)
I think it would be great to have the server tested as well, since that's the main production path. We could have the server use an in-memory collector and maybe write the collected results to a file somehow, which the tests can then load and validate?
Can be done in a follow-up PR.
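A rough sketch of what that follow-up could look like, assuming the server process is wired up with a custom OpenTelemetry span exporter that writes to a file the test process can read back (the class, helper, and path handling here are hypothetical):

```python
import json
from collections.abc import Sequence

from opentelemetry.sdk.trace import ReadableSpan
from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult


class FileSpanExporter(SpanExporter):
    """Hypothetical exporter: the server process appends each finished span as a
    JSON line so tests running in a separate process can load and validate them."""

    def __init__(self, path: str):
        self._path = path

    def export(self, spans: Sequence[ReadableSpan]) -> SpanExportResult:
        with open(self._path, "a") as f:
            for span in spans:
                f.write(span.to_json(indent=None) + "\n")
        return SpanExportResult.SUCCESS

    def shutdown(self) -> None:
        pass


def load_spans(path: str) -> list[dict]:
    """Test-side helper: read back whatever spans the server collected."""
    with open(path) as f:
        return [json.loads(line) for line in f]
```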
# Handle both dict and Pydantic model for usage
# This occurs due to the replay system returning a dict for usage, while the client returns a Pydantic model
# TODO: Fix this by making the replay system return a Pydantic model for usage
cc @ashwinb
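A minimal sketch of the kind of normalization the quoted comment is working around (the helper name is hypothetical, and it assumes a Pydantic v2 model):

```python
def usage_as_dict(usage) -> dict:
    # Replay recordings hand back a plain dict, while the live client returns a
    # Pydantic model; normalize both shapes until the replay system is fixed.
    if isinstance(usage, dict):
        return usage
    return usage.model_dump()
```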
LG! One more nit, please address.
contains_model_id = True
assert args["model_id"] == text_model_id
...
assert contains_model_id
Clearer to store logged_model_id directly in the loop above and assert logged_model_id == text_model_id here.
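A sketch of the suggested shape, assuming the test loops over captured log-call arguments (the collection name is a placeholder for whatever the test actually iterates):

```python
logged_model_id = None
for args in logged_call_args:  # placeholder for however the test collects log arguments
    if "model_id" in args:
        logged_model_id = args["model_id"]

assert logged_model_id == text_model_id
```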
if "model_id" in args: | ||
logged_model_id = args["model_id"] | ||
|
||
assert logged_model_id is not None |
this is redundant
What does this PR do?
Adds a test and a standardized way to build out future telemetry tests for Llama Stack.
Contributes to #3806
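As a rough illustration of the standardized shape such a test can take, built on the in-memory collector and the skip guard quoted above, here is a sketch; the fixture names and span attribute keys are assumptions, not the PR's actual API:

```python
import os

import pytest

pytestmark = pytest.mark.skipif(
    os.environ.get("LLAMA_STACK_TEST_STACK_CONFIG_TYPE") == "server",
    reason="In-memory telemetry tests only work in library_client mode",
)


def test_chat_completion_emits_model_id(llama_stack_client, text_model_id, span_exporter):
    # "span_exporter" stands in for whatever in-memory collector fixture the suite provides.
    llama_stack_client.chat.completions.create(
        model=text_model_id,
        messages=[{"role": "user", "content": "Hello"}],
    )
    spans = span_exporter.get_finished_spans()
    assert any(span.attributes.get("model_id") == text_model_id for span in spans)
```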
Test Plan
This is the test plan 😎