feat: Verbose evals #1558
Merged
33 commits
6f002d2 Add `_verbose` flag to `BaseEvalModel` (anticorrelator)
b410128 Start adding basic verbose-mode logging (anticorrelator)
73c93a0 Add verbose mode to retries (anticorrelator)
a3e6a1b Only print when verbose flag is set (anticorrelator)
b2ecf6d Refactor verbose mode (anticorrelator)
931ce4a Continue refining verbose mode output (anticorrelator)
5bcc0d9 Prefer absolute imports (anticorrelator)
778c459 Fix type hint (anticorrelator)
9dee5b5 Try to clean up abstractions (anticorrelator)
feeeda9 Add `printif` utility (anticorrelator)
e51ab9e Prefer absolute imports (anticorrelator)
6eaf28f Add `verbose` test for `llm_eval_binary` (anticorrelator)
d73a211 Test retrying with verbose mode (anticorrelator)
398ad79 Test that the "verbose" state does not get persisted (anticorrelator)
8d15d6d Implement verbose flag as a context manager (anticorrelator)
50e28bc Add docstrings (anticorrelator)
459e7fb Merge branch 'main' into dustin/verbose-evals (anticorrelator)
5c6ec9d Improve verbosity statefulness test (anticorrelator)
44e2038 Lint imports (anticorrelator)
dadcc9b Add blankline (anticorrelator)
bf93cbc Shorten docstrings (anticorrelator)
1b1bdbf Enforce formatter settings (anticorrelator)
0918108 Appease mypy (anticorrelator)
d21ca4c Add verbose flag test for `generate` (anticorrelator)
d4015b7 Merge branch 'main' into dustin/verbose-evals (anticorrelator)
9f863e6 Use better dummy variable name (anticorrelator)
1d56240 Update src/phoenix/experimental/evals/models/base.py (anticorrelator)
89566b1 Add more details to docstrings (anticorrelator)
dda49ae Merge remote-tracking branch 'origin' into dustin/verbose-evals (anticorrelator)
6d2ceda Update bedrock model (anticorrelator)
61ad169 Restore missing import (anticorrelator)
6cb1c95 Merge branch 'main' into dustin/verbose-evals (anticorrelator)
1960d1c Merge branch 'main' into dustin/verbose-evals (anticorrelator)
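Several commits above mention a `_verbose` flag, a context manager for it, and a test that the verbose state does not persist. The PR's actual implementation is not shown on this page, so the following is only a minimal sketch of that general pattern. `SketchModel` and `temporary_verbosity` are hypothetical names chosen for illustration, not Phoenix APIs.

```python
from contextlib import contextmanager
from typing import Iterator, List


class SketchModel:
    """Hypothetical stand-in for a model class; not Phoenix's BaseEvalModel."""

    def __init__(self) -> None:
        self._verbose = False
        self.log: List[str] = []

    def generate(self, prompt: str) -> str:
        # Record a message only while the verbose flag is set.
        if self._verbose:
            self.log.append(f"generating for: {prompt}")
        return prompt.upper()


@contextmanager
def temporary_verbosity(model: SketchModel, verbose: bool = False) -> Iterator[SketchModel]:
    """Set `model._verbose` for the duration of a `with` block (assumed design)."""
    previous = model._verbose
    model._verbose = verbose
    try:
        yield model
    finally:
        # Runs even if the body raises, so the flag does not leak out.
        model._verbose = previous


model = SketchModel()
with temporary_verbosity(model, verbose=True) as m:
    m.generate("hello")
model.generate("world")  # outside the block: flag restored, nothing logged
```

Restoring the previous value in a `finally` clause is what keeps the verbose state from persisting after the call, including when the wrapped code raises.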
@@ -0,0 +1,5 @@
# A collection of printing and logging utilities

def printif(condition: bool, *args, **kwargs):
    if condition:
        print(*args, **kwargs)
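To make the intent of the `printif` helper in the diff concrete, here is a small usage sketch. The helper is copied from the diff with the same behavior; `run_step` and its `stream` parameter are hypothetical and exist only to show how a caller might thread a verbose flag through.

```python
import io


def printif(condition: bool, *args, **kwargs) -> None:
    # Same behavior as the helper in the diff: print only when condition is true.
    if condition:
        print(*args, **kwargs)


def run_step(name: str, verbose: bool = False, stream=None) -> str:
    """Hypothetical caller that reports progress only in verbose mode."""
    # Extra arguments are forwarded to print(), so `file=` works as usual.
    printif(verbose, f"Running step: {name}", file=stream)
    return name


buffer = io.StringIO()
run_step("quiet", verbose=False, stream=buffer)  # writes nothing
run_step("loud", verbose=True, stream=buffer)    # writes one line
```

Because `printif` forwards `*args` and `**kwargs` unchanged, call sites avoid repeating `if verbose:` checks while keeping the full `print` interface (`sep`, `end`, `file`).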
@mikeldking What are your thoughts on the eventual packaging strategy for evals and other Phoenix sub-modules such as our tracers? Are we going to deploy them as distinct packages, e.g., `arize-evals` or `phoenix-evals`? If so, we should be careful about introducing dependencies between the sub-modules and the rest of the codebase.

@anticorrelator This is a non-blocking comment. We can always move things if needed.
Yeah, it ideally doesn't sit in phoenix long term, so treating it more as a sub-module could be a benefit. I think the verbose logging ask could be evals-specific, so it could make more sense sitting under evals. That said, this is a trivial change if we do split it, so I'm not concerned either way.