feat: Verbose evals #1558
Conversation
- less noisy: do not print OpenAI credentials information
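To illustrate what "do not print OpenAI credentials information" could mean in practice, here is a hypothetical redaction helper; the key names and the helper itself are assumptions for illustration, not the PR's code:

```python
# Hypothetical sketch: strip credential-like fields before verbose printing.
SENSITIVE_KEYS = {"api_key", "openai_api_key", "openai_organization"}


def printable_invocation_params(params: dict) -> dict:
    """Return a copy of the invocation parameters that is safe to echo to stdout."""
    return {key: value for key, value in params.items() if key not in SENSITIVE_KEYS}


print(printable_invocation_params({"model": "gpt-4", "api_key": "sk-..."}))
# {'model': 'gpt-4'}
```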
```python
@dataclass
class BaseEvalModel(ABC):
    _verbose: bool = False

    def retry(
```
This is a rough attempt to clean up an abstraction: the `create_base_retry_decorator` function was imported from the `base` module into both of our concrete implementations. After attaching a verbose state to the model, we also needed to feed a property of the base model back into that function, so I'm moving the decorator directly into the model as an instance method. Please let me know if this feels unpleasant.

I also think it makes sense to use the factory directly as a decorator.
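For concreteness, here is a rough sketch of what the instance-method version might look like; the tenacity configuration, the `max_retries` field name, and the logging behavior are illustrative assumptions rather than the PR's actual code:

```python
from abc import ABC
from dataclasses import dataclass
from typing import Any, Callable

import tenacity


@dataclass
class BaseEvalModel(ABC):
    _verbose: bool = False
    max_retries: int = 5  # hypothetical field name

    def retry(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        """Wrap a model call so failed attempts are retried, logging when verbose."""

        def _before_sleep(retry_state: tenacity.RetryCallState) -> None:
            # Only emit retry chatter when the model is in verbose mode.
            if self._verbose:
                print(f"Retrying {fn.__name__} (attempt {retry_state.attempt_number})")

        return tenacity.retry(
            stop=tenacity.stop_after_attempt(self.max_retries),
            wait=tenacity.wait_exponential(multiplier=1, min=1, max=10),
            before_sleep=_before_sleep,
        )(fn)
```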
```
@@ -120,6 +125,7 @@ def run_relevance_eval(
        be parsed.
    """

    model._verbose = verbose
```
I'm not sure I like setting a field this way, because it's inevitable that the flag gets left set. It seems like it would be cleaner to parameterize this as a kwarg on certain calls; that way there's no magic. Other code paths will also need to set this, and doing it via parameters seems more scalable.
Ah yeah, that's a good point. If we want to be able to ask the model whether it should emit verbose messages, maybe we can hold the state in a context manager? Passing the argument around is also a fine idea, but a flag that's threaded through many calls can be really confusing to keep track of.
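A minimal sketch of that context-manager idea, assuming the state lives on the model (the method name `verbose_mode` and the restore-on-exit behavior are illustrative, not what the PR settled on):

```python
from contextlib import contextmanager
from typing import Iterator


class BaseEvalModel:
    _verbose: bool = False

    @contextmanager
    def verbose_mode(self, enabled: bool = True) -> Iterator[None]:
        """Temporarily flip the verbose flag so it can't be left set by accident."""
        previous = self._verbose
        self._verbose = enabled
        try:
            yield
        finally:
            # Restore the previous state even if the eval raises.
            self._verbose = previous


# Hypothetical call site inside an eval function:
# with model.verbose_mode(verbose):
#     response = model(prompt)
```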
Thanks for adding tests!
src/phoenix/utilities/logging.py (outdated)
```python
def printif(condition: bool, *args, **kwargs):
    if condition:
        print(*args, **kwargs)
```
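For context, a quick sketch of how the helper would be called from the eval code; the call site below is illustrative, not lifted from the diff:

```python
from phoenix.utilities.logging import printif  # module path inferred from the file above

verbose = True
prompt = "Is the reference text relevant to the query?"

# No-op when verbose is False; otherwise forwards everything to print().
printif(verbose, f"Formatted prompt:\n{prompt}")
```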
@mikeldking What are your thoughts on the eventual packaging strategy for evals and other Phoenix sub-modules such as our tracers? Are we going to deploy them as distinct packages, e.g., `arize-evals` or `phoenix-evals`? If so, we should be careful about introducing dependencies between the sub-modules and the rest of the codebase.
@anticorrelator This is a non-blocking comment. We can always move things if needed.
Yeah, ideally it doesn't sit in phoenix long term, so treating it more as a sub-module could be a benefit. I think the verbose logging ask could be evals-specific, so it might make more sense sitting under evals. That said, this is a trivial change if we do split it, so I'm not concerned either way.
```
@@ -47,18 +48,22 @@ def llm_eval_binary(
        system_instruction (Optional[str], optional): An optional system message.

        verbose (bool, optional): If True, prints detailed info to stdout. Default False.
```
Suggestion: Give an example of what kind of information is being printed, e.g., prompts and prompt templates.
E.g., "If True, prints detailed information, including invocation parameters and formatted prompts, to stdout."
```python
else:
    printif(verbose, f"- Snapped {repr(string)} to rail: {rail}")
```

Suggested change:

```diff
-else:
-    printif(verbose, f"- Snapped {repr(string)} to rail: {rail}")
+printif(verbose, f"- Snapped {repr(string)} to rail: {rail}")
```
Co-authored-by: Roger Yang <80478925+RogerHYang@users.noreply.github.com>
Adds a `verbose` flag that can be passed to the evals functions, `llm_generate` and `llm_eval_binary`. When set, these functions will print informative messages to stdout.

For this PR we're adding verbose logging to these parts:
- `tenacity` wrappers, to indicate when model calls fail and must be retried
- the `BaseEvalModel` base class
- the `OpenAI` and `VertexAI` implementations

For example:
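(The original example, likely console output, isn't preserved in this capture. As a stand-in, here is a hedged sketch of how the flag is meant to be passed; the import path, argument names, and template placeholder below are assumptions, not taken from the PR.)

```python
import pandas as pd

# Assumed import path; the exact module layout may differ.
from phoenix.experimental.evals import OpenAIModel, llm_eval_binary

dataframe = pd.DataFrame(
    {
        "query": ["What is Phoenix?"],
        "reference": ["Phoenix is an open-source ML observability library."],
    }
)

model = OpenAIModel(model_name="gpt-4")

# verbose=True makes the eval print retries, invocation parameters,
# and formatted prompts to stdout while it runs (argument names are illustrative).
relevance = llm_eval_binary(
    dataframe=dataframe,
    model=model,
    template="...",  # a relevance prompt template would go here
    rails=["relevant", "irrelevant"],
    verbose=True,
)
```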
closes #1480