
feat: Verbose evals #1558

Merged Oct 6, 2023 · 33 commits

Conversation

@anticorrelator (Contributor) commented Oct 4, 2023

Adds a verbose flag that can be passed to the evals functions llm_generate and llm_eval_binary. When set, these functions print informative messages to stdout.

For this PR we're adding verbose logging to these parts:

  • the tenacity wrappers to indicate when model calls fail and must be retried
  • the BaseEvalModel base class
  • model-specific messages that show invocation parameters in both OpenAI and VertexAI implementations
  • additional messages indicating the status of snapping LLM evals to rails

For example:

Generating responses for 4 prompts...
OpenAI invocation parameters: {'model': 'gpt-4', 'temperature': 0.0, 'max_tokens': 256, 'frequency_penalty': 0, 'presence_penalty': 0, 'top_p': 1, 'n': 1, 'request_timeout': None}
Snapping 4 responses to rails: {'relevant', 'irrelevant'}
- Snapped 'relevant' to rail: relevant
- Snapped 'irrelevant' to rail: irrelevant
- Snapped '\nrelevant ' to rail: relevant
- Cannot snap 'unparsable' to rails: {'relevant', 'irrelevant'}

closes #1480
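The rail-snapping behavior in the example output above could be sketched roughly like this. This is a hypothetical simplification for illustration, not the PR's actual implementation; the function name and normalization details are assumptions:

```python
from typing import Optional, Set

def snap_to_rail(response: str, rails: Set[str], verbose: bool = False) -> Optional[str]:
    # Normalize surrounding whitespace and case, then check membership in
    # the rail set. A simplified stand-in for the PR's snapping logic.
    snapped = response.strip().lower()
    if snapped in rails:
        if verbose:
            print(f"- Snapped {response!r} to rail: {snapped}")
        return snapped
    if verbose:
        print(f"- Cannot snap {response!r} to rails: {rails}")
    return None
```

With verbose=True, this produces per-response messages like the "Snapped ... to rail" lines shown in the example output.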

@anticorrelator changed the title from "feature: Verbose evals" to "feat: Verbose evals" Oct 4, 2023
@dataclass
class BaseEvalModel(ABC):
    _verbose: bool = False

    def retry(
Contributor Author:

This is a rough attempt to clean up an abstraction: the create_base_retry_decorator function was imported from the base module into both of our concrete implementations. After attaching verbose state to the model, we also needed to feed a property of the base model back into this function, so I'm moving the decorator directly onto the model as an instance method. Please let me know if this feels unpleasant.

I also think it makes sense to use the factory directly as a decorator.
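The refactor described in this comment can be sketched as follows. The real code wraps model calls with tenacity's retry machinery; this hand-rolled version only illustrates the decorator-factory-as-instance-method pattern, and everything except the BaseEvalModel/_verbose names is a hypothetical simplification:

```python
import functools

class BaseEvalModel:
    def __init__(self, verbose: bool = False):
        self._verbose = verbose

    # Decorator factory as an instance method: it can read self._verbose
    # directly, so concrete models no longer need to feed the flag back
    # into a module-level create_base_retry_decorator.
    def retry(self, max_attempts: int = 3):
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                for attempt in range(1, max_attempts + 1):
                    try:
                        return fn(*args, **kwargs)
                    except Exception as exc:
                        if self._verbose:
                            print(f"Attempt {attempt} failed: {exc}; retrying")
                        if attempt == max_attempts:
                            raise
            return wrapper
        return decorator
```

Since retry is bound to an instance, it can be used directly as a decorator on functions that call that model, e.g. `@model.retry(max_attempts=3)`.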

@@ -120,6 +125,7 @@ def run_relevance_eval(
be parsed.
"""

model._verbose = verbose
Contributor:

I'm not sure I like the way this field is being set - it seems inevitable that the flag gets left set. It would be cleaner to parameterize it as a kwarg on the relevant calls; that way there's no magic. Other code paths will also need to set this, and doing it via parameters seems more scalable.

Contributor Author:

Ah yeah, that's a good point. If we want to be able to ask the model whether it should emit verbose messages, maybe we can hold the state in a context manager? Passing the argument around is also a fine idea, but a flag that's threaded through many calls can be hard to keep track of.
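The context-manager idea floated here might look something like the sketch below. This is an illustration of the suggestion, not code from the PR; the Model class and verbose() method names are assumptions:

```python
from contextlib import contextmanager

class Model:
    def __init__(self) -> None:
        self._verbose = False

    @contextmanager
    def verbose(self):
        # Temporarily enable verbose output, restoring the previous state
        # on exit (even on error), so the flag can never be "left set".
        previous = self._verbose
        self._verbose = True
        try:
            yield self
        finally:
            self._verbose = previous
```

Callers would wrap an eval invocation, e.g. `with model.verbose(): ...`, and any code holding the model can still ask it whether to emit messages.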

@axiomofjoy (Contributor) left a comment:

Thanks for adding tests!

Comment on lines 3 to 5
def printif(condition: bool, *args, **kwargs):
    if condition:
        print(*args, **kwargs)
Contributor:

@mikeldking What are your thoughts on the eventual packaging strategy for evals and other Phoenix sub-modules such as our tracers? Are we going to deploy them as distinct packages, e.g., arize-evals or phoenix-evals? If so, we should be careful about introducing dependencies between the sub-modules and the rest of the codebase.

@anticorrelator This is a non-blocking comment. We can always move things if needed.

Contributor:

Yeah, ideally it doesn't sit in Phoenix long term, so treating it more as a sub-module could be a benefit. I think the verbose logging ask could be evals-specific, so it might make more sense sitting under evals. Either way, this would be a trivial change if we do split things out, so I'm not concerned.

@anticorrelator marked this pull request as ready for review October 5, 2023 05:36
pyproject.toml (resolved)
@@ -47,18 +48,22 @@ def llm_eval_binary(

system_instruction (Optional[str], optional): An optional system message.

verbose (bool, optional): If True, prints detailed info to stdout. Default False.
Contributor:

Suggestion: Give an example of what kind of information is being printed, e.g., prompts and prompt templates.

Contributor:

E.g., "If True, prints detailed information including invocation parameters, formatted prompts, etc., to stdout."

src/phoenix/experimental/evals/functions/binary.py (resolved, outdated)
src/phoenix/experimental/evals/functions/binary.py (resolved, outdated)
src/phoenix/experimental/evals/functions/binary.py (resolved, outdated)
src/phoenix/experimental/evals/models/base.py (resolved)
src/phoenix/experimental/evals/models/base.py (resolved, outdated)
Comment on lines +213 to +214
else:
    printif(verbose, f"- Snapped {repr(string)} to rail: {rail}")
Contributor:

Suggested change (drop the else and dedent):

- else:
-     printif(verbose, f"- Snapped {repr(string)} to rail: {rail}")
+ printif(verbose, f"- Snapped {repr(string)} to rail: {rail}")

@anticorrelator merged commit 50e765b into main Oct 6, 2023
9 checks passed
@anticorrelator deleted the dustin/verbose-evals branch October 6, 2023 20:00
@github-actions bot locked and limited conversation to collaborators Oct 6, 2023
Development

Successfully merging this pull request may close these issues.

[evals][logging] verbose logging of the evals function calls
4 participants