fix(evals): update prompt template for summarization eval and rails mapping to use lower case "good" / "bad" (#1562)

* update to prompt template and rails mapping

* add noqa: E501

---------

Co-authored-by: Xander Song <axiomofjoy@gmail.com>
Co-authored-by: Roger Yang <roger.yang@arize.com>
3 people authored Oct 5, 2023
1 parent f0c18df commit 8bad5ac
1 changed file with 10 additions and 10 deletions: src/phoenix/experimental/evals/templates/default_templates.py
@@ -18,7 +18,7 @@
 and should not contain any text or characters aside from that word.
 "irrelevant" means that the reference text does not contain an answer to the Question.
 "relevant" means the reference text contains an answer to the Question.
-"""
+""" # noqa: E501

 HALLUCINATION_PROMPT_RAILS_MAP = OrderedDict({True: "hallucinated", False: "factual"})
 HALLUCINATION_PROMPT_TEMPLATE_STR = """
@@ -39,7 +39,7 @@
 # Reference text: {reference}
 # Answer: {response}
 Is the answer above factual or hallucinated based on the query and reference text?
-"""
+""" # noqa: E501


 TOXICITY_PROMPT_RAILS_MAP = OrderedDict({True: "toxic", False: "non-toxic"})
@@ -61,7 +61,7 @@
 contain any text or characters aside from that word. "toxic" means that the text
 meets the definition of toxic. "non-toxic" means the text does not contain any
 words, sentiments or meaning that could be considered toxic.
-"""
+""" # noqa: E501

 QA_PROMPT_TEMPLATE_STR = """
 You are given a question, an answer and reference text. You must determine whether the
@@ -79,7 +79,7 @@
 "correct" means that the question is correctly and fully answered by the answer.
 "incorrect" means that the question is not correctly or only partially answered by the
 answer.
-"""
+""" # noqa: E501
 # The prompt output map is used to map 1) to provide rails to the llm in order to constrain
 # the llm's outputs to the expected values. 2) golden dataset ground truth boolean values
 # to the llm output
@@ -97,15 +97,15 @@
 [END DATA]
 Compare the Summary above to the Original Document and determine if the Summary is
 comprehensive, concise, coherent, and independent relative to the Original Document.
-Your response must be a string, either Good or Bad, and should not contain any text
-or characters aside from that. Bad means that the Summary is not comprehensive, concise,
-coherent, and independent relative to the Original Document. Good means the Summary
+Your response must be a string, either good or bad, and should not contain any text
+or characters aside from that. The string bad means that the Summary is not comprehensive, concise,
+coherent, and independent relative to the Original Document. The string good means the Summary
 is comprehensive, concise, coherent, and independent relative to the Original Document.
-"""
+""" # noqa: E501
 # The prompt output map is used to map 1) to provide rails to the llm in order to constrain
 # the llm's outputs to the expected values. 2) golden dataset ground truth boolean values
 # to the llm output
-SUMMARIZATION_PROMPT_RAILS_MAP = OrderedDict({True: "Good", False: "Bad"})
+SUMMARIZATION_PROMPT_RAILS_MAP = OrderedDict({True: "good", False: "bad"})
 CODE_READABILITY_PROMPT_TEMPLATE_STR = """
 You are a stern but practical senior software engineer who cares a lot about simplicity and
 readability of code. Can you review the following code that was written by another engineer?
@@ -124,5 +124,5 @@
 ```
 {code}
 ```
-"""
+""" # noqa: E501
 CODE_READABILITY_PROMPT_RAILS_MAP = OrderedDict({True: "readable", False: "unreadable"})
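
The rails map's values are the only strings the evaluator should accept from the model, so they have to agree letter for letter with the words the prompt asks for. The sketch below is illustrative only and is not Phoenix's implementation: `snap_to_rail` is a hypothetical helper, and its normalization (strip whitespace, then periods and quotes at either end, then lower-case) is an assumption. It shows why a matcher that lower-cases the reply can never hit a rail spelled "Good", and how the map can be inverted to recover the boolean.

```python
from collections import OrderedDict
from typing import List, Optional

# Rails map as it reads after this commit.
SUMMARIZATION_PROMPT_RAILS_MAP = OrderedDict({True: "good", False: "bad"})


def snap_to_rail(raw_output: str, rails: List[str]) -> Optional[str]:
    """Hypothetical helper: return the rail named by an LLM reply, or None.

    Assumed normalization: strip surrounding whitespace, then periods and
    quote characters at either end, then lower-case before an exact lookup.
    """
    normalized = raw_output.strip().strip('."\'').lower()
    return normalized if normalized in rails else None


rails = list(SUMMARIZATION_PROMPT_RAILS_MAP.values())
print(snap_to_rail(" Good.\n", rails))       # good
print(snap_to_rail("Bad", ["Good", "Bad"]))  # None: "bad" is not in the capitalized rails

# Invert the map to turn a snapped label back into the boolean verdict.
label_to_bool = {label: value for value, label in SUMMARIZATION_PROMPT_RAILS_MAP.items()}
print(label_to_bool[snap_to_rail("GOOD", rails)])  # True
```

Keeping the prompt wording and the map values in the same case means the lookup never depends on how forgiving the normalization step happens to be.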

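The same map also serves the second purpose named in the code comment: translating golden-dataset booleans into the labels the model is expected to return, so the two can be compared string to string. In this hypothetical sketch, `ground_truth_to_labels`, `accuracy`, `PRE_COMMIT_RAILS_MAP`, and the sample data are illustrative assumptions, not Phoenix APIs. It shows that lower-case model replies score 0.0 against the pre-commit "Good"/"Bad" map and are scored normally against the updated one.

```python
from collections import OrderedDict
from typing import List, Mapping

# Rails maps before and after this commit.
PRE_COMMIT_RAILS_MAP = OrderedDict({True: "Good", False: "Bad"})
SUMMARIZATION_PROMPT_RAILS_MAP = OrderedDict({True: "good", False: "bad"})


def ground_truth_to_labels(ground_truth: List[bool], rails_map: Mapping[bool, str]) -> List[str]:
    """Translate golden-dataset booleans into the rail strings the LLM is asked to emit."""
    return [rails_map[value] for value in ground_truth]


def accuracy(predicted: List[str], expected: List[str]) -> float:
    """Fraction of positions where the LLM label exactly equals the expected label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


golden = [True, False, True, True]           # hypothetical golden-dataset labels
llm_labels = ["good", "bad", "bad", "good"]  # hypothetical lower-case model replies

expected = ground_truth_to_labels(golden, SUMMARIZATION_PROMPT_RAILS_MAP)
print(expected)                        # ['good', 'bad', 'good', 'good']
print(accuracy(llm_labels, expected))  # 0.75

stale = ground_truth_to_labels(golden, PRE_COMMIT_RAILS_MAP)
print(accuracy(llm_labels, stale))     # 0.0: exact comparison is case-sensitive
```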