Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added new extract answer feature #148

Closed
wants to merge 5 commits into from
Closed

Added new extract answer feature #148

wants to merge 5 commits into from

Conversation

whitead
Copy link
Contributor

@whitead whitead commented Dec 11, 2024

Standardizing extraction of answers

@whitead whitead requested a review from sidnarayanan December 11, 2024 17:01
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Dec 11, 2024
@whitead whitead requested a review from jamesbraza December 11, 2024 17:01
@dosubot dosubot bot added the enhancement New feature or request label Dec 11, 2024
Comment on lines +30 to +31
"If the proposed answer is empty, invalid, or ambiguous, "
"return an empty string."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you upstream some of the tests from https://github.com/Future-House/paper-qa/blob/v5.8.0/tests/test_litqa.py#L117 to here? I think we should also have "multiple options are matched" mentioned somewhere

I would be nice if paper-qa can just import this function and use it, instead of having its own evaluation LLM prompt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup - will do

Comment on lines +108 to +114
try:
from litellm import acompletion
except ImportError as e:
raise ImportError(
"eval_answer requires the 'llm' extra for 'litellm'. Please:"
" `pip install aviary[llm]`."
) from e
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you look at how ToolSelector.__init__ takes acompletion: "Callable[..., Awaitable[ModelResponse]] | None" = None, and apply it as an input arg here?

What this does is let people use local models.

Also, feel free to YAGNI on this one

@@ -22,6 +22,21 @@
"temperature": 0,
}

LLM_EXTRACT_CONFIG = {
"prompt": (
"You are evaluating answers for a test which has fixed options. "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some statement that focuses the LLM on the message history?

Otherwise, in paper-qa, we witnessed the LLM using its innate knowledge

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no message history here (?) Not sure what you mean?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this function is responsible for both (1) extracting a letter that (2) ensuring it matches a multiple choice option.

What we saw in paper-qa was in the case of an empty string answer, the LLM would pull on its innate knowledge and could select the correct multiple choice option.

So for this:

Can you add some statement that focuses the LLM on the message history?

I guess what I should of said was can you add a statement that focuses the LLM on just the proposed: str, and tries to avoid pulling on any innate knowledge?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea - I was smart and didn't put the question into these, so there's no way it could get confused and try to answer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we simultaneously posted

There is no question. So I don't see how it would be possible for it to attempt to answer. I don't know what else I could write to make it more clear in the prompt.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh I see, very clever! I follow and you're right

Do you mind documenting that rationale somewhere in the code? Maybe a docstring in extract_answer_llm

Copy link
Collaborator

@jamesbraza jamesbraza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, there are outstanding comments still

@@ -22,6 +22,21 @@
"temperature": 0,
}

LLM_EXTRACT_CONFIG = {
"prompt": (
"You are evaluating answers for a test which has fixed options. "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh I see, very clever! I follow and you're right

Do you mind documenting that rationale somewhere in the code? Maybe a docstring in extract_answer_llm

@whitead
Copy link
Contributor Author

whitead commented Dec 18, 2024

Hey @jamesbraza - I'm going to close this, because at this point YAGNI. Unless you find a use for it in paper-qa eval stuff

@whitead whitead closed this Dec 18, 2024
@jamesbraza
Copy link
Collaborator

Oh I forgot about this one, I will combine it atop #157 is merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants