Add LLM as Judge evaluation metric #40

Merged
merged 5 commits into develop from feature/llm-as-judge on Dec 12, 2024
Conversation

@NISH1001 (Collaborator) commented Dec 12, 2024

Major Changes

  • Add evalem.nlp.metrics.llm.LLMAsJudgeMetric, which uses an LLM to judge predictions against references. It can be installed via the [llm] extra (see the install sketch below).
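
Assuming the package is published on PyPI under the name evalem with an llm extra (the PR itself does not state the exact package name), installation would look roughly like:

pip install "evalem[llm]"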

Minor Changes

  • Add evalem._base.structures.SequenceType to encapsulate Union[list, tuple, set] (a minimal sketch follows this list)
  • Re-export the NLP metrics from the package's dunder __init__ modules for easier imports
  • Upgrade major dependencies
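
A minimal sketch of what the SequenceType alias might look like, based only on the bullet above; the actual definition lives in evalem._base.structures:

from typing import Union

# Hypothetical sketch: a simple alias covering list, tuple, and set,
# as described in the change note above.
SequenceType = Union[list, tuple, set]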

Usage

import os

from evalem.nlp import LLMAsJudgeMetric

# Judge model served by a local Ollama instance.
model = "ollama/llama3.2:3b"
api_base = "http://localhost:11434/v1"
prompt = "..."  # the judging prompt passed to the LLM

references = [...]
predictions = [...]

LLMAsJudgeMetric(
    model=model,
    api_base=api_base,
    api_key=os.environ.get("OPENAI_API_KEY"),
    # api_key=None,
    n_tries=3,
    prompt=prompt,
    debug=True,
).compute(
    references=references,
    predictions=predictions,
)

@NISH1001 merged commit e3a9c46 into develop on Dec 12, 2024
2 of 5 checks passed
@NISH1001 deleted the feature/llm-as-judge branch on December 12, 2024 at 21:35