Evaluation techniques for text generation LLMs #24

Closed
4 tasks done
hemajv opened this issue Jan 15, 2024 · 1 comment

hemajv commented Jan 15, 2024

We should look into different techniques for evaluating the API documentation generated by the LLMs.

  • Look into evaluation techniques for text generation LLMs and compare their results against human analysis (see the metric sketch after this list)
  • Put guardrails in place to ensure that the LLM-generated output aligns with the product documentation and styling
  • Generate responses that meet the evaluation metrics
  • Quantitatively analyze the different evaluation criteria by running experiments with different prompts as inputs to the LLM
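
For the first item, one common automatic baseline is an n-gram overlap metric such as ROUGE, computed between the LLM-generated documentation and a human-written reference. A minimal sketch using the Hugging Face `evaluate` library (the example strings are illustrative assumptions, not part of this repo):

```python
# pip install evaluate rouge_score
import evaluate

# Load the ROUGE metric from the Hugging Face evaluate hub.
rouge = evaluate.load("rouge")

# Hypothetical generated docs vs. a human-written reference.
predictions = ["requests.get(url) sends an HTTP GET request and returns a Response object."]
references = ["requests.get(url) issues an HTTP GET request to url and returns a Response."]

# compute() returns F-measure scores per ROUGE variant, e.g.
# {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
scores = rouge.compute(predictions=predictions, references=references)
print(scores)
```

Overlap scores like these are only a proxy for quality; they would still need to be sanity-checked against the human analysis mentioned above.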
@hemajv hemajv self-assigned this Jan 15, 2024
suppathak commented

Hi Hema,
One way to check how well a model responds is to evaluate its output against criteria tailored to the use case. The LangChain guide below shows this using the OpenAI API, where an LLM judges the model's response against specific criteria. Please take a look:

https://python.langchain.com/docs/guides/evaluation/

I have also added a demo notebook related to it; please take a look:
https://github.com/suppathak/foundation-models-for-documentation/blob/langeval/notebooks/langchain-evaluation.ipynb
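
For reference, the pattern the guide describes is roughly the following: a minimal sketch of LangChain's criteria evaluator, which uses an OpenAI chat model as the judge (assumes `langchain` is installed and `OPENAI_API_KEY` is set; the prediction/input strings are illustrative):

```python
from langchain.evaluation import load_evaluator

# Load a criteria evaluator; "conciseness" is one of the built-in criteria,
# and a custom {"name": "description"} dict can be passed instead.
evaluator = load_evaluator("criteria", criteria="conciseness")

# Judge a hypothetical model response against the criterion.
result = evaluator.evaluate_strings(
    prediction="requests.get(url) sends a GET request and returns a Response object.",
    input="What does requests.get() do?",
)

# The result is a dict with a reasoning string, a Y/N value, and a 0/1 score,
# e.g. {"reasoning": "...", "value": "Y", "score": 1}
print(result)
```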
