Evaluation techniques for text generation LLMs #24

Closed
4 tasks done
hemajv opened this issue Jan 15, 2024 · 1 comment

hemajv commented Jan 15, 2024

We should look into different techniques for evaluating the API documentation generated by the LLMs.

  • Look into evaluation techniques for text generation LLMs and compare their results against human analysis (see the metric sketch after this list)
  • Put guardrails in place to ensure that the LLM-generated output aligns with the product documentation and styling
  • Generate responses that meet the evaluation metrics
  • Quantitatively analyze the different evaluation criteria by running experiments with different prompts as inputs to the LLM
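
For the first item, one common automatic baseline is an n-gram overlap metric such as ROUGE, computed between the LLM-generated documentation and a human-written reference. A minimal sketch using the Hugging Face `evaluate` library (the example strings are illustrative assumptions, not part of this repo):

```python
# pip install evaluate rouge_score
import evaluate

# Load the ROUGE metric from the Hugging Face evaluate hub.
rouge = evaluate.load("rouge")

# Hypothetical generated docs vs. a human-written reference.
predictions = ["requests.get(url) sends an HTTP GET request and returns a Response object."]
references = ["requests.get(url) issues an HTTP GET request to url and returns a Response."]

# compute() returns F-measure scores per ROUGE variant, e.g.
# {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
scores = rouge.compute(predictions=predictions, references=references)
print(scores)
```

Overlap scores like these are only a proxy for quality; they would still need to be sanity-checked against the human analysis mentioned above.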
@hemajv hemajv self-assigned this Jan 15, 2024
suppathak commented

Hi Hema,
One way to check how well a model responds is to evaluate its output against criteria tailored to the use case. The LangChain guide below shows this using the OpenAI API, where an LLM judges the model's response against specific criteria. Please take a look:

https://python.langchain.com/docs/guides/evaluation/

I have also added a demo notebook related to it; please take a look:
https://github.com/suppathak/foundation-models-for-documentation/blob/langeval/notebooks/langchain-evaluation.ipynb
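
For reference, the pattern the guide describes is roughly the following: a minimal sketch of LangChain's criteria evaluator, which uses an OpenAI chat model as the judge (assumes `langchain` is installed and `OPENAI_API_KEY` is set; the prediction/input strings are illustrative):

```python
from langchain.evaluation import load_evaluator

# Load a criteria evaluator; "conciseness" is one of the built-in criteria,
# and a custom {"name": "description"} dict can be passed instead.
evaluator = load_evaluator("criteria", criteria="conciseness")

# Judge a hypothetical model response against the criterion.
result = evaluator.evaluate_strings(
    prediction="requests.get(url) sends a GET request and returns a Response object.",
    input="What does requests.get() do?",
)

# The result is a dict with a reasoning string, a Y/N value, and a 0/1 score,
# e.g. {"reasoning": "...", "value": "Y", "score": 1}
print(result)
```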
