# lm-evaluation

Here are 5 public repositories matching this topic...

IAAR-Shanghai / xFinder

xFinder: Robust and Pinpoint Answer Extraction for Large Language Models

benchmark regex reliability evaluation dataset gpt phi large-language-models llm open-compass chatglm qwen lm-evaluation llm-as-a-judge llm-as-evaluator xfinder reliable-evaluation key-answer-extraction judge-model

Updated Oct 28, 2024
Python

bethgelab / CiteME

CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts.

lm-evaluation citation-attribution citation-dataset

Updated Oct 22, 2024
Python

hitz-zentroa / latxa

Latxa: An Open Language Model and Evaluation Suite for Basque

evaluation language-model basque huggingface gpt-neox llm lm-evaluation latxa

Updated Jun 11, 2024
Shell

RulinShao / RAG-evaluation-harnesses

An evaluation suite for Retrieval-Augmented Generation (RAG).

evaluation rag retrieval-augmented-generation lm-evaluation

Updated Oct 14, 2024
Python

SYusupov / LogicGPT

LLM Model: Fine-tuning, Evaluation, Containerization, Deployment, CI/CD Pipeline

deployment ci-cd containerization llm llm-finetuning mistral-7b lm-evaluation

Updated Sep 26, 2024
Jupyter Notebook
