This repository contains a dataset of annotations donated by participants in a shared task that we organized for the Foundations of Language Technology (FoLT) course in 2023/2024 at the Technical University of Darmstadt. The shared task focuses on evaluating whether Large Language Models (LLMs) generate harmful answers to health-related clinical questions. Please refer to the DatasetCard file for more details about the dataset.
📃 paper
Contact person: Yufang Hou
If you use our dataset, please cite:
@inproceedings{yufang-2024-FoLT,
title = {A Course Shared Task on Evaluating LLM Output for Clinical Questions},
author = {Yufang Hou and Thy Thy Tran and Doan Nam Long Vu and Yiwen Cao and Kai Li and Lukas Rohde and Iryna Gurevych},
booktitle = {Proceedings of the Sixth Workshop on Teaching NLP},
year = {2024}
}