This repository contains the code for the paper "A study on the soundness of closed-ended evaluation of Large Language Models adapted to the Italian language" accepted at CLiC-it 2024.
The tasks directory contains the tasks that were used for evaluation (you should clone the lm-eval-harness library and add them to it as available tasks).
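
As a rough illustration of adding the tasks to the harness, the sketch below copies every task folder from this repository into a local lm-eval-harness checkout. It assumes the harness keeps its tasks under `lm_eval/tasks/` and that each task here lives in its own subfolder; the function name `install_tasks` and the example paths are hypothetical and are not part of this repository.

```python
import shutil
from pathlib import Path


def install_tasks(tasks_dir, harness_root):
    """Copy each task folder from tasks_dir into the harness task directory.

    Assumption: the harness checkout stores its tasks in lm_eval/tasks/.
    Returns the names of the copied task folders.
    """
    target = Path(harness_root) / "lm_eval" / "tasks"
    if not target.is_dir():
        raise FileNotFoundError(
            f"{target} not found; is this an lm-eval-harness checkout?"
        )
    copied = []
    for entry in sorted(Path(tasks_dir).iterdir()):
        if entry.is_dir():
            # dirs_exist_ok lets the copy be re-run after updating a task
            shutil.copytree(entry, target / entry.name, dirs_exist_ok=True)
            copied.append(entry.name)
    return copied


if __name__ == "__main__":
    # Hypothetical paths: adjust to where the two repositories were cloned.
    print(install_tasks("tasks", "../lm-evaluation-harness"))
```

Copying is only one option; if your harness version can load task definitions from an external directory, pointing it at the tasks directory may be simpler.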
The scripts directory contains the scripts that were used for evaluation.
Please note that WWBM (Who Wants to Be a Millionaire?) is not publicly available; therefore, its task cannot be executed directly.