Unable to reproduce the performance on the llamaQuestions dataset as reported in the paper. #159

UltraEval · 2024-11-22T10:47:23Z


Unable to Reproduce Moshi's Performance on llamaQuestions Dataset

I attempted to replicate Moshi's performance on the llamaQuestions dataset as reported in the paper, but achieved only 13% accuracy.

Testing Methodology:

Request for Clarification:

Could you please release the test script for LlamaQuestions?

Observation:

If the script used for the paper is https://github.com/kyutai-labs/moshi/blob/main/scripts/moshi_benchmark.py, running it does improve performance to 55%. However, it seems unusual to run inference 10 times for the same question.
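To make the concern about repeated inference concrete, here is a hypothetical sketch. It does not reproduce what moshi_benchmark.py actually does. The substring-match metric and the three scoring rules below are assumptions chosen for illustration. The sketch shows that the same set of generations can yield different reported numbers. Averaging over the repeats estimates single-run accuracy, while counting a question as correct when any repeat is correct can report a much higher figure.

```python
"""Hypothetical illustration: how repeating inference N times per question
can change a reported accuracy, depending on how the repeats are scored.
Not taken from moshi_benchmark.py; the metric and scoring rules are assumptions."""

from typing import Sequence


def is_correct(prediction: str, reference: str) -> bool:
    # Assumed metric: case-insensitive substring match of the reference answer.
    return reference.strip().lower() in prediction.lower()


def single_sample_accuracy(preds: Sequence[Sequence[str]], refs: Sequence[str]) -> float:
    # Score only the first generation per question (one inference per question).
    hits = sum(is_correct(p[0], r) for p, r in zip(preds, refs))
    return hits / len(refs)


def mean_accuracy(preds: Sequence[Sequence[str]], refs: Sequence[str]) -> float:
    # Average correctness over all N generations: an estimate of single-run accuracy.
    total = sum(sum(is_correct(x, r) for x in p) / len(p) for p, r in zip(preds, refs))
    return total / len(refs)


def any_of_n_accuracy(preds: Sequence[Sequence[str]], refs: Sequence[str]) -> float:
    # Count a question as correct if ANY of its N generations is correct.
    hits = sum(any(is_correct(x, r) for x in p) for p, r in zip(preds, refs))
    return hits / len(refs)


if __name__ == "__main__":
    # Toy data: 2 questions, 3 generations each (made-up strings, not real outputs).
    refs = ["paris", "4"]
    preds = [
        ["The capital is Paris.", "It is Lyon.", "Paris"],
        ["I think five.", "It's 4.", "Three"],
    ]
    print("single-sample:", single_sample_accuracy(preds, refs))
    print("mean over N:  ", mean_accuracy(preds, refs))
    print("any-of-N:     ", any_of_n_accuracy(preds, refs))
```

With the toy data, single-sample and mean accuracy are both 0.5, while any-of-N accuracy is 1.0. Which rule, if any, the benchmark script applies to its 10 runs would need to be confirmed against the script itself.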

Attached is my testing process file.

UltraEval added the question Further information is requested label Nov 22, 2024
