7500 -> 2500

lamalab-org · Aug 26, 2024 · b5ce7c9
1 parent 958c244
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/src/tex/ms.tex b/src/tex/ms.tex
@@ -20,7 +20,7 @@
     However, we possess only a limited systematic understanding of the reasoning capabilities of LLMs, which would be required to improve models and mitigate potential harms. 
     Here, we introduce \enquote{\chembench,} an automated framework for evaluating the chemical knowledge and reasoning abilities of state-of-the-art LLMs against the expertise of chemists.
 
-    We curated more than 7,500 question-answer pairs, evaluated leading open and closed-source LLMs, and found that, on average, the best models outperformed the chemists. 
+    We curated more than 2,500 question-answer pairs, evaluated leading open and closed-source LLMs, and found that, on average, the best models outperformed the chemists. 
     The models, however, struggle with some chemical reasoning tasks that are easy for the human experts and provide overconfident, misleading predictions, such as about chemicals' safety profiles. 
     Although LLMs demonstrate remarkable proficiency in chemical tasks, further research is critical to enhancing their safety and utility in chemical sciences.
 \end{abstract}