Commit

7500 -> 2500
AdrianM0 authored Aug 26, 2024
1 parent 958c244 commit b5ce7c9
Showing 1 changed file with 1 addition and 1 deletion.
src/tex/ms.tex
@@ -20,7 +20,7 @@
However, we possess only a limited systematic understanding of the reasoning capabilities of LLMs, which would be required to improve models and mitigate potential harms.
Here, we introduce \enquote{\chembench,} an automated framework for evaluating the chemical knowledge and reasoning abilities of state-of-the-art LLMs against the expertise of chemists.

-We curated more than 7,500 question-answer pairs, evaluated leading open and closed-source LLMs, and found that, on average, the best models outperformed the chemists.
+We curated more than 2,500 question-answer pairs, evaluated leading open and closed-source LLMs, and found that, on average, the best models outperformed the chemists.
The models, however, struggle with some chemical reasoning tasks that are easy for the human experts and provide overconfident, misleading predictions, such as about chemicals' safety profiles.
Although LLMs demonstrate remarkable proficiency in chemical tasks, further research is critical to enhancing their safety and utility in chemical sciences.
\end{abstract}