Commit

[showyourwork]
kjappelbaum committed Mar 31, 2024
1 parent 87f514b commit a4691ff
Showing 8 changed files with 8 additions and 3 deletions.
2 changes: 1 addition & 1 deletion src/tex/acronymns.tex
@@ -13,4 +13,4 @@
\newacronym{smiles}{SMILES}{Simplified Molecular Input Line-Entry System}
\newacronym{pca}{PCA}{Principal Component Analysis}
\newacronym{iupac}{IUPAC}{International Union of Pure and Applied Chemistry}
-\newacronym{json}{JSON}{JavaScript Object Notation}
+\newacronym{json}{JSON}{JavaScript Object Notation}
2 changes: 1 addition & 1 deletion src/tex/appendix.tex
@@ -268,4 +268,4 @@ \subsection{Leaderboard}

\clearpage

-\printnoidxglossary[type=\acronymtype, nonumberlist] % https://github.com/tectonic-typesetting/tectonic/issues/704
+\printnoidxglossary[type=\acronymtype, nonumberlist] % https://github.com/tectonic-typesetting/tectonic/issues/704
1 change: 1 addition & 0 deletions src/tex/authors.tex
@@ -109,3 +109,4 @@

\affil[\Letter]{\texttt{mail@kjablonka.com}}
\affil[$\star$]{These authors contributed equally.}

1 change: 1 addition & 0 deletions src/tex/ms.tex
@@ -12,6 +12,7 @@
\begin{document}
\maketitle


\clearpage
\begin{abstract}
Large language models (LLMs) have gained widespread interest due to their ability to process human language and perform tasks on which they have not been explicitly trained.
2 changes: 1 addition & 1 deletion src/tex/references.bib
@@ -1220,4 +1220,4 @@ @article{yao2022react
eprinttype = {arXiv},
title = {React: Synergizing reasoning and acting in language models},
date = {2022},
-}
+}
1 change: 1 addition & 0 deletions src/tex/sections/manually_sources_table.tex
@@ -36,3 +36,4 @@
& Lab safety quizzes based on various sources && \variable{output/question_count_per_dir/json_file_counts_sci_lab_safety_test.txt} + \variable{output/question_count_per_dir/json_file_counts_lab_safety.txt} + \variable{output/question_count_per_dir/json_file_counts_stolaf.txt} + \variable{output/question_count_per_dir/json_file_counts_chemical_safety_mcq_exam.txt} + \variable{output/question_count_per_dir/json_file_counts_anderson.txt}\\
\bottomrule
\end{xltabular}

1 change: 1 addition & 0 deletions src/tex/sections/parse_check_desc.tex
@@ -4,3 +4,4 @@
We selected a large, diverse subset of questions (10 per topic for all model reports) and manually investigated where the parsed output does not match the actual answer intended by the model.
We found that for \glstext{mcq} questions, the parsing was accurate in 99.76\% of the cases, while for floating point questions, the parsing was accurate in 99.17\% of the cases.
The models most frequently generating errors are pplx-7b-chat and Mixtral-8x7b.

1 change: 1 addition & 0 deletions src/tex/sections/semi_programatically_sources_table.tex
@@ -28,3 +28,4 @@
\bottomrule
\end{xltabular}

