Commit

update website
xingyaoww committed Oct 1, 2023
1 parent 824e1b6 commit 254acca
Showing 2 changed files with 5 additions and 6 deletions.
10 changes: 4 additions & 6 deletions index.html
@@ -163,11 +163,11 @@ <h2 class="title is-3">Abstract</h2>
 
 </p>
 
-<ol>
+<ul>
 <li>(a) LLMs generally benefit from tools and language feedback, with performance gains (absolute, same below) of 1-8% for each turn of tool use and 2-17% with natural language feedback.</li>
 <li>(b) Better single-turn performance does not guarantee better multi-turn performance.</li>
 <li>(c) Surprisingly, on the LLMs evaluated, supervised instruction-finetuning (SIFT) and reinforcement learning from human feedback (RLHF) generally hurt multi-turn capabilities.</li>
-</ol>
+</ul>
 
 <p>
 We expect MINT can help measure progress and incentivize research in improving LLMs' capabilities in multi-turn interactions, especially for open-source communities where multi-turn human evaluation can be less accessible compared to commercial LLMs with a larger user base.
@@ -339,10 +339,8 @@ <h3>LLMs' Ability to Leverage Natural Language Feedback</h3>
 
 <li>
 Similar to previous findings, we find that SIFT and RLHF hurt models' ability to
-leverage feedback.
-The results on CodeLLama (except 7B) and LLaMA-2 show that SIFT/RLHF models
-all have
-lower &Delta;feedback and Success Rate (with feedback) compared to their base variants.
+leverage feedback on CodeLLama (except 7B) and LLaMA-2, as they all have lower &Delta;feedback and Success Rate (with feedback) compared to their base variants.
+Another two exceptions are Vicuna and Lemur-v1; we speculate that using multi-turn conversations (ShareGPT) for SIFT contributes to these two exceptions.
 <br>
 <button class="btn btn-outline-secondary btn-sm inline-vis-button"
         id="visualize-feedback-sr-sift-rlhf">Visualize
1 change: 1 addition & 0 deletions website/javascript/feedback_success_rate_vis.js
@@ -333,6 +333,7 @@ document.addEventListener('DOMContentLoaded', function () {
 'LLaMA-2 (70B, RLHF)',
 'LLaMA-2 (7B, Base)',
 'LLaMA-2 (7B, RLHF)',
+'Vicuna-v1.5 (7B, SIFT)',
 ]);
});

