Commit

update website
xingyaoww committed Oct 1, 2023
1 parent 824e1b6 commit 254acca
Showing 2 changed files with 5 additions and 6 deletions.
10 changes: 4 additions & 6 deletions index.html
@@ -163,11 +163,11 @@ <h2 class="title is-3">Abstract</h2>
 
 </p>
 
-<ol>
+<ul>
 <li>(a) LLMs generally benefit from tools and language feedback, with performance gains (absolute, same below) of 1-8% for each turn of tool use and 2-17% with natural language feedback.</li>
 <li>(b) Better single-turn performance does not guarantee better multi-turn performance.</li>
 <li>(c) Surprisingly, on the LLMs evaluated, supervised instruction-finetuning (SIFT) and reinforcement learning from human feedback (RLHF) generally hurt multi-turn capabilities.</li>
-</ol>
+</ul>
 
 <p>
 We expect MINT can help measure progress and incentivize research in improving LLMs' capabilities in multi-turn interactions, especially for open-source communities where multi-turn human evaluation can be less accessible compared to commercial LLMs with a larger user base.
@@ -339,10 +339,8 @@ <h3>LLMs' Ability to Leverage Natural Language Feedback</h3>
 
 <li>
 Similar to previous findings, we find that SIFT and RLHF hurt models' ability to
-leverage feedback.
-The results on CodeLLama (except 7B) and LLaMA-2 show that SIFT/RLHF models
-all have
-lower &Delta;feedback and Success Rate (with feedback) compared to their base variants.
+leverage feedback on CodeLLama (except 7B) and LLaMA-2, as they all have lower &Delta;feedback and Success Rate (with feedback) compared to their base variants.
+Another two exceptions are Vicuna and Lemur-v1; we speculate that using multi-turn conversations (ShareGPT) for SIFT contributes to these two exceptions.
 <br>
 <button class="btn btn-outline-secondary btn-sm inline-vis-button"
         id="visualize-feedback-sr-sift-rlhf">Visualize
1 change: 1 addition & 0 deletions website/javascript/feedback_success_rate_vis.js
@@ -333,6 +333,7 @@ document.addEventListener('DOMContentLoaded', function () {
 'LLaMA-2 (70B, RLHF)',
 'LLaMA-2 (7B, Base)',
 'LLaMA-2 (7B, RLHF)',
+'Vicuna-v1.5 (7B, SIFT)',
 ]);
});

