Commit

update exp
Ziyan Jiang committed Jun 20, 2024
1 parent 5e03eb1 commit 03fef35
Showing 1 changed file (index.html) with 11 additions and 3 deletions.
@@ -179,7 +179,7 @@ <h1 class="title is-1 publication-title">LongRAG: Enhancing Retrieval-Augmented
 
 <centering>
 <div style="text-align: center;">
-<img id="teaser" width="85%" src="images/teaser.png">
+<img id="teaser" width="85%" src="static/images/teaser.png">
 </div>
 </centering>
 
@@ -200,9 +200,9 @@ <h2 class="title is-3">Abstract</h2>
 <p>
 
 In traditional RAG framework, the basic retrieval units are normally short. The common retrievers like DPR normally work with 100-word Wikipedia paragraphs.
-Such a design forces the retriever to search over a large corpus to find the needle unit. In contrast, the readers only need to extract answers from the
+Such a design forces the retriever to search over a large corpus to find the "needle" unit. In contrast, the readers only need to extract answers from the
 short retrieved units. Such an imbalanced heavy retriever and light reader design can lead to sub-optimal performance. In order to alleviate the imbalance,
-we propose a new framework LongRAG, consisting of a long retriever and a long reader. LongRAG processes the entire Wikipedia into 4K-token units, which is
+we propose a new framework LongRAG, consisting of a "long retriever" and a "long reader". LongRAG processes the entire Wikipedia into 4K-token units, which is
 30x longer than before. By increasing the unit size, we significantly reduce the total units from 22M to 700K. This significantly lowers the burden of retriever,
 which leads to a remarkable retrieval score: answer recall@1=71% on NQ (previously 52%) and answer recall@2=72% (previously 47%) on HotpotQA (full-wiki). Then
 we feed the top-k retrieved units (≈ 30K tokens) to an existing long-context LLM to perform zero-shot answer extraction. Without requiring any training, LongRAG
@@ -219,7 +219,15 @@ <h2 class="title is-3">Abstract</h2>
 </div>
 </section>
 
+<section class="section">
+<!-- Results. -->
+<div class="columns is-centered has-text-centered">
+<div class="column is-six-fifths">
+<h2 class="title is-3"><img id="painting_icon" width="3%" src="https://cdn-icons-png.flaticon.com/512/3515/3515174.png"> Performance</h2>
+</div>
+</div>
 
+</section>
 
 
 
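The abstract quoted in this diff describes the core LongRAG mechanics: pack short Wikipedia passages into ~4K-token retrieval units (shrinking the corpus from 22M to 700K units), then feed the top-ranked units, up to roughly 30K tokens, to a long-context LLM reader. A minimal sketch of that packing and budgeting logic, not taken from the LongRAG codebase — the helper names are hypothetical and token counting is plain whitespace splitting rather than a real tokenizer:

```python
# Hypothetical sketch of LongRAG-style "long retrieval units".
# Assumption: whitespace word count stands in for a real tokenizer.

def num_tokens(text: str) -> int:
    return len(text.split())

def build_long_units(passages, max_unit_tokens=4096):
    """Greedily pack short passages into ~4K-token retrieval units."""
    units, current, current_len = [], [], 0
    for p in passages:
        n = num_tokens(p)
        if current and current_len + n > max_unit_tokens:
            units.append(" ".join(current))
            current, current_len = [], 0
        current.append(p)
        current_len += n
    if current:
        units.append(" ".join(current))
    return units

def reader_context(ranked_units, budget_tokens=30_000):
    """Concatenate top-ranked units until the ~30K-token reader budget is hit."""
    picked, used = [], 0
    for u in ranked_units:
        n = num_tokens(u)
        if used + n > budget_tokens:
            break
        picked.append(u)
        used += n
    return "\n\n".join(picked)

# Example: 200 passages of 100 tokens each pack 40 per 4K-token unit.
passages = [("w " * 100).strip() for _ in range(200)]
units = build_long_units(passages)
print(len(units))  # → 5
```

The 30x size jump the abstract mentions falls out of this directly: DPR-style 100-word passages packed to a 4K-token ceiling put ~40 passages in each unit, which is why the unit count drops by more than an order of magnitude while the reader, not the retriever, absorbs the extra length.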
