[RAG] Fix RAG Passage Loading #4199

klshuster · 2021-11-18T23:06:26Z

Patch description
Previously, if reading the passages file as a .csv raised an error, we fell back to reading the passages line by line and splitting on \t manually. However, the fallback loop was still referencing a variable from the earlier CSV loop, which held a stale row. I've updated it to point to the new variable.
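For reviewers, here is a minimal, self-contained sketch of the kind of bug described above. It is not the actual ParlAI code: the function name `load_passages`, the `buggy` flag, the three-column `id\ttext\ttitle` layout, and the sample data are all hypothetical, and the use of `strict=True` to trigger a `csv.Error` is only an assumption made to get a reproducible failure.

```python
import csv
from typing import List, Tuple

Passage = Tuple[str, str, str]  # (id, text, title); hypothetical layout


def load_passages(lines: List[str], buggy: bool = False) -> List[Passage]:
    """Parse tab-separated passages, falling back to a manual split on error."""
    passages: List[Passage] = []
    try:
        # First attempt: let the csv module parse each line.
        for row in csv.reader(lines, delimiter="\t", strict=True):
            passages.append((row[0], row[1], row[2]))
    except csv.Error:
        # Fallback: start over and split every line on tabs by hand.
        passages = []
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            # The bug: reading from `row`, which is left over from the csv
            # loop above, instead of the freshly split `fields`.
            source = row if buggy else fields
            passages.append((source[0], source[1], source[2]))
    return passages


# Hypothetical sample data; the stray quotes on line 2 make the strict
# csv parser raise after it has already read lines 0 and 1.
SAMPLE = [
    "0\tParis is the capital of France.\tParis",
    "1\tBerlin is the capital of Germany.\tBerlin",
    '2\t"Rome" is the capital of Italy.\tRome',
    "3\tMadrid is the capital of Spain.\tMadrid",
]

if __name__ == "__main__":
    print(len(set(load_passages(SAMPLE, buggy=True))), "distinct passage(s) with the bug")
    print(len(set(load_passages(SAMPLE))), "distinct passage(s) with the fix")
```

In this sketch the buggy path returns the same stale row once per input line, which would be consistent with a retriever that keeps returning the same document.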

Testing steps
I discovered this because someone pointed out that their retriever was retrieving the same document for every turn. I looked at the logs and saw that indexing 5k documents was taking over 5 minutes; this is way too slow, and usually means that the passage embeddings are too close to each other. After reproducing the error locally and applying this fix, the passages are loaded correctly and indexing takes around 2 seconds.

Relying on CI for the rest.

fix csv reading

df647ca

klshuster requested review from mojtaba-komeili and jxmsML November 18, 2021 23:06

facebook-github-bot added the CLA Signed label Nov 18, 2021

update readme

bd7b7ec

mojtaba-komeili approved these changes Nov 19, 2021


klshuster merged commit ef794ea into main Nov 19, 2021

klshuster deleted the fix_passage_loading branch November 19, 2021 14:38
