[BlenderBot2] Fix ObservationEchoRetriever #4428

jxmsML · 2022-03-17T15:42:36Z

Patch description
This is to fix some corner cases in ObservationEchoRetriever and BlenderBot2WizIntGoldDocRetrieverFiDAgent:
1.

the show_observation_to_echo_retriever is called twice in BlenderBot2WizIntGoldDocRetrieverFiDAgent._set_qyert_vec, which means the self.model_api.retriever.add_retrieve_doc(observation[self._query_key], retrieved_docs) is also called twice, which would mess up with the query → idx → doc mapping in ObservationEchoRetriever, and the same docs might be returned for distinct queries in a single batch.
For example when batch size = 2, in add_retrieve_doc: new_idx = 0, _query_ids = {query0 : 0} , _saved_docs={0 : docs0}
When the same show_observation_to_echo_retriever is called the 2nd time, new_idx = 1, _query_ids={query0 : 1} , _saved_docs={1 : docs0} ;
When the 2nd query in the same batch arrives, new_idx = 1, _query_ids={query0 : 1, query1: 1} , _saved_docs={1: docs1 }. notice that 1: docs0 was overriden. After the 2nd call, new_idx = 2, _query_ids={query0 : 1, query1: 2} , _saved_docs={1 : docs1, 2: docs1} .
In this case, the 2 queries share the same docs docs1 since the new_idx for the new query1 isn't unique when created.

Another issue is that in ObservationEchoRetriever: the _query_ids dictionary can grow linearly with the number of unique queries it see during training.

The fix is to

create a new new_idx and make sure it's unseen.
clear the _query_ids and _saved_doc dictionaries after batch retrieval is done.

Testing steps

Other information

parlai/agents/rag/retrievers.py

Jing Xu added 2 commits March 17, 2022 08:22

add new counter

a8d0259

bb2 wizintgold doc

7fc52c9

jxmsML requested a review from mojtaba-komeili March 17, 2022 15:42

facebook-github-bot added the CLA Signed label Mar 17, 2022

mojtaba-komeili reviewed Mar 17, 2022

View reviewed changes

parlai/agents/rag/retrievers.py Outdated Show resolved Hide resolved

jxmsML requested a review from klshuster March 17, 2022 15:59

comment

51f220d

mojtaba-komeili approved these changes Mar 17, 2022

View reviewed changes

typo

b69743a

jxmsML merged commit f773ef7 into main Mar 17, 2022

jxmsML deleted the obsechoretriever_fix branch March 17, 2022 21:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BlenderBot2] Fix ObservationEchoRetriever #4428

[BlenderBot2] Fix ObservationEchoRetriever #4428

jxmsML commented Mar 17, 2022

[BlenderBot2] Fix ObservationEchoRetriever #4428

[BlenderBot2] Fix ObservationEchoRetriever #4428

Conversation

jxmsML commented Mar 17, 2022