Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

[BlenderBot2] Fix ObservationEchoRetriever #4428

Merged
merged 4 commits into from
Mar 17, 2022
Merged

Conversation

jxmsML
Copy link
Contributor

@jxmsML jxmsML commented Mar 17, 2022

Patch description
This is to fix some corner cases in ObservationEchoRetriever and BlenderBot2WizIntGoldDocRetrieverFiDAgent:
1.

  • the show_observation_to_echo_retriever is called twice in BlenderBot2WizIntGoldDocRetrieverFiDAgent._set_qyert_vec, which means the self.model_api.retriever.add_retrieve_doc(observation[self._query_key], retrieved_docs) is also called twice, which would mess up with the query → idx → doc mapping in ObservationEchoRetriever, and the same docs might be returned for distinct queries in a single batch.
  • For example when batch size = 2, in add_retrieve_doc: new_idx = 0, _query_ids = {query0 : 0} , _saved_docs={0 : docs0}
  • When the same show_observation_to_echo_retriever is called the 2nd time, new_idx = 1, _query_ids={query0 : 1} , _saved_docs={1 : docs0} ;
  • When the 2nd query in the same batch arrives, new_idx = 1, _query_ids={query0 : 1, query1: 1} , _saved_docs={1: docs1 }. notice that 1: docs0 was overriden. After the 2nd call, new_idx = 2, _query_ids={query0 : 1, query1: 2} , _saved_docs={1 : docs1, 2: docs1} .
  • In this case, the 2 queries share the same docs docs1 since the new_idx for the new query1 isn't unique when created.

Another issue is that in ObservationEchoRetriever: the _query_ids dictionary can grow linearly with the number of unique queries it see during training.

The fix is to

  • create a new new_idx and make sure it's unseen.
  • clear the _query_ids and _saved_doc dictionaries after batch retrieval is done.

Testing steps

Other information

@jxmsML jxmsML requested a review from klshuster March 17, 2022 15:59
@jxmsML jxmsML merged commit f773ef7 into main Mar 17, 2022
@jxmsML jxmsML deleted the obsechoretriever_fix branch March 17, 2022 21:56
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants