
Clarifications about BlenderBot 2.0 memory management #3963

Closed
gianlucabusatta opened this issue Aug 20, 2021 · 3 comments

gianlucabusatta commented Aug 20, 2021

As stated here, the query generator model for BB2 is just a regular BART model trained on the query generation task for Wizard of the Internet (and multitasked with the MSC tasks to predict when to access memory).

Looking here, I couldn't find the MSC tasks used to predict when to access memory.

Furthermore, I have some questions about memory:

  1. As far as I understand, it is the query generator that decides when to access memory and/or search the internet, based on the dialogue context after each turn. Is this mechanism trained just as in the statement above, or are there some hyperparameters involved? (in the case of --knowledge_access_method "classify")
  2. In the case of retrieval augmentation (without summarization): does the model store the context after each dialogue turn? In which form, raw or encoded by DPR?
  3. How was the memory decoder trained so that it learns what knowledge to store?

Thanks in advance for the help.

@klshuster klshuster self-assigned this Aug 20, 2021
klshuster (Contributor) commented

> multitasked with the MSC tasks to predict when to access memory

I actually used a custom mutator to achieve this (see here for some instructions on how to use mutators). PR #3966 adds this mutator to ParlAI.
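For reference, a ParlAI mutator is just a registered class that rewrites examples on the fly. Here's a minimal sketch of the idea; the class name and the special token below are my own illustration, not the actual mutator from PR #3966:

```python
from parlai.core.message import Message
from parlai.core.mutators import MessageMutator, register_mutator


@register_mutator("memory_access_label")
class MemoryAccessLabelMutator(MessageMutator):
    """
    Illustrative only: replaces each example's label with a special token so
    that, when multitasked with search-query generation, the model learns to
    emit the token instead of a search query on memory-access turns.
    """

    def message_mutation(self, message: Message) -> Message:
        if 'labels' in message:
            # Message forbids overwriting an existing key in place,
            # so pop it before re-setting.
            message.pop('labels')
            # Hypothetical token; the real mutator from PR #3966 uses
            # whatever token BB2's query generator expects.
            message['labels'] = ['__access-long-term-memory__']
        return message
```

You could then mix it into training with something like --task msc --mutators memory_access_label.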

As for your other questions:

  1. The mechanism is trained as in the statement; the model is multitasked to either generate a search query or generate a token indicating access to long-term memory.
  2. Without summarization, the model encodes "memories" (extracted from the context) and stores them in a pseudo-DPR index (I'm not sure I fully understand this question; does this make sense?)
  3. The memory decoder was trained on --task msc:PersonaSummary; see the project page for more details (and the rough training sketch below)
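For point 3, here is a minimal sketch of what that fine-tuning could look like through ParlAI's Python API; the batch size, epoch count, and model file path are placeholder assumptions of mine, not the exact BB2 recipe:

```python
from parlai.scripts.train_model import TrainModel

# Minimal sketch: fine-tune BART on the persona-summary task the memory
# decoder was trained on. All hyperparameters and paths below are
# illustrative placeholders, not the original BB2 configuration.
TrainModel.main(
    task='msc:PersonaSummary',
    model='bart',
    model_file='/tmp/memory_decoder/model',  # placeholder path
    batchsize=16,    # assumed value
    fp16=True,
    num_epochs=1,    # assumed value
)
```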

Hope that helps


gianlucabusatta commented Aug 22, 2021

Thank you!

Regarding point 2: are the memories always extracted after each dialogue turn, or is there some kind of mechanism to decide whether the model should store new memories? (still in the case without summarization)

klshuster (Contributor) commented

When not considering summarization, there are a couple of heuristic controls for how the model extracts memories from the dialogue context:

  1. You can set --memory-extractor-phrase, which essentially tells the model to only extract memories from context lines containing said phrase (during training, for example, this might be --memory-extractor-phrase persona:)
  2. If you are using a custom dataset, you can specify --memory-key, which is the key in the dataset's example dict that contains the memories you want the model to write (see the sketch below for both options in use)
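Putting both options together, a hedged example via ParlAI's Python API; the zoo path and option values are illustrative, and depending on the knowledge access method you may also need additional options (e.g. a search server):

```python
from parlai.scripts.interactive import Interactive

# Illustrative invocation: only context lines containing "persona:" are
# written to memory, read from the "personas" key of each example.
# The zoo path and values are examples, not a prescribed configuration.
Interactive.main(
    model_file='zoo:blenderbot2/blenderbot2_400M/model',
    memory_extractor_phrase='persona:',  # only store lines containing this
    memory_key='personas',               # dataset key holding the memories
)
```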
