This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

SeeKeR #4447

Merged
merged 15 commits into from
Mar 25, 2022

Conversation

klshuster
Contributor

Patch description

Project code for SeeKeR. This PR includes the following:

SeeKeR Agents

SeeKeR Dialogue

  • projects.seeker.agents.seeker:ComboFidAgent: This agent is used to train SeeKeR models, as it is the core agent that handles all of the various functionalities simultaneously.
  • projects.seeker.agents.seeker:SeekerAgent: The SeeKeR agent itself. See the provided opt preset, gen/seeker_dialogue, for an example invocation.

GPT2 SeeKeR

  • projects.seeker.agents.gpt2_seeker:GPT2WithRetrieverAgent: This agent packs the retrieved documents and input context into one big "prompt" for the language model. It utilizes FidAgent for the retriever components.
  • projects.seeker.agents.gpt2_seeker:GPT2ComboAgent: Agent that can both retrieve for some contexts, and not retrieve for others. This agent is used to train GPT2 SeeKeR models.
  • projects.seeker.agents.gpt2_seeker:GPT2SeekerAgent: The full SeeKeR agent, which handles search query, knowledge, and response generation.
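The prompt-packing idea behind GPT2WithRetrieverAgent can be sketched roughly as follows. This is a minimal illustration with hypothetical names, not the PR's actual code; the real agent works on token tensors and uses the model's tokenizer:

```python
def pack_prompt(docs, context, max_tokens, tokenize=str.split):
    """Concatenate retrieved docs and the dialogue context into one prompt,
    truncating from the left so the dialogue context is always kept."""
    prompt = "\n".join(docs) + "\n" + context
    tokens = tokenize(prompt)
    # Keep only the last max_tokens tokens (oldest doc tokens drop first).
    return " ".join(tokens[-max_tokens:])

packed = pack_prompt(
    docs=["doc one text", "doc two text"],
    context="hello there",
    max_tokens=4,
)
```

Left-truncation matters for a decoder-only model: if the packed sequence exceeds the context window, it is the retrieved documents, not the conversation, that should be cut.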

Tasks

  • I've included manually constructed teachers for all 30 task variants used to train the SeeKeR dialogue model. They are found in projects.seeker.agents.tasks.<dialogue/search_query/search_decision/knowledge>

Other

Scripts

  • generate_lm_data.py: Script used for generating the LM data for seeker_lm models.

Opt Presets

  • arch/r2c2_base_3B: architecture for R2C2 3B model
  • arch/r2c2_base_400M: architecture for R2C2 400M model
  • gen/seeker_dialogue: generation parameters for a SeeKeR Dialogue model
  • gen/seeker_lm: generation parameters for a SeeKeR (GPT2) LM

General Task Additions/Fixes

  • Fix the ConvAI2 Normalized teacher to allow the no_cands option
  • Add NaturalQuestionsOpenTeacher

Testing Steps

Included CI to test the most important functionality.

Contributor

@stephenroller stephenroller left a comment


incredible

projects/seeker/agents/gpt2_seeker.py (review comments outdated; resolved)
projects/seeker/scripts/generate_lm_data.py (review comments outdated; resolved)
)
try:
    self.generate_data()
except:
Contributor


nit lint
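The lint nit presumably refers to the bare `except:` in the hunk above. A common fix (a sketch with hypothetical helper names, not the PR's actual change) is to catch `Exception` so that `KeyboardInterrupt` and `SystemExit` still propagate, and to re-raise after any cleanup:

```python
def run_safely(generate_data, cleanup):
    """Run generate_data(); on failure, run cleanup and re-raise with the
    original traceback. Catching Exception (rather than a bare except)
    lets KeyboardInterrupt and SystemExit propagate untouched."""
    try:
        return generate_data()
    except Exception:
        cleanup()
        raise
```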

@jaseweston jaseweston merged commit 7e45300 into main Mar 25, 2022
@jaseweston jaseweston deleted the seeker branch March 25, 2022 13:21
    self.label_vec[batch_id, :-1]
)  # type: ignore
for i, doc in enumerate(search_results):
    url = doc['url']
Contributor

@mojtaba-komeili mojtaba-komeili Mar 25, 2022


nit: thinking about the variations of the search module used, they may often not return `url` or `title`, and the KeyError raised here may not be very informative for someone who uses it without knowing the internals. How about checking that these fields exist and asserting with a message that tells the user what was expected, or logging a warning and replacing the missing entries with a default?
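The reviewer's suggested defaulting could look something like this sketch (the field set beyond `url` is an assumption; only `url` appears in the snippet above):

```python
import warnings

# Assumed search-result fields and fallback defaults; only 'url' is
# confirmed by the code under review, the rest are illustrative.
EXPECTED_FIELDS = {"url": "", "title": "<no title>", "content": ""}

def normalize_doc(doc):
    """Return a copy of doc with any missing expected field filled in,
    emitting a warning instead of letting a KeyError surface later."""
    out = dict(doc)
    for key, default in EXPECTED_FIELDS.items():
        if key not in out:
            warnings.warn(f"search result missing {key!r}; using {default!r}")
            out[key] = default
    return out
```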

Contributor Author


That's not a bad idea. I took this function verbatim from the one in parlai/agents/rag/retrievers.py; the only change is lines 104 to 106. If we want to change that, we should go directly to the function there.

#######################################


class IdentityLayer(torch.nn.Module):
Contributor


I remember Spencer added an IdentityLayer to ParlAI recently. Is it possible to use that one instead of the custom one here?

Contributor Author


Perhaps, but that PR (#4329) is not actually merged yet. As I noted in the docstring, we do have an IdentityLayer in `parlai.utils.torch.IdentityLayer`, but I required custom output here.

# if no padding, docs are assumed to be min doc length long
doc_lens[doc_lens.le(0)] = self.min_doc_len
new_enc_out = enc_out.clone()
# BEFORE:
Contributor


Question: we had this BEFORE and AFTER for FiD because we needed to encode the doc with the input. In this module, which only does string operations, is there still a reason for doing it this way, other than reusing the pre-existing methods for making masks, etc.?

Contributor Author


Well, we're actually still doing operations on tensors, which is why it's a bit confusing. This mess of logic is the result of me trying to fit a GPT2 decoder-style model into a FiD setup, rather than the other way around.

and self.search_decision is SearchDecision.COMPUTE
):
self.search_decision_agent.self_observe(self_message)
self.search_decision_agent.history.reset()
Contributor


Does the decision agent need to `self_observe` if the history is getting reset right after?

Contributor Author


I'm calling history.reset directly, so the agent still needs a self_observe.

This may be old code, though; I've since realized I can just call self.search_decision_agent.reset(), which will reset the invariants.

best_doc = None
best_doc_idx = None
best_f1 = None
for f1, ind in zip(f1s, inds):
Contributor


The function name says best doc. Why does it look like it picks the first doc that is above the threshold?

Contributor Author


Ahh, because we assume the docs are retrieved in order of their quality, so presumably the first doc above the threshold is in fact the best doc.

Contributor


They are, but the metric used by the search engine might be different from the F1 metric we use here.

Contributor Author


Agreed; we're also placing more trust in the search engine than in a heuristic word-overlap metric.
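The behavior discussed in this thread can be sketched as follows (hypothetical names; the real implementation is in the PR). Retrieval order is treated as the quality ranking, so the first document whose word-overlap F1 with the target clears the threshold wins:

```python
def f1_overlap(a, b):
    """Set-based unigram F1 between two whitespace-tokenized strings."""
    ta, tb = set(a.split()), set(b.split())
    common = len(ta & tb)
    if common == 0:
        return 0.0
    precision, recall = common / len(ta), common / len(tb)
    return 2 * precision * recall / (precision + recall)

def pick_best_doc(docs, target, threshold=0.3):
    """Return (index, doc) for the first doc in retrieval order whose F1
    with the target exceeds the threshold, trusting the search engine's
    ranking; return (None, None) if nothing clears it."""
    for i, doc in enumerate(docs):
        if f1_overlap(doc, target) > threshold:
            return i, doc
    return None, None
```

As the thread notes, this trades exhaustive scoring for the engine's own ranking: a later doc could have a higher F1, but scanning stops at the first one above the threshold.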
