
[Doc Day] Totally rewrite Worlds docs. #3049

Merged: stephenroller merged 5 commits from dynb_tutorial into master on Sep 17, 2020

Conversation

stephenroller (Contributor)

Patch description
I completely rewrote the Worlds docs. They focus on a high-level explanation of why we have to do sharing and how batching is implemented. They move away from what the user needs to implement, since no one ever writes batch_act manually.

Testing steps
Local render.

:::{note} Advanced Worlds
We also include a few more advanced "container" worlds: in particular, we
include both a BatchWorld and a DynamicBatchWorld. These worlds may be used when
certain options are set. See the [tutorial\_worlds](Worlds) tutorial to
Contributor

I think this was supposed to be a link. The link here is not active in my doc.

Contributor

same, link doesn't work

Contributor Author

Thanks. We have an automatic link checker in the CI tests. Will fix.

docs/source/tutorial_basic.md (outdated; thread resolved)
docs/source/tutorial_worlds.md (two resolved threads)
@jaseweston (Contributor) left a comment

couple more comments

docs/source/tutorial_worlds.md (thread resolved)
## Agent Sharing

_Agent Sharing_ is the primary mechanism by which we implement batching, model
serving, and many other ParlAI tricks. Agent sharing works by creating
Contributor

"ParlAI tricks" makes it sound bad... maybe say it more nicely?

serving, and many other ParlAI tricks. Agent sharing works by creating
_clones_ of an Agent. Each clone of an Agent has _shared state_ and
_independent state_. Independent state includes things like the current
dialogue context (conversation history). Shared state includes things like
Contributor

(conversation history, which is different for every batch element)

# this is where we incorporate any shared state
self.model = shared['model']

def share(self):
Contributor

Does clone call this, or should we be calling this after cloning?

Contributor Author

Clone calls this. Will clarify.
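
For anyone skimming this thread, here is a minimal, self-contained sketch of the pattern described above: `clone()` calls `share()` itself and passes the result to the new instance's constructor. The names (`MyAgent`, `DummyModel`) are illustrative stand-ins, not ParlAI's actual classes.

```python
class DummyModel:
    """Stands in for an expensive-to-load neural model."""


class MyAgent:
    def __init__(self, opt, shared=None):
        self.opt = opt
        self.history = []  # independent state: per-conversation context
        if shared is None:
            # original agent: load expensive resources exactly once
            self.model = DummyModel()
        else:
            # clone: reuse shared state instead of reloading it
            self.model = shared['model']

    def share(self):
        # everything placed in this dict is visible to every clone
        return {'model': self.model}

    def clone(self):
        # clone() calls share() internally; callers never pass `shared` by hand
        return type(self)(self.opt, shared=self.share())


agent = MyAgent(opt={})
clones = [agent.clone() for _ in range(4)]
assert all(c.model is agent.model for c in clones)          # shared state
assert all(c.history is not agent.history for c in clones)  # independent state
```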

operations. The new logic can be encapsulated in this graph:

<center>
<img src="_static/img/world_batchbasic.png" alt="Parley with batch_act" width="100%"/>
Contributor

This is identical to the image above. Did you mean the other image to go here?

```

<center>
<img src="_static/img/world_batchbasic.png" alt="Independent acts" width="100%"/>
Contributor

Could you remove the spell checker line from under the title?

Contributor Author

Ran a spell check; not sure what you mean here.

self.model_copies.append(self.model.clone())

```
This initialization code is then represented by this graphic:
Contributor

Is agent0 the manager (master) here and the rest its workers? Does agent0 itself run the conversations, or is it just handling the clones in the batch?
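
Roughly speaking, in this setup the primary agent keeps the shared model and is the one whose `batch_act` is called, while the clones only track the per-row conversation state. The function below is a rough, hypothetical illustration of that flow (the name `batched_parley` is made up), not BatchWorld's actual implementation.

```python
from typing import Any, Dict, List


def batched_parley(agent_0, clones, teachers) -> List[Dict[str, Any]]:
    """One batched step: clones track contexts, agent_0 does the forward pass."""
    # each clone observes its own conversation (independent state)
    observations = [
        clone.observe(teacher.act()) for clone, teacher in zip(clones, teachers)
    ]
    # the model runs once over the whole batch, inside the primary agent
    replies = agent_0.batch_act(observations)
    # hand each reply back to its clone so the per-row history stays consistent
    for clone, reply in zip(clones, replies):
        clone.self_observe(reply)
    return replies
```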

`--dynamic-batching batchsort`.
:::

::{note}
Contributor

:::{not}

Contributor

Sorry, I meant :::{note}, LOL.

going over. The first batch ends up with exactly 80 words. The second batch ends
up with only 51. But we've now reduced the number of batches we need to process
from 4 to only 2! This is the trick of how dynamic batching can provide
[massive speed ups](tutorial_fast).
Contributor

I wonder if this has any effect on the performance of the model: since short or long texts may contain different topics, this may make the gradients more homogeneous compared to regular (fully random) batching. Is there any reference that shows this does not happen? Could you add it here, if one exists? Reading this, I get the impression that this might be a trade-off between training cost and accuracy.

Contributor Author

Internal tests show that it converged about 2.5x faster with a difference of like 0.05 ppl worse.

It's actually a little more complex than presented here. Namely, for fp16 reasons, we treat all examples as having the same length, rounded up to the nearest multiple of 8. So we actually get a fair amount of stochasticity.
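
For concreteness, here is a toy sketch of the bucketing idea discussed in this thread: sort examples by length, round each length up to a multiple of 8 (the fp16-friendly treatment mentioned above), and greedily pack examples under a fixed token budget. This is only an illustration of the principle, not ParlAI's actual dynamic batching scheduler, and the function name and example lengths are made up.

```python
from typing import List


def pack_by_length(lengths: List[int], max_tokens: int) -> List[List[int]]:
    """Return batches of example indices whose padded size fits the budget."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, current, longest = [], [], 0
    for idx in order:
        padded = (lengths[idx] + 7) // 8 * 8  # round up to a multiple of 8
        longest = max(longest, padded)
        # the padded batch costs (longest sequence) * (number of rows) tokens
        if current and longest * (len(current) + 1) > max_tokens:
            batches.append(current)
            current, longest = [], padded
        current.append(idx)
    if current:
        batches.append(current)
    return batches


# illustrative word counts packed under an 80-token budget
print(pack_by_length([8, 9, 17, 17, 19, 20, 20, 21], max_tokens=80))
```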

@mojtaba-komeili (Contributor) left a comment

How do you feel about replacing the images with dot diagrams, with the style changes that we discussed?

@stephenroller (Contributor Author)

I didn't think the dot diagrams were complete yet. I'd rather merge this and then have a separate PR for the dot diagrams.

@mojtaba-komeili (Contributor) left a comment

Thanks for clarifying some of the explanations, particularly on the cloning process.

@stephenroller merged commit 495e8e2 into master on Sep 17, 2020
@stephenroller deleted the dynb_tutorial branch on September 17, 2020 00:01
mojtaba-komeili pushed a commit that referenced this pull request Sep 23, 2020
* master: (30 commits)
  Prevent MTurk config folders from needing __init__.py (#3105)
  [dist] Allow non-tcp based distributed setup. (#3095)
  Move torch.load/torch.save to PathManager. (#3094)
  Adding package dependabot couldn't resolve (#3104)
  Test AcceptabilityChecker (#3091)
  [DOC DAY] ISH OSS static turn annotations (#3053)
  Fix for flaky DialoGPT test (#3097)
  Temporary compatibility note (#3098)
  Update model_list.py (#3096)
  Adding features into the human+model turn annotation task (#3006)
  [Doc Day] Totally rewrite Worlds docs. (#3049)
  list trainable parameters (#3086)
  TaskMaster-2 (#2678)
  [tests] Switch to OverfitTeacher for some model tests. (#3055)
  Change drqa default tokenizer. (#3069)
  Self-chat batch size assertion error (#3081)
  Disable zipfile serialization option of torch.save (#3077)
  Always download data on interactive wizard. (#3079)
  Two inaugural FAQ questions (#3073)
  Add multiwoz v2.0 (#3072)
  ...
This is necessary in order to update the branch. It is outdated
and because of that a lot of its tests are failing.
stephenroller added a commit that referenced this pull request Sep 28, 2020
# v0.9.3 Release

Known issues
- Short options like `-m` and `-t` fail in Python 3.8. Use `--model` and `--task` instead.

Breaking Changes
- A number of old MTurk tasks have been archived and removed from the code (#3085)

New Features
- [image] Detectron feature extraction (#3083)
- [data] Natural questions (#3070)
- [data] TaskMaster-2 (#2678)
- [data] New versions of multiwoz (#3072)
- [distributed] Allow non-tcp based distributed setup (#3095)
- [core] Move torch.load/torch.save to PathManager. (#3094, #3077)
- [mturk] New task on static turn annotations (#3053)
- [mturk] New features in human+model annotation (#3006)
- [core] TorchClassifierAgent now prints its number of parameters (#3086)

Doc Changes:
- New Worlds tutorial (#3049)
- Tutorial on using `-t jsonfile` (#3061)
- Better help message for --init-model (#3090)
- Additions to FAQ (#3073)
- Updated model zoo descriptions for BlenderBot (#3096)

Bug Fixes
- Distributed evaluation now writes to world logs earlier (#3122)
- An argument was updated from store_true to bool (#3113)
- Self-chat now fails loudly with unexpected batchsize (#3081)
- Update drqa default tokenizer away from removed (#3069)
- Using wizard of wikipedia in interactive mode downloads data (#3079)

Developer notes:
- New pre-commit git-secrets (#3106)
- Code coverage improvements (#3110, #3091)
- More reliable tests. (#3108, #3097, #3055)
- Mephisto task dependencies have been updated due to security bugs (#3111, #3101, #3104)
- MTurk config folders are exempt from __init__.py requirements (#3105)