[Doc Day] Totally rewrite Worlds docs. #3049
Conversation
docs/source/tutorial_basic.md
Outdated
:::{note} Advanced Worlds
We also include a few more advanced "container" worlds: in particular, we
include both a BatchWorld and a DynamicBatchWorld. These worlds may be used when
certain options are set. See the [tutorial\_worlds](Worlds) tutorial to
I think this was supposed to be a link. The link here is not active in my doc.
same, link doesn't work
Thanks. We have an automatic link checker in the CI tests. Will fix.
couple more comments
docs/source/tutorial_worlds.md
Outdated
## Agent Sharing

_Agent Sharing_ is the primary mechanism by which we implement batching, model
serving, and many other ParlAI tricks. Agent sharing works by creating
"ParlAI tricks" makes it sound bad... maybe say it more nicely?
docs/source/tutorial_worlds.md
Outdated
serving, and many other ParlAI tricks. Agent sharing works by creating
_clones_ of an Agent. Each clone of an Agent has _shared state_ and
_independent state_. Independent state includes things like the current
dialogue context (conversation history). Shared state includes things like
(conversation history, which is different for every batch element)
        # this is where we incorporate any shared state
        self.model = shared['model']

    def share(self):
Does clone call this, or should we be calling this after cloning?
Clone calls this. Will clarify.
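For readers following along, here is a minimal sketch of that pattern (simplified; not the full ParlAI Agent API, and `build_model` is a hypothetical stand-in), showing that `clone()` invokes `share()` itself:

```python
def build_model(opt):
    # stand-in for expensive model construction
    return object()


class Agent:
    def __init__(self, opt, shared=None):
        self.opt = opt
        self.history = []  # independent state: one dialogue context per clone
        if shared is None:
            # the original instance builds the expensive state
            self.model = build_model(opt)
        else:
            # clones reuse the original's model
            self.model = shared['model']

    def share(self):
        # bundle everything clones should reuse
        return {'model': self.model}

    def clone(self):
        # clone() calls share() internally; users never call share() directly
        return type(self)(self.opt, shared=self.share())
```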
docs/source/tutorial_worlds.md
Outdated
operations. The new logic can be encapsulated in this graph:

<center>
<img src="_static/img/world_batchbasic.png" alt="Parley with batch_act" width="100%"/>
This is identical to the image above. Did you mean the other image to go here?
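As a textual stand-in for the intended graph, here is a simplified sketch of a BatchWorld-style step (the `worlds`/`agents` structure and names are hypothetical): acts and observes happen independently per batch element, while the expensive model forward runs once, batched:

```python
def batch_parley(worlds, model):
    """One batched step across per-element sub-worlds (hypothetical API)."""
    observations = []
    for world in worlds:  # one sub-world per batch element
        teacher, model_clone = world.agents
        act = teacher.act()  # each element acts independently
        observations.append(model_clone.observe(act))
    # a single batched forward pass replaces len(worlds) separate act() calls
    replies = model.batch_act(observations)
    for world, reply in zip(worlds, replies):
        teacher, model_clone = world.agents
        teacher.observe(reply)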
```

<center>
<img src="_static/img/world_batchbasic.png" alt="Independent acts" width="100%"/>
Could you remove the spell checker line from under the title?
I ran a spell check; not sure what you mean here.
        self.model_copies.append(self.model.clone())

```
This initialization code is then represented by this graphic:
Is agent0 the manager (master) here and the rest its workers? Does agent0 itself run the conversations, or is it just handling the clones in the batch?
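As I read it, `self.model` here is "agent 0": it owns the parameters and later runs the batched forward pass, while the appended clones only track each conversation's independent state. A sketch of that reading (the surrounding class and names are hypothetical):

```python
class BatchWorld:
    def __init__(self, opt, model, batchsize):
        self.model = model  # agent 0: owns the weights, runs batch_act later
        self.model_copies = []
        for _ in range(batchsize):
            # each clone shares agent 0's model via share(), but keeps
            # its own dialogue history for one batch element
            self.model_copies.append(self.model.clone())
```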
docs/source/tutorial_worlds.md
Outdated
`--dynamic-batching batchsort`.
:::

::{note}
:::{not}
Sorry, I meant `:::{note}`, LOL.
going over. The first batch ends up with exactly 80 words. The second ends up
with only 51. But we've now reduced the number of batches we need to process
from 4 to only 2! This is the trick of how dynamic batching can provide
[massive speed ups](tutorial_fast).
I wonder if this has any effect on the performance of the model: since short or long text may contain different topics, this may make gradients more homogeneous compared to regular (fully random) batching. Is there any reference that shows this does not happen? Could you add that here, if one exists? Reading this I get the impression that this might be a trade-off between cost of training and accuracy.
Internal tests show that it converged about 2.5x faster with a difference of like 0.05 ppl worse.
It's actually a little more complex than presented here. Namely, for fp16 reasons, we treat all examples as having the same length, rounded up to a multiple of 8. So we actually get a fair amount of stochasticity.
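To make the packing idea concrete, here is a toy sketch (a hypothetical helper, not ParlAI's implementation) that sorts example lengths and greedily packs them under a word budget; as noted above, the real code additionally rounds lengths up to a multiple of 8 for fp16:

```python
from typing import List


def pack_batches(lengths: List[int], budget: int = 80) -> List[List[int]]:
    """Greedily pack example lengths into batches under a word budget."""
    batches, current, used = [], [], 0
    for n in sorted(lengths):  # similar lengths end up in the same batch
        if current and used + n > budget:
            batches.append(current)
            current, used = [], 0
        current.append(n)
        used += n
    if current:
        batches.append(current)
    return batches


# pack_batches([40, 40, 31, 20]) -> [[20, 31], [40, 40]]:
# one batch totals 51 words, the other exactly 80.
```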
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you feel about replacing images with dot diagrams? With the style changes that we discussed.
I didn't think the dot diagrams were complete yet. I'd rather merge this and then have a separate PR for the dot diagrams.
Thanks for clarifying some of the explanations, particularly on the cloning process.
* master: (30 commits)
  - Prevent MTurk config folders from needing __init__.py (#3105)
  - [dist] Allow non-tcp based distributed setup. (#3095)
  - Move torch.load/torch.save to PathManager. (#3094)
  - Adding package dependabot couldn't resolve (#3104)
  - Test AcceptabilityChecker (#3091)
  - [DOC DAY] ISH OSS static turn annotations (#3053)
  - Fix for flaky DialoGPT test (#3097)
  - Temporary compatibility note (#3098)
  - Update model_list.py (#3096)
  - Adding features into the human+model turn annotation task (#3006)
  - [Doc Day] Totally rewrite Worlds docs. (#3049)
  - list trainable parameters (#3086)
  - TaskMaster-2 (#2678)
  - [tests] Switch to OverfitTeacher for some model tests. (#3055)
  - Change drqa default tokenizer. (#3069)
  - Self-chat batch size assertion error (#3081)
  - Disable zipfile serialization option of torch.save (#3077)
  - Always download data on interactive wizard. (#3079)
  - Two inaugural FAQ questions (#3073)
  - Add multiwoz v2.0 (#3072)
  - ...

This is necessary in order to update the branch. It is outdated, and because of that a lot of its tests are failing.
# v0.9.3 Release

Known issues
- Short options like `-m` and `-t` fail in Python 3.8. Use `--model` and `--task`.

Breaking Changes
- A number of old MTurk tasks have been archived and removed from the code (#3085)

New Features
- [image] Detectron feature extraction (#3083)
- [data] Natural questions (#3070)
- [data] TaskMaster-2 (#2678)
- [data] New versions of multiwoz (#3072)
- [distributed] Allow non-tcp based distributed setup (#3095)
- [core] Move torch.load/torch.save to PathManager (#3094, #3077)
- [mturk] New task on static turn annotations (#3053)
- [mturk] New features in human+model annotation (#3006)
- [core] TorchClassifierAgent now prints its number of parameters (#3086)

Doc Changes
- New Worlds tutorial (#3049)
- Tutorial on using `-t jsonfile` (#3061)
- Better help message for --init-model (#3090)
- Additions to FAQ (#3073)
- Updated model zoo descriptions for BlenderBot (#3096)

Bug Fixes
- Distributed evaluation now writes to world logs earlier (#3122)
- An argument was updated from store_true to bool (#3113)
- Self-chat now fails loudly with unexpected batchsize (#3081)
- Update drqa default tokenizer away from a removed one (#3069)
- Using Wizard of Wikipedia in interactive mode downloads data (#3079)

Developer notes
- New pre-commit git-secrets (#3106)
- Code coverage improvements (#3110, #3091)
- More reliable tests (#3108, #3097, #3055)
- Mephisto task dependencies have been updated due to security bugs (#3111, #3101, #3104)
- MTurk config folders are exempt from __init__.py requirements (#3105)
Patch description
I completely rewrote the Worlds docs. They focus on a high-level explanation of why we have to do sharing and how batching is implemented. They move away from what the user needs to implement, since no one ever writes `batch_act` manually.

Testing steps
Local render.