Merge with latest code #1

Merged
merged 117 commits into from
Dec 19, 2020

Conversation

vaibhavad
Owner

Patch description

Testing steps

Logs

Other information

Data tests (if applicable)
If you added a new teacher, you will be asked to run
python tests/datatests/test_new_tasks.py. Please paste this log here.

stephenroller and others added 30 commits September 24, 2020 11:05
* Test profile_ scripts

* Add test for distributed_eval

* More scripts.

* Lint.

* Also check -t self_chat

* Whoops, gotta de-init

* Update docstring
Co-authored-by: Diana Rico <dianaglzrico@learnfair0715.h2.fair>
# v0.9.3 Release

Known issues
- Short options like `-m` and `-t` fail in Python 3.8. Use `--model` and `--task` instead.

Breaking Changes
- A number of old MTurk tasks have been archived and removed from the code (#3085)

New Features
- [image] Detectron feature extraction (#3083)
- [data] Natural questions (#3070)
- [data] TaskMaster-2 (#2678)
- [data] New versions of multiwoz (#3072)
- [distributed] Allow non-tcp based distributed setup (#3095)
- [core] Move torch.load/torch.save to PathManager. (#3094, #3077)
- [mturk] New task on static turn annotations (#3053)
- [mturk] New features in human+model annotation (#3006)
- [core] TorchClassifierAgent now prints its number of parameters (#3086)

Doc Changes:
- New Worlds tutorial (#3049)
- Tutorial on using `-t jsonfile` (#3061)
- Better help message for --init-model (#3090)
- Additions to FAQ (#3073)
- Updated model zoo descriptions for BlenderBot (#3096)

Bug Fixes
- Distributed evaluation now writes to world logs earlier (#3122)
- An argument was updated from store_true to bool (#3113)
- Self-chat now fails loudly with unexpected batchsize (#3081)
- Update drqa default tokenizer away from removed (#3069)
- Using wizard of wikipedia in interactive mode downloads data (#3079)

Developer notes:
- New pre-commit git-secrets (#3106)
- Code coverage improvements (#3110, #3091)
- More reliable tests. (#3108, #3097, #3055)
- Mephisto task dependencies have been updated due to security bugs (#3111, #3101, #3104)
- MTurk config folders are exempt from __init__.py requirements (#3105)
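
The `store_true` to bool change listed under Bug Fixes can be illustrated with plain `argparse`. This is only a sketch: the flag name `--verbose-sketch` and the helper `str2bool` are hypothetical and are not taken from the ParlAI code. It shows why a converter is needed, since `type=bool` in `argparse` would treat any non-empty string, including `"false"`, as true.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert common textual spellings of a boolean into a bool (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bool_flag_sketch")
    # Hypothetical flag: takes an explicit value instead of acting as a store_true switch.
    parser.add_argument("--verbose-sketch", type=str2bool, default=False)
    return parser
```

With an explicit value the flag can be switched off as well as on, which a `store_true` switch cannot express once its default is true.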
* Start changes to example_script

* Starting to revise blueprint

* Port over blueprint script

* Port over runner script

* Work on example script

* Finish porting over example script

* Minor

* Current time

* Fixes

* README

* Start revising README

* Fix README

* Path fix

* Jack's PR comments

* Comments

* Removing unused code

* Assume block_on_onboarding_fail exists

* Minor

* test_init_everywhere fix
Co-authored-by: Stephen Roller <roller@fb.com>
Note: I'm currently running a quick local training with jga as the metric to make sure it's functioning properly (because running `parlai eval_model -m bart -t taskmaster2` off the bat gives 0s in both `slot_r` and `jga` right now), but figured the metric is easy enough to warrant getting review on it sooner rather than later.
* Test parlai/core/script.py.

* Fix bad return code.

* Reviewer comments.
* Add taskmaster2 command-line arg for single domain

Before this change, `parlai dd -t taskmaster2 --display-verbose` displayed a bunch of sports conversations.

After this change, running `parlai dd -t taskmaster2 --display-verbose --domains music` displays music conversations. Tried on a few other domains to validate; also had a print in the `_load_data()` function.

Also verified that with no argument all domains are used, and that
defining `domains` multiple times uses only the last one.

* Add taskmaster2 command-line arg for single domain

Before this change, `parlai dd -t taskmaster2 --display-verbose`
displayed a bunch of sports conversations.

After this change, running `parlai dd -t taskmaster2
--display-verbose --domains music` displays music conversations.

Also verified that with no argument all domains are used.
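
The behaviour described in these two commit messages (all domains by default, and only the last `--domains` winning when it is given several times) matches how the default `store` action of `argparse` works. A minimal sketch, assuming a hypothetical two-entry domain list and a flag that accepts one or more values; the real teacher's argument definition may differ:

```python
import argparse
from typing import List

# Hypothetical subset of domains; only "music" and the sports default appear in the messages above.
ALL_DOMAINS = ["music", "sports"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="display_data_sketch")
    parser.add_argument(
        "--domains",
        nargs="+",
        default=ALL_DOMAINS,
        help="Restrict the data to these domains (default: all).",
    )
    return parser


def select_domains(argv: List[str]) -> List[str]:
    """Return the domains chosen on the command line, or all of them if none are given."""
    return list(build_parser().parse_args(argv).domains)
```

Because `store` overwrites the destination each time the flag appears, `--domains music --domains sports` yields only `sports`; collecting every occurrence would need `action="append"` instead.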
* Add categories

* Help string

* Add category mapper
Add a temporary note to mention the compatibility issue with the static turn-annotations task until the refactor has been tested and merged in
* Fix to make sure folder is always created

* New solution
* Add test for interactive_web

* Spinlock

* Hm.

* Lint.
* Allow missing init opt opts

* Add part of unit test

* Work on unit test

* Test fixes

* Fix second test

* Fix test

* Check obsolete arg does not exist
…3145)

* Add notion of metrics collections, which can have other Metrics or multiple metrics added to them

See #3138 for context and use

* right, having different arguments for the same function isn't a thing in python...

(alas, that's what I get for mostly coding in C++ for the past few years. :P)

* fixed a bug while integrating into taskmaster2

* address comments (get rid of separate class, add func to Metrics directly)

* actually do the things the last comment
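
A rough sketch of the idea in this commit series: since Python has no overloading by argument type, merging a whole collection gets its own method name rather than a second signature for `add`. The class names `SumMetric` and `MetricsSketch` are hypothetical stand-ins and do not reproduce the actual ParlAI `Metrics` class.

```python
from typing import Dict


class SumMetric:
    """Hypothetical metric that accumulates a running sum."""

    def __init__(self, value: float = 0.0) -> None:
        self.total = value

    def __add__(self, other: "SumMetric") -> "SumMetric":
        return SumMetric(self.total + other.total)

    def value(self) -> float:
        return self.total


class MetricsSketch:
    """Hypothetical collection mapping metric names to accumulated metrics."""

    def __init__(self) -> None:
        self._data: Dict[str, SumMetric] = {}

    def add(self, key: str, metric: SumMetric) -> None:
        # Combine with any metric already stored under the same key.
        existing = self._data.get(key)
        self._data[key] = metric if existing is None else existing + metric

    def add_metrics(self, other: "MetricsSketch") -> None:
        # Separate method name, because `add` cannot be overloaded by argument type.
        for key, metric in other._data.items():
            self.add(key, metric)

    def report(self) -> Dict[str, float]:
        return {key: metric.value() for key, metric in self._data.items()}
```

Usage would look like `totals.add_metrics(per_domain)`, folding every metric of one collection into another in a single call.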
* Add agent code

* Clean up

* Cleanup

* Linting

* Linting

* Fix name
* Dump in readmes

* Update READMEs

* Formatting

* Add __init__

* Another __init__ file

* Add to model_list

* Fix name

* Wording

* Revert paren

* Remove unused flag

* Fix delimiter

* Remove flag
* Add in arXiv links

* Update README.md
* Add ED test to selfchat

* Reviews

Co-authored-by: Diana Rico <dianaglzrico@learnfair0721.h2.fair>
Co-authored-by: Diana Rico <dianaglzrico@learnfair0715.h2.fair>
* black

* minor changes

* black again

* address comments, remove four class flag

* update readme

* black
* Listing quests project

* Moving rl paper to the correct heading

* New heading for LIGHT quests

* Didn't save the merge :(
Co-authored-by: Diana Rico <dianaglzrico@devfair0263.h2.fair>
* add a --version flag to the parlai command #3163

* run autoformatter to fix lint issue

* correct weird typo

* tweak based off PR feedback https://github.com/facebookresearch/ParlAI/pull/3164/files/ff0a5d1d16cdefd111b0723062ee77c165a88683#r500676864

* fix sloppy mixup of argument help descriptions introduced in last commit (sorry!)
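
For reference, `argparse` ships a built-in `version` action that covers this kind of flag. The sketch below is not the code from the commit; the program name `parlai_sketch` and the version string are placeholders.

```python
import argparse

SKETCH_VERSION = "0.0.0"  # placeholder, not a real release number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="parlai_sketch")
    # The built-in "version" action prints the string and exits with status 0.
    parser.add_argument(
        "--version",
        action="version",
        version=f"%(prog)s {SKETCH_VERSION}",
    )
    return parser
```

Calling `build_parser().parse_args(["--version"])` prints the program name and version, then raises `SystemExit`.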
stephenroller and others added 29 commits November 10, 2020 19:16
* Support special tokens in non-HF BPE dictionaries.

* Lint.

* Decode implementations.

* Bug fixes.

* Special tokens in hugging_face/gpt2. Reviewer comments.

* Update URLs for reviewers.

* Switch --hf-skip-special-tokens to --skip-special-tokens

* Just kill the option.

* Spelling.

* Elaboration actually.

* Lint.

* Add a test for additional tokens with hugging_face/gpt2.

* Add support for special tokens in re/split/space.

* Add in a slightly harsher test.

* Whoops.
* Implement BPE dropout.

* Only BPE dropout on text, not labels.

* Add a unit test.

* Notes for the future.

* Dictionary save works for slow bytelevel bpe

* Finish adding tests.

* Reviewer comments.

* Rip out unrelated change.
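
As an illustration of the technique named in this commit series, here is a minimal character-level sketch of BPE dropout: each candidate merge is skipped with probability `dropout`, so the same word can be segmented differently across passes. The merge table format and the function names are assumptions for illustration, not the ParlAI dictionary API. Following the "only on text, not labels" commit, `encode_example` applies dropout to the input side only.

```python
import random
from typing import Dict, List, Optional, Tuple

MergeRanks = Dict[Tuple[str, str], int]


def bpe_encode(
    word: str,
    merge_ranks: MergeRanks,
    dropout: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Greedily apply the lowest-ranked merge, dropping candidates at random."""
    rng = rng or random.Random()
    symbols = list(word)
    while len(symbols) > 1:
        # Keep each known adjacent pair unless it is randomly dropped this step.
        candidates = [
            (merge_ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in merge_ranks and not (dropout > 0 and rng.random() < dropout)
        ]
        if not candidates:
            break
        _, i = min(candidates)
        symbols[i : i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


def encode_example(
    text: str,
    label: str,
    merge_ranks: MergeRanks,
    dropout: float,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str]]:
    """Apply dropout to the input text only; labels stay deterministic."""
    return (
        bpe_encode(text, merge_ranks, dropout, rng),
        bpe_encode(label, merge_ranks, 0.0),
    )
```

Keeping labels deterministic means the target token sequence for a given example does not change between epochs, while the input still benefits from the regularisation.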
)

* Revert "Revert "[Safety Recipes] Open source Sensitive Topics classifier and data (#3253)" (#3259)"

This reverts commit 1b8bc8c.

* fix build data
* urllib3 and fairseq bumping

* Update metrics.py
* add yelp

* Yelp WIP

* add multitask classifier agent

* multitask model + interactive world

* fix yelp

* fix model list naming
Following up on feedback from fairinternal/ParlAI-Internal#1842

Test plan: ran locally, verified commit gets printed
* init model

* typo
* Add static turn annotations analysis script

* First fixes

* Parentheses bug

* Dump unit test

* Fixes

* Hack together unit test

* Path fixes

* Fixes

* Get test to pass

* Lint

* Lint

* CI issues

* Fix urllib3 more precisely

* Try removing urllib again

* Try known-good urllib3 version

* Flexible requirements

* Easier debugging

* More easy debugging

* Just remove requirements for right now

* Even easier debugging

* Float issues

* Cleaner cleaner debugging

* Don't use self.assertEqual at all

* Sort dataframes

* Fix broken calculation

* Better attempt to compare dfs

* Reset index

* Add reqs back in

* TODO for DataBrowser

* Test tweak
* First shot at refactor.

* Cover more ground.

* Add gpu unittests.

* Add markers for teacher tests.

* Kill quicktests. Add teacher tests.

* Crowdsourcing and mturk tests.

* Fill out the rest.

* Fix link checker.

* Bigger.

* Ordering.

* Fix process and merge.

* Checkpoint.

* Check in regressions.

* Bump requirements.

* Lint.

* Fix code test.

* Update.

* Fix local error.

* No more light genderation, emily must fix.

* Grr, twitter.

* Cut down on the number of tests, speedup.

* Drop the datatest.

* Add documentation on regression tests.

* Reviewer comments.

* Also count num examples and num episodes.

* Fix build.

* Bump cache.

* Fix CornellMovie num examples.
Link to Google Form to request time-limited MMB model weights
* Dump in what I have so far

* Starting work on Meph tests

* Minor

* Remove samples

* Work on static turn annotations unit test

* Formatting file differently

* Minor

* Fixes

* Minor

* Don't check onboarding for now

* Update convos

* Fixes

* Don't have test be mixin

* Abstract away 1-turn tests

* Fixes

* More tests

* Fixes to 3 tasks

* Make tests cleaner

* Remember to build the task

* Test reversion to test

* Revert config.yml

* Update import
* Fix argparse issues in python 3.8

* lint
* [dist] Allow arbitrary sizes for object syncs.

* Spelling
* Refactor existing crowdsourcing end-to-end tasks

* Update import

* Add new files

* Pass in model config directly

* Sample model config

* Update RunScriptConfigs

* Param tweaks

* Clarify task directory var

* Clarify var

* Fix JSON

* Various fixes

* Dump Fast ACUTE test

* Revisions

* Work on unit test

* Try to use data regressions

* Finish prototype

* Various fixes

* Move

* Fix file issue

* Fix tests

* Fix tests

* Fix tests

* Temp tweak

* Fix turn annotations static test

* Temp raise import errors

* Call analysis

* Check analysis inputs

* Various fixes

* Various fixes

* Minor

* Pytest fixtures

* Fix fixture

* Fix Mephisto version

* Bump reqs again

* Partial work on tests

* Clean up fast acute tests

* Add more tests

* Add remaining tests

* Comment out some functions for now

* Don't yield in superclass

* Try to make fast ACUTE code work

* Fix tests

* Temp test to understand why tests aren't working on CI

* Revert "Temp test to understand why tests aren't working on CI"

This reverts commit b51680f.

* Temporarily block the Q-function runs

* Modify fixture

* Run setup/teardown once per function

* Revert "Run setup/teardown once per function"

This reverts commit 9732bd7.

* Now just disable base fast acute

* Revert "Now just disable base fast acute"

This reverts commit 2a3500e.

* Just give time for a worker to be registered

* Waiting for longer before retrying

* Is it about alphabetical order?

* Another rename

* Add back in setup/teardown for chat demo

* Lint

* Remove old ACUTE code

* Revert temp crowdsourcing changes

* Typo

* Cleanup

* Fix dir

* More cleanup

* Tweaks

* Rename variant

* Lint

* PR changes

* TODO for future flags

* Tweak

* More nuanced waiting

* Get example scripts to work

* Fix import

* Add back in dependency

* Defaults fix

* Path tweak

* Analysis tweaks

* Move blueprints to their own file

* Black

* Convenience message

* Don't remove old ACUTE-Eval in this PR

* README notes
#3303)

Bumps [ini](https://github.com/isaacs/ini) from 1.3.5 to 1.3.8.
- [Release notes](https://github.com/isaacs/ini/releases)
- [Commits](npm/ini@v1.3.5...v1.3.8)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [ini](https://github.com/isaacs/ini) from 1.3.5 to 1.3.8.
- [Release notes](https://github.com/isaacs/ini/releases)
- [Commits](npm/ini@v1.3.5...v1.3.8)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Remove package-lock files

* .gitignore

* Specific exclusion
* Pass back dot prod

* Pass back attention matrices

* Copy in more-outputs version

* Add in encoder with more outputs

* Layer in model code

* Decoder output

* Start passing back embedding output

* Pass back embedding output from decoder

* Just comment out unused code

* Black

* Fixes

* BART tweaks

* Fixes

* Hack to ignore new projection matrices

* Fix import

* Note fix

* Remove state_dict hack

* Add in distillation agents

* Prototype unit test

* Partial work on unit test

* Try to set up reproducible test

* Fix last set of nums

* Change test path

* Embed inputs in a separate function

* Revert

* Minor

* Add hooks

* Better way of doing hooks

* Starting to use hooks

* Abstract away

* More abstraction

* More abstraction

* Finish using hooks for attention matrices

* Switch to hooks output

* Fix test

* Start to not pass stuff back

* Decoder fixes

* More cleanup

* MHA compatibility

* Fix decoder output

* Generalize flag

* Reversions for BART stuff

* Just circumvent initializing from bart_large

* Manipulate mask

* Share teacher model

* Type fixes and other fixes

* New test

* Partial test with note

* Remove test

* Clear hooks

* Partial README

* Finish README

* Linting

* Lint

* Test fixes

* Test fixes

* Help unit test

* Test tweak

* Fix test

* Note on clamping

* Script for removing projection matrices

* Compatibility with distributed mode

* Minor

* Clean up clamping code

* super() fix

* Comment

* PR comments

* PR comments

* Split into 2 separate distillation tests

* Fixed message teacher

* Split out loss checking

* Split tests

* Switch to pytest regressions

* PR comments

* Remove staticmethod

* Fixes

* Lint

* mypy fix

* Fix seed

* Fix seed??

* New test

* fp32 mode

* Cleanup

* Lint
* fix dialogpt dual usage of endoftext

* add null_idx = -1

* dialog bs test

* Set null_idx in model and decoder, add to dialogpt test

* small formats

* accidental delete old test

* reviewer comment
* [release] Bump to v0.10.0.
* Add build files

* README work

* New entries in model list

* README

* Adding sample responses

* Reordering
* Some initial work to the turn annotations task

* Finishing blueprint

* Onboarding works

* Got checkboxes running

* It works up through saving

* Most of frontend pass flow works now

* dropped file

* Getting things working

* TODO

* Mephisto version

* Start work to hook in AgentState

* Minor cleanup

* Add back in tag

* Lots of cleanup

* Fix self.agents

* Pass in task_type

* Partial run stats work

* Print run stats

* Work on saving data

* Chat data folder, and cleanups

* Fixes

* Minor

* Fix imports

* Bump up version

* Annotations fix

* problem_data fix

* Refactor existing crowdsourcing end-to-end tasks

* Update import

* Start to generalize chat test

* More generalizing

* More generalizing

* Finish generalizing

* Abstract away full test

* Add in stub for turn-annotations test

* Specify num agents

* Adding in expected outputs

* More moving stuff around

* Load actual results

* Work on setting up test

* Finish prototype of test

* Fixes so far

* Match up outputs

* Fix chat demo test

* First work on removing old turn annotations code

* Remove remaining turn annotations code

* Minor fix

* Autoformat

* Add exception for test_init_everywhere

* README updates

* Channel issue

* Fix Meph version

* More robust key checking

* Fixes

* Add dependency back

* Space

* Pin to new Mephisto version

* Test compatibility

* Linting

Co-authored-by: EricMichaelSmith <ems@fb.com>
Co-authored-by: Eric Smith <EricMichaelSmith@users.noreply.github.com>
* Refactor existing crowdsourcing end-to-end tasks

* Update import

* Add new files

* Pass in model config directly

* Sample model config

* Update RunScriptConfigs

* Param tweaks

* Clarify task directory var

* Clarify var

* Fix JSON

* Various fixes

* Dump Fast ACUTE test

* Revisions

* Work on unit test

* Try to use data regressions

* Finish prototype

* Various fixes

* Move

* Fix file issue

* Fix tests

* Fix tests

* Fix tests

* Temp tweak

* Fix turn annotations static test

* Temp raise import errors

* Call analysis

* Check analysis inputs

* Various fixes

* Various fixes

* Minor

* Pytest fixtures

* Fix fixture

* Fix Mephisto version

* Bump reqs again

* Partial work on tests

* Clean up fast acute tests

* Add more tests

* Add remaining tests

* Comment out some functions for now

* Don't yield in superclass

* Try to make fast ACUTE code work

* Fix tests

* Temp test to understand why tests aren't working on CI

* Revert "Temp test to understand why tests aren't working on CI"

This reverts commit b51680f.

* Temporarily block the Q-function runs

* Modify fixture

* Run setup/teardown once per function

* Revert "Run setup/teardown once per function"

This reverts commit 9732bd7.

* Now just disable base fast acute

* Revert "Now just disable base fast acute"

This reverts commit 2a3500e.

* Just give time for a worker to be registered

* Waiting for longer before retrying

* Is it about alphabetical order?

* Another rename

* Add back in setup/teardown for chat demo

* Lint

* Remove old ACUTE code

* Revert temp crowdsourcing changes

* Typo

* Cleanup

* Fix dir

* More cleanup

* Tweaks

* Rename variant

* Lint

* PR changes

* TODO for future flags

* Tweak

* More nuanced waiting

* Get example scripts to work

* Fix import

* Add back in dependency

* Defaults fix

* Path tweak

* Analysis tweaks

* Move blueprints to their own file

* Black

* Convenience message

* Don't remove old ACUTE-Eval in this PR

* README notes

* Analyze multiple runs

* Better path

* Fixes

* Update flag

* Check eval question
@vaibhavad vaibhavad merged commit 106290b into vaibhavad:master Dec 19, 2020