Open-sourcing SM-Turn and SM-Dialog from the human eval paper #4333
Conversation
Mephisto stuff looks good to me! Cool to see the ModelChatBlueprint
is successfully working in a re-use/extension pattern.
import parlai.utils.logging as logging
from parlai.crowdsourcing.tasks.model_chat.model_chat_blueprint import (
    BLUEPRINT_TYPE,
)  # noqa: F401  # For registering the blueprint
Annoyingly, I think you need to put this comment on line 16 for the linter to successfully ignore this import.
Ah, good point - just changed.
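For reference, flake8 matches a noqa comment against the physical line it reports the error on, so the suppression has to sit on whichever line gets flagged (which can vary by flake8 version for a multi-line import). A sketch of one such placement, with the comment moved onto the imported name's line:

    from parlai.crowdsourcing.tasks.model_chat.model_chat_blueprint import (
        BLUEPRINT_TYPE,  # noqa: F401  # For registering the blueprint
    )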
Looks great! Besides unit tests, there should be just a few minor improvements left in the next steps.
worker_id = task_unit['worker_id']
assignment_id = task_unit['assignment_id']

## Determining whether the task unit should be skipped
Nit: Let's stick to one # for comments?
Yeah good point - so, my convention has often been to use two hash symbols for a new section of code: i.e., this comment is indicating that the following 100-ish lines deal with checking whether this task should be skipped. But if this isn't clear to others, then maybe something more obvious should be used. Is there a particular notation that you use for this?
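(For concreteness, that two-tier convention would look like the following illustrative lines, which are not code from this PR:

    ## Determining whether the task unit should be skipped
    # Single-# comments annotate individual lines within the section
    worker_id = task_unit['worker_id']
)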
As an aside, often if I find a chunk of code is complex enough to warrant something like that, it's complex enough to move into a nicely-named helper function.
    if should_skip_unit(...):
        continue
I understand that in this scripting setup, this chunk also has some external effects (changing conversation counts, doing some data extraction, etc.). But in that case, those effects are critical to its function, and thus the whole code block is doing more than "determining whether the task unit should be skipped". Overall it feels a little like an anti-pattern to me.
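A minimal sketch of that refactor, with the helper name, dictionary fields, and skip conditions all illustrative rather than taken from this PR:

    from typing import Dict, List

    def should_skip_unit(task_unit: Dict, conversations_needed: Dict[str, int]) -> bool:
        """Illustrative only: decide whether a unit should be skipped in analysis."""
        if task_unit.get('status') != 'completed':
            return True
        model = task_unit.get('model_name', '')
        return conversations_needed.get(model, 0) <= 0

    def compile_units(task_units: List[Dict], conversations_needed: Dict[str, int]):
        for task_unit in task_units:
            if should_skip_unit(task_unit, conversations_needed):
                continue
            # Side effects (decrementing conversation counts, extracting data)
            # stay in the loop body, or move into their own named helpers
            conversations_needed[task_unit['model_name']] -= 1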
Yes, @JackUrb - definitely agreed that the original phrasing of that comment was incomplete. I've just removed that comment entirely to avoid confusion. Thanks!
conversation_start_mode: 'hi'
annotation_question: "Please answer the following:"
conversations_needed_string: "blender_90M:10"
final_rating_question: "Please rate how much you'd prefer to talk to your partner for a long conversation. (1: Would not at all prefer, 5: Would very much prefer)|Please rate how human your partner sounds. (1: Very inhuman, 5: Very human)|Please rate how interesting your partner is. (1: Very boring, 5: Very interesting)"
Can we wrap this line?
Yeah, good call - fixed this and then re-tested to make sure the questions still render properly.
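For reference, one way such a line can be wrapped is with a YAML folded scalar, which joins the lines back into a single string with single spaces (a sketch only, not necessarily the fix applied here; the breaks must fall on existing spaces so the pipe-delimited questions come out unchanged):

    final_rating_question: >-
      Please rate how much you'd prefer to talk to your partner for a long
      conversation. (1: Would not at all prefer, 5: Would very much
      prefer)|Please rate how human your partner sounds. (1: Very inhuman,
      5: Very human)|Please rate how interesting your partner is. (1: Very
      boring, 5: Very interesting)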
Patch description
Open-sourcing the single-model technique from the human eval comparison paper: this technique runs the evaluations for SM-Turn and SM-Dialog in that work. A sample YAML file, the model opt file and onboarding examples used in the paper, and an analysis script are also released.
We also modify the existing model chat task (in parlai/crowdsourcing/tasks/model_chat/worlds.py) to fix an error when loading in HITs using Mephisto's DataBrowser.
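For context, loading HIT data through Mephisto's DataBrowser looks roughly like the following (a sketch; the task name is a placeholder, and the import paths reflect Mephisto's layout at the time of this PR):

    from mephisto.abstractions.databases.local_database import LocalMephistoDB
    from mephisto.tools.data_browser import DataBrowser

    db = LocalMephistoDB()
    browser = DataBrowser(db=db)
    # 'model_chat' is a placeholder task name
    for unit in browser.get_units_for_task_name('model_chat'):
        unit_data = browser.get_data_from_unit(unit)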
Testing steps
Commands
To launch local HITs:
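(A hypothetical command shape only - the path and conf name below are assumptions, not taken from this PR; ParlAI crowdsourcing tasks are typically launched via a run.py with a Hydra conf:

    python parlai/crowdsourcing/tasks/model_chat/run.py conf=example
)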
To launch analysis of HITs:
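(Again a hypothetical shape - the script path is an assumption; ParlAI crowdsourcing tasks are typically analyzed with a compile_results script:

    python parlai/crowdsourcing/tasks/model_chat/analysis/compile_results.py
)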
Unit tests are forthcoming (prioritizing the initial code release).
Screenshots
Onboarding:
Conversation flow: