-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Conversation
parlai/tasks/friends/agents.py
Outdated
speaker = utterance['speaker'] | ||
conversation_id = utterance['conversation_id'] | ||
|
||
if conversation_id not in conversations: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out python's defaultdict for simpler implantation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Jimmy, it looks good. Just a few comments to make it better:
- try our autoformat.sh script for fixing the lint errors.
- Make sure tests are passing. There are some failing now.
- Add convokit to the installation requirements (maybe that is what causing test failures)
- It's alright to have a single teacher, but maybe add a couple more for convenience use. They may be simple extensions of the current one with minimal changes. For example a teacher for All and another one that is a single charter POV.
One more thing, could you add the list of existing character as another entry to the message? It could be a comma separated list of names. |
Sounds great, all done! See updated PR. Used |
parlai/core/build_data.py
Outdated
@@ -380,6 +380,8 @@ def _unzip(path, fname, delete=True): | |||
PathManager.mkdirs(outpath) | |||
continue | |||
logging.debug(f"Extracting to {outpath}") | |||
if '__MACOSX' in member: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It logically makes sense to submit this as its own patch. Please remove it from this PR and add is as a separate one.
parlai/tasks/friends/build.py
Outdated
|
||
def build(opt): | ||
dpath = os.path.join(opt['datapath'], 'Friends') | ||
version = '1.02' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can change this to 1 since it was never published to the public code. But remove your local files to force a rebuild.
You haven't addressed this one yet. |
agent.add_argument( | ||
'--characters', | ||
type=str, | ||
default='Rachel Green,Monica Geller,Phoebe Buffay,Joey Tribbiani,Chandler Bing,Ross Geller', | ||
help='A comma-separated list of characters to train on when `--character` == `All`', | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list cli arg
This this what you mean? |
One more thing that forgot asking you to add is sample screenshots that shows the output when you run |
(For the record keeping) we chatted about this in-person. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good thanks. Feel free to merge.
* feat: add friends dataset for multiparty convo * feat: generate examples for all 6 main characters in the friends corpus by default * fix: remove unused file * fix: add testing * fix: add speaker label inside label * feat: add support for data folds; clean up code * fix: skip __MACOS folder for zipped files to avoid exception * fix: formatting with autoformat.sh * feat: add convenience teacher classes * feat: add command line option to specify list of characters * undo changes to build_data.py * cleanup * style fix
Patch description
This patch adds a new task
friends
for generating multiparty conversation training examples from the Friends Corpus on ConvoKit.Example Training Sample:
The teacher supports 3 command line arguments,
--character
,--use-silence-token
, and--use-start-token
.The
--character
option specifies which speaker labels to train on. For example, if 'Rachel Green' is specified, then only sentences uttered by Rachel Green will generate a training sample.The choices for
character
are the main characters from the show, or anAll
option specifying all 6 main characters, excluding supporting cast, should generate training samples. TheAll
option is the default option.The
--uses-silence-token
(default:True
) flag indicates whether a training sample should be generated with a token when it's not the chosen character's turn to speak. For example, the earlier pizza conversation would generate the following training samples:The
--uses-start-token
(default:False
) flag indicates whether a token should be included in the beginning of a conversation. For example, the earlier pizza conversation would generate the following training samples:Whereas when
--use-start-token = False
, the first sentence in a conversation would be skipped and not generate a training sample.There are also two flags
--start-token
and--silence-token
that you can use to specify what special symbols you want to use to represent these tokens. By default, they are__START__
and__SILENCE__
.Testing steps
Expected Output:
Other information
To see the examples generated, run: