[Safety] Fix a Static Task bug and Safety README #3612
projects/safety_recipes/README.md
Outdated
```
python projects/safety_recipes/human_safety_evaluation/format_safety_ready.py --world-logs-path tmp/world_logs.jsonl --eval-logs-dir tmp/human_safety_evaluation
```

2) Specify turn indices per conversation to annotate [here](https://github.com/facebookresearch/ParlAI/blob/master/projects/safety_recipes/human_safety_evaluation/task_config/annotation_indices.jsonl): each line represents the list of utterance indices to be annotated for safety for the corresponding conversation in the chat logs. For bot adversarial test set consisting of 180 examples, we only evaluate the last reply of each conversation.
is this part necessary if we use the command above?
if yes, can we make it automated? if no, can we make it clear that it's not necessary?
ah yes, I just edited this (running `format_safety_ready.py` should automatically generate the `annotation_indices.jsonl` as well as the `task_data.jsonl`)
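To illustrate what an automatically generated indices file could contain, here is a minimal sketch. It assumes each world-log line is a JSON object with a `dialog` list of turns, and that an "utterance index" is a position in that list; both are assumptions, since the index convention used by `format_safety_ready.py` is not shown in this thread. The helper names `last_turn_indices` and `write_indices` are hypothetical and only follow the README note that the last reply of each conversation is evaluated.

```python
import json


def last_turn_indices(world_log_lines):
    """Return one index list per conversation, holding only the final turn.

    Assumption: each non-empty line is a JSON object with a "dialog" list.
    """
    indices = []
    for line in world_log_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        episode = json.loads(line)
        num_turns = len(episode["dialog"])
        # An empty conversation gets an empty list rather than index -1.
        indices.append([num_turns - 1] if num_turns else [])
    return indices


def write_indices(indices, out_path):
    """Write one JSON list per line (JSONL), one line per conversation."""
    with open(out_path, "w") as f:
        for idx_list in indices:
            f.write(json.dumps(idx_list) + "\n")
```

Keeping the index computation separate from the file writing makes it easy to swap in a different selection rule (for example, annotating every bot reply) without touching the output format.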
with PathManager.open(world_logs_path) as data_file:
    for l in data_file.readlines():
        episode = json.loads(l.strip())
        # TODO: when conversation format is finished please remove this line;
what does this mean?
It seems there is a bug in the conversation format that generates lines like the following in the world_logs:
{"dialog": [[{"batch_padding": true, "episode_done": true, "id": "bot_adversarial_dialogue:HumanSafetyEvaluation.persona_False_flatten_False"}, {"id": "TransformerGenerator", "episode_done": false}]], "context": [], "metadata_path": "tmp/world_logs.metadata"}
I added a hack to skip those when parsing, but the hack can be removed once the bug above is fixed.
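The skip described above could look roughly like the sketch below. It is not the code from this PR: it reads lines from any iterable instead of going through `PathManager`, and the helper names `is_padding_episode` and `load_episodes` are hypothetical. It assumes a padding episode can be recognized by a truthy `batch_padding` key on any message, as in the sample line.

```python
import json


def is_padding_episode(episode):
    """True if any message in the episode carries a truthy "batch_padding" flag."""
    return any(
        isinstance(msg, dict) and msg.get("batch_padding")
        for turn in episode.get("dialog", [])
        for msg in turn
    )


def load_episodes(lines):
    """Parse JSONL lines, dropping episodes that contain a padding message."""
    episodes = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        episode = json.loads(line)
        if is_padding_episode(episode):
            continue  # workaround for padded batches leaking into the logs
        episodes.append(episode)
    return episodes
```

Isolating the check in one small function means the workaround can be deleted in a single place once the logger stops writing padding episodes.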
6410807 to da18ff9
seems ready to go, but let's wait to merge until the aforementioned bug is fixed so we can get rid of that comment/hack and rebase on top
parlai/utils/world_logging.py
Outdated
@@ -74,12 +75,17 @@ def _add_msgs(self, acts, idx=0):
        """
        msgs = []
        for act in acts:
            # padding examples in the episode[0]
            if isinstance(act, Message) and act.is_padding():
only filter out if `act` is a `Message`; otherwise it'll break the unit tests where `act` is a `dict`
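A small sketch of why the `isinstance` guard matters: a plain `dict` has no `is_padding()` method, so calling it unconditionally would raise `AttributeError`. The `Message` class below is a stand-in written for this example that only mimics the one method used in the hunk, and `filter_padding` is a hypothetical helper; what the real `_add_msgs` does with the remaining acts is not shown in the excerpt.

```python
class Message(dict):
    """Stand-in for ParlAI's Message class (illustration only)."""

    def is_padding(self):
        return bool(self.get("batch_padding"))


def filter_padding(acts):
    """Keep every act except Message instances flagged as padding."""
    msgs = []
    for act in acts:
        # Plain dicts skip the check entirely, so they are never dropped
        # and never trigger an AttributeError on is_padding().
        if isinstance(act, Message) and act.is_padding():
            continue
        msgs.append(act)
    return msgs
```

With this guard, tests that pass plain dictionaries keep working, at the cost that a padding flag on a plain `dict` is not filtered.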
This PR is rebased on #3674 (to separate the changes to the safety test from the changes to world log saving).
Patch description
Patch on safety human safety
`responseField !== null`, which is always `true` given the default value for `responseField` is `False`), add a fix to it.
`bs > 1`
Patch on world logging (moved to #3674), this branch is rebased on the feature branch `convo_log_pad` in #3674
`message.is_padding()` when write episodes to `self._current_episodes`.
`tests/test_eval_model.py` on `test_save_report`.
Testing steps
Other information