refactor taskmaster2 to use tod metrics #3147
Conversation
SlotF1Metric + a SlotF1Metric test, imported from an internal implementation. Will go through and refactor + test existing tasks to use this next.
Not quite sure if returning NaNs for F1 metrics is the best way of going about things, but it's probably okay as a start. Also assuming the Metrics base class will handle addition correctly. Added the test case to make sure there weren't any glaring syntax errors in how I implemented SlotMetrics. Nominally I should probably add more test cases to validate that slot-based domain metrics are working correctly, but that seems lower risk/lower priority, and I'll cover it anyway when I update existing tasks.
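For reference, a minimal sketch of the shape such a slot F1 metric might take; the class below is hypothetical, not the actual ParlAI implementation, and just illustrates the NaN-on-empty and addition behavior discussed above:

```python
class SlotF1Sketch:
    """Hypothetical sketch of an F1-style slot metric.

    Stores raw counts so that __add__ stays associative (which is what a
    Metrics base class needs), and returns NaN when nothing was scored.
    """

    def __init__(self, true_pos: int = 0, false_pos: int = 0, false_neg: int = 0):
        self.true_pos = true_pos
        self.false_pos = false_pos
        self.false_neg = false_neg

    def __add__(self, other: "SlotF1Sketch") -> "SlotF1Sketch":
        # Accumulate counts rather than averaging F1 values directly;
        # averaging per-example F1 scores would not compose correctly.
        return SlotF1Sketch(
            self.true_pos + other.true_pos,
            self.false_pos + other.false_pos,
            self.false_neg + other.false_neg,
        )

    def value(self) -> float:
        denom = 2 * self.true_pos + self.false_pos + self.false_neg
        if denom == 0:
            return float("nan")  # the "return NaNs" case: nothing scored yet
        return 2 * self.true_pos / denom
```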
`docformatter -i --pre-summary-newline --wrap-descriptions 88 --wrap-summaries 88 --make-summary-multi-line` on the relevant file
…shing so all my misspellings don't go into the commit log...
Noticed while testing on taskmaster that jga was higher than both slot_r and slot_p... put in some prints, and it turns out it was counting the {} == {} case. This doesn't really make sense to do in general, though there are some scenarios where not having any slots is the correct response... so I put in a flag to compensate.
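To illustrate the fix, a hypothetical sketch (the function and flag names here are made up, not the ones in the PR):

```python
def jga_samples(guess_slots: dict, gold_slots: dict, count_empty: bool = False):
    """Return the 0/1 joint-goal-accuracy samples for one turn.

    Skipping the {} == {} case by default keeps jga from being inflated
    above slot_p/slot_r by turns that have no slots at all; the flag
    re-enables it for tasks where "no slots" is a meaningful answer.
    """
    if not guess_slots and not gold_slots and not count_empty:
        return []  # don't count the trivially-correct empty case
    return [int(guess_slots == gold_slots)]


assert jga_samples({}, {}) == []                      # skipped by default
assert jga_samples({}, {}, count_empty=True) == [1]   # counted when opted in
assert jga_samples({'food': 'pizza'}, {'food': 'pizza'}) == [1]
```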
Your PR contains a change to a task. Please paste the results of the following command into a comment: `python tests/datatests/test_new_tasks.py`
Tested by running
```
parlai eval -t taskmaster2 -m <model> -mf <recently trained model on food-ordering> --domains food-ordering movies
```
and validating that the output made sense.
Force-pushed from a929bcf to 3edf04c
I like it, very clean. Would approve except for `add_metrics`.
```
guess=model_response['text'],
labels=labels,
teacher_domains=[domain],
delex_guess=delex_text,
```
This is good. Heads up that delex vs. lexicalized may need to be an option in the future, but I'm good with this as is.
Ah yeah, it's implemented as an Optional inside of NlgMetrics right now, so that should already be fine.
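As a sketch of that Optional shape (hypothetical signature; the real NlgMetrics fields may differ):

```python
from typing import Optional


def compute_nlg_metrics(
    guess: str,
    label: str,
    delex_guess: Optional[str] = None,
    delex_label: Optional[str] = None,
) -> dict:
    # Always score the lexicalized text; score the delexicalized text
    # only when it is provided, which leaves room to make delex vs.
    # lexicalized a proper option later without changing call sites.
    metrics = {'exact_match': float(guess == label)}
    if delex_guess is not None and delex_label is not None:
        metrics['delex_exact_match'] = float(delex_guess == delex_label)
    return metrics
```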
```
bleu_metric = BleuMetric.compute(delex_text, [delex_label])
self.metrics.add('delex_bleu', bleu_metric)
self.metrics.add(f'{domain}_delex_bleu', bleu_metric)
self.metrics.add_metrics(
```
Okay, I like this. I just think there's no implementation of `add_metrics`.
It's in the `metrics_collections` branch (`slot_teachers` is rebased onto that, while this one is rebased on `slot_teachers`).
This may be, uh, me trying to recreate diff stacks from Mercurial when that's not how things work in git land. Maybe the lesson I should be learning here is not to try to mirror diff stacks by using branches on top of branches... 🙃
For context, this is the branch that includes the `add_metrics` implementation: #3145
(Given GitHub's UI... next time, perhaps I'll just do all of these changes in one pull request, since it's not super nice with stacked branches.)
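For reference, the kind of merge `add_metrics` implies could look something like this (a hypothetical sketch, not the actual implementation in #3145):

```python
class MetricsSketch:
    """Stand-in for a metrics container; the real class lives in ParlAI."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        # Relies on each metric value defining __add__, as discussed above.
        self._data[key] = self._data[key] + value if key in self._data else value

    def add_metrics(self, other: 'MetricsSketch') -> None:
        # Fold every metric from another collection into this one.
        for key, value in other._data.items():
            self.add(key, value)
```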
Ah yeah, stacked diffs aren't a thing in GitHub. Sometimes I do PRs on top of PRs, but generally not.
There's a tool called ghstack that the pytorch people use to emulate them fairly successfully, but I don't use it myself.
Force-pushed from fea599b to 6b9a0a4
* Add test for interactive_web
* Spinlock
* Hm.
* Lint.
* Allow missing init opt opts
* Add part of unit test
* Work on unit test
* Test fixes
* Fix second test
* Fix test
* Check obsolete arg does not exist
…ltiple metrics be added to it. See #3138 for context and use.
…addressing comments = adding comments + fixing grammar, mostly).
…3145)
* Add notion of metrics collections, which can have other Metrics or multiple metrics added to it. See #3138 for context and use.
* Right, having different arguments for the same function isn't a thing in Python... (alas, that's what I get for mostly coding in C++ for the past few years. :P)
* Fixed a bug while integrating into taskmaster2
* Address comments (get rid of separate class, add func to Metrics directly)
* Actually do the things from the last comment
Abandoning this; will do internally and sync once it's all nice.