Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

refactor taskmaster2 to use tod metrics #3147

Closed
wants to merge 25 commits into from

Conversation

moyapchen
Copy link
Contributor

Tested by running

parlai eval -t taskmaster2 -m <model> -mf <recently trained model on food-ordering> --domains food-ordering movies

and validating that the output made sense.

Moya Chen added 9 commits October 1, 2020 17:23
SlotF1Metric + SlotF1Metric test imported from an internal implementation. Will go through and refactor + test existing tasks to use this next.
Not quiet sure if returning "nans" for F1 metrics is the best way of going about things, but probably okay as a start. Also assuming the Metrics base class will handle addition correctly.

Added the test case to make sure there weren't any glaring syntax errors with how I implemented SlotMetrics

Nominally should probably add more test cases to validate slot-based domain metrics are working correctly, but that seems lower risk/lower pri and will cover it anyway when I update existing tasks.
`docformatter -i --pre-summary-newline --wrap-descriptions 88 --wrap-summaries 88 --make-summary-multi-line` on the relevant file
…shing so all my mispellings don't go into the commit log...
Noticed while testing on taskmaster that jga was higher than both slot_r and slot_p... put in some prints and turns out it was counting the {} == {} case.

This doesn't really make sense to do in general, though there are some scnearios where not having any slots is the correct response... so put in a flag to compensate.
@github-actions
Copy link

github-actions bot commented Oct 2, 2020

Your PR contains a change to a task. Please paste the results of the following command into a comment:

python tests/datatests/test_new_tasks.py

Moya Chen added 2 commits October 1, 2020 18:11
Tested by running
```
parlai eval -t taskmaster2 -m <model> -mf <recently trained model on food-ordering> --domains food-ordering movies
```

and validating that the output made sense.
@moyapchen moyapchen force-pushed the taskmaster2_metrics branch from a929bcf to 3edf04c Compare October 2, 2020 01:14
Copy link
Contributor

@stephenroller stephenroller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, very clean. Would approve except for add_metrics

guess=model_response['text'],
labels=labels,
teacher_domains=[domain],
delex_guess=delex_text,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good. Heads up that delex vs lexicalized May need to be an option in the future, but I'm good with this as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, it's implemented as an Optional inside of NlgMetrics right now, so that should already be fine.

bleu_metric = BleuMetric.compute(delex_text, [delex_label])
self.metrics.add('delex_bleu', bleu_metric)
self.metrics.add(f'{domain}_delex_bleu', bleu_metric)
self.metrics.add_metrics(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I like this. I just think there's no implementation of add_metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the "metrics_collections" branch ("slot_teachers" is rebased onto that, while this one is rebased on "slot_teachers").

This may be uh. Me trying to recreate diff stacks from Mecurial when that's not how things work in git land. Maybe the lesson I should be learning here is to not try to mirror diff stacks by using branches on top of branches... 🙃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, this is the branch that includes the add_metrics: #3145

(Given Github's UI... next time, perhaps I'll just do all of these changes in one pull request, since it's not super nice on stacked branches.)

Copy link
Contributor

@stephenroller stephenroller Oct 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, stacked diffs aren't a thing in github. Sometimes I do the PRs on top of PRs but generally not.

There's a tool called ghstack that the pytorch people use to emulate fairly successfully, but I don't use it myself.

Moya Chen and others added 11 commits October 2, 2020 12:43
* Add test for interactive_web

* Spinlock

* Hm.

* Lint.
* Allow missing init opt opts

* Add part of unit test

* Work on unit test

* Test fixes

* Fix second test

* Fix test

* Check obsolete arg does not exist
…ltiple metrics be added to it

See #3138 for context and use
…addressing comments = adding comments + fixing grammar, mostly).
…3145)

* Add notion of metrics collections, which can have other Metrics of multiple metrics be added to it

See #3138 for context and use

* right, having different arguments for the same function aren't a thing in python...

(alas, that's what I get for mostly coding in C++ for the past few years. :P)

* fixed a bug while integrating into taskmaster2

* address comments (get rid of separate class, add func to Metrics directly)

* actually do the things the last comment
…ltiple metrics be added to it

See #3138 for context and use
…3145)

* Add notion of metrics collections, which can have other Metrics of multiple metrics be added to it

See #3138 for context and use

* right, having different arguments for the same function aren't a thing in python...

(alas, that's what I get for mostly coding in C++ for the past few years. :P)

* fixed a bug while integrating into taskmaster2

* address comments (get rid of separate class, add func to Metrics directly)

* actually do the things the last comment
@moyapchen moyapchen marked this pull request as draft October 2, 2020 20:19
@moyapchen moyapchen closed this Oct 2, 2020
@moyapchen
Copy link
Contributor Author

Abandoning this; will do internally and sync once it's all nice.

@moyapchen moyapchen deleted the taskmaster2_metrics branch October 2, 2020 20:34
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants