Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Add file for helper metrics for slots #3138

Closed
wants to merge 23 commits into from
Closed

Add file for helper metrics for slots #3138

wants to merge 23 commits into from

Conversation

moyapchen
Copy link
Contributor

@moyapchen moyapchen commented Oct 1, 2020

SlotF1Metric + SlotF1Metric test imported from an internal implementation. Will go through and refactor + verify that I don't break existing tasks using this next.

Testing steps
`pytest tests/SlotF1Metric + SlotF1Metric test imported from an internal implementation. Will go through and refactor + verify that I don't break existing tasks using this next.

Testing steps
pytest tests/test_slot_metrics.py

Note that I did change the SlotF1Metric class (and its associated test) a bit in the 2nd change in this PR in order to get tests to run properly. (I got assertion errors trying to run the pytest test_slot_f1_metrics.py in its original internal directory so figured it would need to change anyhow.)

@moyapchen
Copy link
Contributor Author

Well, that's a bunch of automatically caught bugs. Time to put this back into draft mode...

@moyapchen moyapchen marked this pull request as ready for review October 1, 2020 06:15
@moyapchen
Copy link
Contributor Author

Putting this ready for review since logic seems good.

Note that running the autoformat.sh script locally either does nothing if I do ./../autoformat.sh from ParlAI/parlai or crashes my terminal window if I do source autoformat.sh (adding a path prefix to autoformat as necessary) from any directory...

parlai/utils/task_dialogue_helpers/metrics.py Outdated Show resolved Hide resolved
# we drop Python 3.6
if other is None:
return self
slot_p = _average_type_sum_helper(self._slot_p, other._slot_p)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we get Nones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might if both arguments are Nones.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya I just don't understand if/when that happens

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh will add a comment about this to clarify.

Basically if you look at SlotMetrics::__init__(), it's clever about how it aggregates F1 metric components, accumulating the precision and recall values separately. However, this means there might be Nones floating around. (I copied this behavior from the original implementation of SlotF1Metrics in the multiwoz agent.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenroller this is what you had changed because you were occasionally getting Nones for the metric during validation, right?

parlai/utils/task_dialogue_helpers/metrics.py Outdated Show resolved Hide resolved
@stephenroller
Copy link
Contributor

I think autoformat assumes you're running from the top directory. That might be something to fix, but for now, just run from the root of the repo.

…ltiple metrics be added to it

See #3138 for context and use
@moyapchen moyapchen changed the base branch from master to metric_collections October 1, 2020 21:13
…g in python...

(alas, that's what I get for mostly coding in C++ for the past few years. :P)
Copy link
Contributor

@EricMichaelSmith EricMichaelSmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable, given that tests are passing. Will let Stephen approve since he had had questions before (but otherwise happy to do so if needed)

parlai/core/tod.py Outdated Show resolved Hide resolved
parlai/core/tod.py Show resolved Hide resolved
parlai/core/tod.py Show resolved Hide resolved
Base automatically changed from metric_collections to master October 2, 2020 18:24
moyapchen added a commit that referenced this pull request Oct 2, 2020
…3145)

* Add notion of metrics collections, which can have other Metrics of multiple metrics be added to it

See #3138 for context and use

* right, having different arguments for the same function aren't a thing in python...

(alas, that's what I get for mostly coding in C++ for the past few years. :P)

* fixed a bug while integrating into taskmaster2

* address comments (get rid of separate class, add func to Metrics directly)

* actually do the things the last comment
stephenroller and others added 7 commits October 2, 2020 11:33
* Add test for interactive_web

* Spinlock

* Hm.

* Lint.
* Allow missing init opt opts

* Add part of unit test

* Work on unit test

* Test fixes

* Fix second test

* Fix test

* Check obsolete arg does not exist
…ltiple metrics be added to it

See #3138 for context and use
SlotF1Metric + SlotF1Metric test imported from an internal implementation. Will go through and refactor + test existing tasks to use this next.
Not quiet sure if returning "nans" for F1 metrics is the best way of going about things, but probably okay as a start. Also assuming the Metrics base class will handle addition correctly.

Added the test case to make sure there weren't any glaring syntax errors with how I implemented SlotMetrics

Nominally should probably add more test cases to validate slot-based domain metrics are working correctly, but that seems lower risk/lower pri and will cover it anyway when I update existing tasks.
Moya Chen added 8 commits October 2, 2020 11:33
`docformatter -i --pre-summary-newline --wrap-descriptions 88 --wrap-summaries 88 --make-summary-multi-line` on the relevant file
…shing so all my mispellings don't go into the commit log...
Noticed while testing on taskmaster that jga was higher than both slot_r and slot_p... put in some prints and turns out it was counting the {} == {} case.

This doesn't really make sense to do in general, though there are some scnearios where not having any slots is the correct response... so put in a flag to compensate.
…addressing comments = adding comments + fixing grammar, mostly).
moyapchen and others added 2 commits October 2, 2020 12:38
…3145)

* Add notion of metrics collections, which can have other Metrics of multiple metrics be added to it

See #3138 for context and use

* right, having different arguments for the same function aren't a thing in python...

(alas, that's what I get for mostly coding in C++ for the past few years. :P)

* fixed a bug while integrating into taskmaster2

* address comments (get rid of separate class, add func to Metrics directly)

* actually do the things the last comment
…ltiple metrics be added to it

See #3138 for context and use
moyapchen pushed a commit that referenced this pull request Oct 2, 2020
…ltiple metrics be added to it

See #3138 for context and use
moyapchen added a commit that referenced this pull request Oct 2, 2020
…3145)

* Add notion of metrics collections, which can have other Metrics of multiple metrics be added to it

See #3138 for context and use

* right, having different arguments for the same function aren't a thing in python...

(alas, that's what I get for mostly coding in C++ for the past few years. :P)

* fixed a bug while integrating into taskmaster2

* address comments (get rid of separate class, add func to Metrics directly)

* actually do the things the last comment
moyapchen pushed a commit that referenced this pull request Oct 2, 2020
…ltiple metrics be added to it

See #3138 for context and use
moyapchen added a commit that referenced this pull request Oct 2, 2020
…3145)

* Add notion of metrics collections, which can have other Metrics of multiple metrics be added to it

See #3138 for context and use

* right, having different arguments for the same function aren't a thing in python...

(alas, that's what I get for mostly coding in C++ for the past few years. :P)

* fixed a bug while integrating into taskmaster2

* address comments (get rid of separate class, add func to Metrics directly)

* actually do the things the last comment
@moyapchen moyapchen marked this pull request as draft October 2, 2020 19:48
@moyapchen
Copy link
Contributor Author

Converting this to draft mode, cause rebasing + git is doing weird stuff.

moyapchen pushed a commit that referenced this pull request Oct 2, 2020
…ltiple metrics be added to it

See #3138 for context and use
@moyapchen
Copy link
Contributor Author

Abandoning this; will do internally and sync once it's all nice.

@moyapchen moyapchen closed this Oct 2, 2020
@moyapchen moyapchen deleted the slot_teachers branch October 2, 2020 20:41
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants