-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Add file for helper metrics for slots #3138
Conversation
Well, that's a bunch of automatically caught bugs. Time to put this back into draft mode... |
Putting this ready for review since logic seems good. Note that running the |
# we drop Python 3.6 | ||
if other is None: | ||
return self | ||
slot_p = _average_type_sum_helper(self._slot_p, other._slot_p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we get Nones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might if both arguments are Nones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya I just don't understand if/when that happens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh will add a comment about this to clarify.
Basically if you look at SlotMetrics::__init__()
, it's clever about how it aggregates F1 metric components, accumulating the precision and recall values separately. However, this means there might be Nones floating around. (I copied this behavior from the original implementation of SlotF1Metrics in the multiwoz agent.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephenroller this is what you had changed because you were occasionally getting Nones for the metric during validation, right?
I think autoformat assumes you're running from the top directory. That might be something to fix, but for now, just run from the root of the repo. |
…ltiple metrics be added to it See #3138 for context and use
76a5f54
to
d3b5a19
Compare
…g in python... (alas, that's what I get for mostly coding in C++ for the past few years. :P)
d3b5a19
to
b4479f5
Compare
c79a3a6
to
e3e35b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable, given that tests are passing. Will let Stephen approve since he had had questions before (but otherwise happy to do so if needed)
…3145) * Add notion of metrics collections, which can have other Metrics of multiple metrics be added to it See #3138 for context and use * right, having different arguments for the same function aren't a thing in python... (alas, that's what I get for mostly coding in C++ for the past few years. :P) * fixed a bug while integrating into taskmaster2 * address comments (get rid of separate class, add func to Metrics directly) * actually do the things the last comment
* Add test for interactive_web * Spinlock * Hm. * Lint.
* Allow missing init opt opts * Add part of unit test * Work on unit test * Test fixes * Fix second test * Fix test * Check obsolete arg does not exist
…ltiple metrics be added to it See #3138 for context and use
SlotF1Metric + SlotF1Metric test imported from an internal implementation. Will go through and refactor + test existing tasks to use this next.
Not quiet sure if returning "nans" for F1 metrics is the best way of going about things, but probably okay as a start. Also assuming the Metrics base class will handle addition correctly. Added the test case to make sure there weren't any glaring syntax errors with how I implemented SlotMetrics Nominally should probably add more test cases to validate slot-based domain metrics are working correctly, but that seems lower risk/lower pri and will cover it anyway when I update existing tasks.
`docformatter -i --pre-summary-newline --wrap-descriptions 88 --wrap-summaries 88 --make-summary-multi-line` on the relevant file
…shing so all my mispellings don't go into the commit log...
Noticed while testing on taskmaster that jga was higher than both slot_r and slot_p... put in some prints and turns out it was counting the {} == {} case. This doesn't really make sense to do in general, though there are some scnearios where not having any slots is the correct response... so put in a flag to compensate.
…addressing comments = adding comments + fixing grammar, mostly).
fea599b
to
6b9a0a4
Compare
…3145) * Add notion of metrics collections, which can have other Metrics of multiple metrics be added to it See #3138 for context and use * right, having different arguments for the same function aren't a thing in python... (alas, that's what I get for mostly coding in C++ for the past few years. :P) * fixed a bug while integrating into taskmaster2 * address comments (get rid of separate class, add func to Metrics directly) * actually do the things the last comment
…ltiple metrics be added to it See #3138 for context and use
…ltiple metrics be added to it See #3138 for context and use
…3145) * Add notion of metrics collections, which can have other Metrics of multiple metrics be added to it See #3138 for context and use * right, having different arguments for the same function aren't a thing in python... (alas, that's what I get for mostly coding in C++ for the past few years. :P) * fixed a bug while integrating into taskmaster2 * address comments (get rid of separate class, add func to Metrics directly) * actually do the things the last comment
…ltiple metrics be added to it See #3138 for context and use
…3145) * Add notion of metrics collections, which can have other Metrics of multiple metrics be added to it See #3138 for context and use * right, having different arguments for the same function aren't a thing in python... (alas, that's what I get for mostly coding in C++ for the past few years. :P) * fixed a bug while integrating into taskmaster2 * address comments (get rid of separate class, add func to Metrics directly) * actually do the things the last comment
Converting this to draft mode, cause rebasing + git is doing weird stuff. |
…ltiple metrics be added to it See #3138 for context and use
Abandoning this; will do internally and sync once it's all nice. |
SlotF1Metric + SlotF1Metric test imported from an internal implementation. Will go through and refactor + verify that I don't break existing tasks using this next.
Testing steps
`pytest tests/SlotF1Metric + SlotF1Metric test imported from an internal implementation. Will go through and refactor + verify that I don't break existing tasks using this next.
Testing steps
pytest tests/test_slot_metrics.py
Note that I did change the SlotF1Metric class (and its associated test) a bit in the 2nd change in this PR in order to get tests to run properly. (I got assertion errors trying to run the
pytest test_slot_f1_metrics.py
in its original internal directory so figured it would need to change anyhow.)