Conversation
I think there's room to discuss alternatives to the problem buckets flag. I don't have anything immediately in mind, but perhaps after the weekend.
```python
if self.use_problem_buckets:
    dialog_has_problems = False
```
As a note, I'm finding the numerous portions throughout that are gated by `use_problem_buckets` somewhat hard to follow. In this case, we're even creating a variable to be used by other scopes only if the attribute is set. It could be that the overall `compile_results()` code is rather monolithic, so I have a hard time seeing where it could be broken down or where the `self.use_problem_buckets` idea could be pulled out.

Not necessarily blocking, especially if you believe this to be the only type of branching that could occur in these analysis scripts, but worth discussing.
Yes, I agree that gating functionality with `self.use_problem_buckets` adds complexity to an already very complex method, `.compile_results()`: it's not ideal, but to my mind this is the most straightforward way to achieve this without completely rewriting the method.

Soon I'll be having a larger discussion about how we want to structure all analysis code going forward, so my hope is that, in the medium term, this code will be overhauled more thoroughly to make it less monolithic; thus, I see this PR as a stopgap solution to provide needed functionality. Happy to discuss if you think there is a better stopgap solution for this :)
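For the sake of discussion, one alternative stopgap would be a template-method style, where the bucket-specific steps live in small hook methods that a no-buckets compiler simply doesn't override. This is only a hypothetical sketch; the class and method names below are invented for illustration and don't match the actual `compile_results.py` code:

```python
# Hypothetical sketch (names invented; not the actual ParlAI code):
# pulling problem-bucket logic into an overridable hook so the shared
# compile loop carries no `use_problem_buckets` branching.

class BaseCompiler:
    def __init__(self, dialogs):
        self.dialogs = dialogs

    def compile_results(self):
        results = []
        for dialog in self.dialogs:
            stats = {'num_turns': len(dialog)}
            # Bucket handling is delegated to a hook instead of being
            # gated inline on a use_problem_buckets attribute.
            stats.update(self._compute_bucket_stats(dialog))
            results.append(stats)
        return results

    def _compute_bucket_stats(self, dialog):
        """Hook: the base compiler computes no bucket stats."""
        return {}


class ProblemBucketCompiler(BaseCompiler):
    def _compute_bucket_stats(self, dialog):
        # Only this subclass knows about problem buckets, so variables
        # like dialog_has_problems never leak into the base method.
        dialog_has_problems = any(turn.get('problem_data') for turn in dialog)
        return {'dialog_has_problems': dialog_has_problems}


if __name__ == '__main__':
    sample = [[{'text': 'hi'}, {'text': 'hello', 'problem_data': ['repetitive']}]]
    print(ProblemBucketCompiler(sample).compile_results())
    # -> [{'num_turns': 2, 'dialog_has_problems': True}]
```

The tradeoff is more indirection, so it may not be worth it for a stopgap.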
If you have a brainstorming session planned for that overhaul, then I don't see a need to delay this step.
My only real concern here is addressed above, and this is noted as a stopgap solution to unblock some level of abstraction for this script.
Patch description
Generalize the analysis code for the `model_chat` crowdsourcing task so that it can analyze HITs of human+model chats in which we disabled either personas or the buckets that annotate problems with the model's responses. This will allow this code to be more easily subclassed for analyzing other human+model chat tasks that don't use personas and/or annotation buckets.

Minor changes:
- Add a `._add_additional_per_turn_stats()` method, to allow subclasses to add additional stats to the output dataframe (see the sketch at the end of this description)

(Apologies for the very big PR! The bulk of the pertinent changes are in `parlai/crowdsourcing/tasks/model_chat/analysis/compile_results.py`, and most of the other changes modify sample input/output files for CI checks.)

Testing steps
pytest tests/crowdsourcing/tasks/model_chat/test_model_chat_analysis.py
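As an illustration of the new hook mentioned above, here is a hypothetical sketch of how a subclass might override `_add_additional_per_turn_stats()`. The dataframe-in/dataframe-out signature, the stub base class, and the column names are all assumptions made for illustration, not taken from the PR:

```python
# Hypothetical sketch of subclassing the new per-turn-stats hook.
# The signature (a per-turn dataframe in, a dataframe out) is assumed
# and may not match the actual compile_results.py code.
import pandas as pd


class BaseCompilerStub:
    """Stand-in for the real compiler class, just to make this runnable."""

    def _add_additional_per_turn_stats(self, df: pd.DataFrame) -> pd.DataFrame:
        # Base implementation: no extra per-turn stats.
        return df


class MyTaskCompiler(BaseCompilerStub):
    def _add_additional_per_turn_stats(self, df: pd.DataFrame) -> pd.DataFrame:
        # Example extra stat: word count of each response turn.
        df = df.copy()
        df['response_length'] = df['text'].str.split().str.len()
        return df


if __name__ == '__main__':
    turns = pd.DataFrame({'text': ['hi there', 'how are you today']})
    print(MyTaskCompiler()._add_additional_per_turn_stats(turns))
```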