This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Generalize model chat analysis code #3844

Merged
merged 24 commits into from
Jul 26, 2021

@EricMichaelSmith EricMichaelSmith commented Jul 22, 2021

Patch description
Generalize the analysis code for the model_chat crowdsourcing task so that it can analyze HITs of human+model chats in which personas and/or the buckets that annotate problems with the model's responses have been disabled. This makes the code easier to subclass for analyzing other human+model chat tasks that don't use personas and/or annotation buckets.

Minor changes:

  • Remove brittle and redundant code to render results as an HTML table
  • Add a new method to the analysis class, ._add_additional_per_turn_stats(), to allow for subclasses to add additional stats to the output dataframe
  • Fix the default annotation buckets by adding a "None, all good" bucket, which is necessary for the analysis code to be able to analyze annotation bucket results
  • Add CI checks that run the analysis without personas or buckets
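
As a rough illustration of the subclass hook described above, here is a minimal sketch. The class names, the `compile_rows()` method, and the hook's `(row, turn)` signature are assumptions made for this example and are not taken from compile_results.py; a plain list of dicts stands in for the output dataframe.

```python
from typing import Any, Dict, List


class BaseTurnAnalyzer:
    """Hypothetical stand-in for the analysis class; not the ParlAI code."""

    def compile_rows(self, turns: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        rows = []
        for idx, turn in enumerate(turns):
            row = {"turn_idx": idx, "text": turn["text"]}
            # Hook call: subclasses may attach extra per-turn columns here.
            row = self._add_additional_per_turn_stats(row, turn)
            rows.append(row)
        return rows

    def _add_additional_per_turn_stats(
        self, row: Dict[str, Any], turn: Dict[str, Any]
    ) -> Dict[str, Any]:
        # Default behavior: add nothing and return the row unchanged.
        return row


class WordCountAnalyzer(BaseTurnAnalyzer):
    """Hypothetical subclass that adds one extra stat per turn."""

    def _add_additional_per_turn_stats(
        self, row: Dict[str, Any], turn: Dict[str, Any]
    ) -> Dict[str, Any]:
        row["word_count"] = len(turn["text"].split())
        return row


rows = WordCountAnalyzer().compile_rows([{"text": "hello there"}, {"text": "hi"}])
base_rows = BaseTurnAnalyzer().compile_rows([{"text": "a"}])
```

With this shape, a subclass only overrides the hook and leaves the main compilation loop untouched.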

(Apologies for the very big PR! The bulk of the pertinent changes are in parlai/crowdsourcing/tasks/model_chat/analysis/compile_results.py, and most of the other changes modify sample input/output files for CI checks.)

Testing steps
pytest tests/crowdsourcing/tasks/model_chat/test_model_chat_analysis.py

@EricMichaelSmith EricMichaelSmith changed the title Model chat analysis fix Generalize model chat analysis code Jul 23, 2021
@EricMichaelSmith EricMichaelSmith marked this pull request as ready for review July 23, 2021 14:19
@JackUrb JackUrb left a comment

I think there's room to discuss alternatives to the problem buckets flag. I don't have anything immediately in mind, but perhaps after the weekend.

Comment on lines +286 to +287
if self.use_problem_buckets:
    dialog_has_problems = False
Contributor

As a note, I'm finding the numerous portions gated by use_problem_buckets somewhat hard to follow. In this case, we're even creating a variable to be used by other scopes only if the attribute is set.

It could be that the overall compile_results() code is rather monolithic, so I have a hard time seeing where it could be broken down or have the self.use_problem_buckets idea pulled out.

Not necessarily blocking, especially if you believe this to be the only type of branching that could occur in these analysis scripts, but worth discussing.
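
To illustrate the pattern under discussion, here is a minimal hypothetical sketch (not the actual compile_results() code). The function name, the `problem_buckets` key, and the stats dictionary are assumptions made for this example; only the `use_problem_buckets` flag and the `dialog_has_problems` variable come from the snippet above.

```python
from typing import Any, Dict, List


def count_problem_dialogs(
    dialogs: List[List[Dict[str, Any]]], use_problem_buckets: bool
) -> Dict[str, int]:
    stats = {"dialogs": 0}
    if use_problem_buckets:
        stats["dialogs_with_problems"] = 0
    for turns in dialogs:
        stats["dialogs"] += 1
        if use_problem_buckets:
            # This variable only exists on this branch of the flag.
            dialog_has_problems = False
            for turn in turns:
                if turn.get("problem_buckets"):
                    dialog_has_problems = True
        # Later code has to repeat the same check before reading the variable.
        if use_problem_buckets:
            if dialog_has_problems:
                stats["dialogs_with_problems"] += 1
    return stats


example_dialogs = [
    [{"problem_buckets": ["bucket_a"]}],
    [{"problem_buckets": []}],
]
with_buckets = count_problem_dialogs(example_dialogs, use_problem_buckets=True)
without_buckets = count_problem_dialogs(example_dialogs, use_problem_buckets=False)
```

Every read of `dialog_has_problems` has to sit behind the same flag check, which is what makes the control flow harder to follow as the method grows.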

Contributor Author

Yes, I agree that gating functionality with self.use_problem_buckets adds complexity to an already very complex method, .compile_results(). It's not ideal, but I think this is the most straightforward way to support this without completely rewriting the method.

Soon I'll be having a larger discussion about how we want to structure all analysis code going forward, so my hope is that, in the medium-term, this code will be overhauled more thoroughly in order to make it less monolithic; thus, I see this PR as a stopgap solution to provide needed functionality. Happy to discuss if you think there is a better stopgap solution for this :)

Contributor

If you have a brainstorming session planned for that overhaul, then I don't see a need to delay this step.

@JackUrb JackUrb left a comment

My only real concern here is addressed above, and is noted as a stop-gap solution to unblock some level of abstraction for this script.

@EricMichaelSmith EricMichaelSmith merged commit e89a77e into master Jul 26, 2021
@EricMichaelSmith EricMichaelSmith deleted the model-chat-analysis-fix branch July 26, 2021 15:58
3 participants