Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Pearlli model chat analysis refactor #4577

Merged
merged 16 commits into from
Jun 10, 2022

Conversation

pearlli98
Copy link
Contributor

@pearlli98 pearlli98 commented Jun 7, 2022

Patch description

  1. refactor crowdsourcing model_chat analysis script to use mephisto databrowser's get_data_from_unit() function instead of loading from a specific location on disk.
  2. Changes on CSV: (1) no longer contains a folder column (2) worker_id is changed to numeric result from task_unit['worker_id'] instead of the string placeholder "--NOT-MTURK-AGENT-x_sandbox"
  3. change unit test to include a dummy class that inherits the ModelChatResultsCompiler class such that its get_task_data() mimics databrowser outputs.
  4. update test data
  5. remove results-folders flag from AbstractTurnAnnotationResultsCompiler and added to the subclasses that still need it (ModelChatResultsCompiler and TurnAnnotationsStaticResultsCompiler)

Testing steps
pytest ~/ParlAI/tests/crowdsourcing/tasks/model_chat/test_model_chat_analysis.py

Copy link
Contributor

@EricMichaelSmith EricMichaelSmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this looks great! Ha, you handled these edge cases and got the tests to pass much more smoothly than what I was worried about :) I have a few comments but nothing too major, just some cleanups

@JackUrb @pringshia This is working towards the goal of eventually getting all of ParlAI's tasks onto DataBrowser and away from a brittle loading of raw outputs from disk

Copy link
Contributor

@EricMichaelSmith EricMichaelSmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for the revisions!

@pearlli98 pearlli98 merged commit cb7fccd into main Jun 10, 2022
@pearlli98 pearlli98 deleted the pearlli-model-chat-analysis-refactor branch June 10, 2022 19:25
kushalarora pushed a commit that referenced this pull request Jun 15, 2022
* change analysis script to use databrowernstead of lading from disk

* add model chat data for testing purposes

* Revert "add model chat data for testing purposes"

This reverts commit d2a0e7a.

* get rid of re and json imports

* remove start-date flag and fix lint error

* add dummy class to mocks DataBrowser output

* update test results

* add results-folders flag

* add results-folders flag

* remove check for results-folders

* remove results-folders flag

* update test to run without results-folders

* update readme

* move --hit-block-list to subclass, add comments, remove unnecessary initialization

* add protobuf to requirement for tensorboard 2.9.1

* change tensorboard to 2.9.0

Co-authored-by: Pearl Li <pearlli@devfair0420.h2.fair>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants