Dialcrowd #4387

jessicah25 · 2022-03-01T18:21:06Z

Patch description
DialCrowd is a dialogue crowdsourcing toolkit that helps requesters write clear HITs, view and analyze results, and obtain higher-quality data. This PR integrates DialCrowd's requester, worker, and analysis interfaces into ParlAI so that requesters can use ParlAI's tools alongside DialCrowd's.

Testing steps
The testing steps can be found in the README; there are 3 components: configuration page, annotation page, quality page. The configuration script should allow a download of a config.json file, to be manually put into the task_config folder. The annotation page should use the config.json file to load a HIT that a worker can fully do. The quality page should use the information pulled from Mephisto for the annotation page to display the results and analysis.
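As a rough illustration of the hand-off from the configuration page to the annotation page, the sketch below reads a downloaded config.json and pulls out the unit counts. The key names come from the config.json diff in this PR; the function name `load_unit_counts` and the integer normalization are assumptions made for the example, not part of DialCrowd.

```python
import json
from pathlib import Path

# Keys taken from the config.json diff in this PR; everything else is illustrative.
COUNT_KEYS = ("nUnit", "nUnitDuplicated", "nUnitGolden")


def load_unit_counts(config_path):
    """Read a DialCrowd-style config.json and return its unit counts as ints.

    Hypothetical helper: the real annotation page may consume the file differently.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    missing = [key for key in COUNT_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {missing}")
    # The diff shows "nUnit" stored as a string ("5"), so normalize with int().
    return {key: int(config[key]) for key in COUNT_KEYS}
```

A check like this could run before launching a task to catch a config.json that was saved to the wrong place or is incomplete.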

JackUrb

Hi @jessicah25, thanks for putting this forward! It's exciting to see what seems like a significant quality of life and ease of use improvement for dialogue tasks on Mephisto. At the moment though this PR is pretty monolithic and hard to parse. In particular, I'm not sure I follow what webapp-config and webapp-results are used for, and I can't tell whether they're attempting to integrate with Mephisto's existing tooling for building out custom review flows (if not, I'm happy to help integrate!). It would be great to see some screenshots that break down your usage a little more clearly.

There are also a number of classes that would need to be renamed (primarily contents in your parlai/crowdsourcing/tasks/dialcrowd/dialcrowd_blueprint.py file) as they're overwriting the namespace of classes we use elsewhere.

I can give a more detailed review on the whole PR after I get some more context back here.

parlai/crowdsourcing/tasks/dialcrowd/README.md

JackUrb

I've started to take a pass through the code, but I'm having difficulty disambiguating between required content and old stuff that exists because you were building off of some of our directories as a template. There are some interesting things here in DialCrowd that I really feel ParlAI users would benefit from having, but it would be hard to maintain this codebase in its current state. I've left some early comments, but I'm going to hold off on a complete pass until things are cleaned up a bit.

parlai/crowdsourcing/tasks/dialcrowd/dialcrowd_blueprint.py

parlai/crowdsourcing/tasks/dialcrowd/run.py

parlai/crowdsourcing/tasks/dialcrowd/run_in_flight_qa.py

JackUrb · 2022-03-02T20:53:20Z

parlai/crowdsourcing/tasks/dialcrowd/task_config/config.json

+  "nUnit": "5",
+  "nUnitDuplicated": 1,
+  "nUnitGolden": 1,
+  "questionCategories": [


And then these are examples for workers to understand the different categories?

parlai/crowdsourcing/tasks/dialcrowd/webapp-config/src/app.jsx

parlai/crowdsourcing/tasks/dialcrowd/webapp-config/src/components/core_components.jsx

parlai/crowdsourcing/tasks/dialcrowd/webapp-config/src/components/error_boundary.jsx

JackUrb · 2022-03-02T20:59:05Z

parlai/crowdsourcing/tasks/dialcrowd/webapp-config/server.js

@@ -0,0 +1,18 @@
+const express = require('express');


This server doesn't have any method to post a completed config. How does the end-user save their config to use with their task? I suppose you're using saveAs to put to someone's local machine, but what if you're routing through ssh + forwarded ports and the local machine is not the one you intend to run a task from?
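To make the concern concrete, here is a minimal sketch of what a server-side save could look like, so the config lands on the machine running the task rather than on the browser's machine. It is written in Python with the standard library rather than the PR's express server, and `save_config`, `ConfigHandler`, and the `task_config/config.json` destination are all assumptions for illustration, not code from this PR.

```python
import json
from http.server import BaseHTTPRequestHandler
from pathlib import Path

# Hypothetical destination; the PR description says config.json lives in task_config/.
CONFIG_PATH = Path("task_config") / "config.json"


def save_config(raw_body, dest=CONFIG_PATH):
    """Validate that the body is JSON, then write it to dest on the server's disk."""
    config = json.loads(raw_body)  # raises ValueError on malformed input
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return dest


class ConfigHandler(BaseHTTPRequestHandler):
    """Accept a POSTed config and persist it next to the task code."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            save_config(self.rfile.read(length))
        except ValueError:
            self.send_error(400, "body is not valid JSON")
            return
        self.send_response(204)  # saved; nothing to return
        self.end_headers()
```

Serving it would be something like `HTTPServer(("127.0.0.1", 8000), ConfigHandler).serve_forever()` with `HTTPServer` imported from `http.server`; with ssh port forwarding, the file would then be written on the remote host.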

EricMichaelSmith

Just a few comments, some addressing @JackUrb's questions. But yeah, agreed with Jack, happy to take a deeper pass of this once we get a sense of whether there's a way to avoid this level of duplication of existing code from turn_annotations_static/ :)

parlai/crowdsourcing/tasks/dialcrowd/analysis/compile_results.py

parlai/crowdsourcing/tasks/dialcrowd/run.py

parlai/crowdsourcing/tasks/dialcrowd/run_in_flight_qa.py

github-actions · 2022-04-03T00:08:48Z

This PR has not had activity in 30 days. Closing due to staleness.

jessicah25 · 2022-04-18T02:59:14Z

Hi! Sorry for the delay, but I've cleaned up the code so it's hopefully clearer, and removed a lot of the unneeded materials. Not sure how to reopen this since it got closed.

JackUrb · 2022-04-18T17:08:16Z

The cleanup changes here are hugely appreciated from my perspective, thanks for doing so @jessicah25! I still have a few unresolved questions above (mostly just to better understand the functionality), but overall one last thing that would be greatly appreciated would be including the screenshots from walking through an example into your readme - I find this helps set expectations and (short of having automated testing) makes it easier to know when something isn't quite working the way it was intended.

EricMichaelSmith

Oh, much cleaner, thanks so much for this :) A few more comments, but I think this is a massive step forward already

@JackUrb historically we've always had unit test coverage on everything in parlai/crowdsourcing/tasks/, but I think you said that maybe this isn't as crucial for now because a better solution for Mephisto E2E task testing is on its way? (Or do you think that it's still worth getting tests on this now?)

parlai/crowdsourcing/tasks/dialcrowd/README.md

parlai/crowdsourcing/tasks/dialcrowd/run.py

parlai/crowdsourcing/tasks/dialcrowd/util.py

JackUrb · 2022-04-18T21:17:59Z

> a better solution for Mephisto E2E task testing is on its way

Definitely! Though we'd be hard pressed to know exactly what the baseline expected inputs and outputs are without a provided sample. Would definitely appreciate a minimal set of runtime inputs and outputs included in some /tests/ folder, so that we can extend to automated testing in the future.

jessicah25 · 2022-04-19T19:43:03Z

We were planning to link to our paper, which will be published soon, which has more information about DialCrowd. I can definitely add more information into the README. To address the questions you had above @JackUrb , DialCrowd is split into 3 components which are run separately: the configuration page, the MTurk worker page, and the results page. When we save the configuration file from the configuration page, it shouldn't be an issue of where the file is saved to since it should be on whatever machine the user is accessing ParlAI from. The configuration page and the results page aren't seen by the workers and are only served locally.

jessicah25 · 2022-04-19T20:08:58Z

What kinds of tests would you want to include?

EricMichaelSmith · 2022-04-19T22:10:36Z

> What kinds of tests would you want to include?

Hmm take a look at the tests in tests/crowdsourcing/tasks/, for instance tests/crowdsourcing/tasks/turn_annotations_static/test_turn_annotations_static.py - this is what we typically do. It's a test for the backend code to make sure that the right agent state is produced given sample dummy inputs (as mentioned by @JackUrb )
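The general shape of such a test, stripped of the ParlAI and Mephisto harness, is roughly the following. This is only a sketch: `build_agent_state` is a hypothetical stand-in for the task backend, and the dictionary layout is invented for the example rather than taken from the real agent state.

```python
import unittest


def build_agent_state(inputs, answers):
    """Hypothetical stand-in for the task backend: pair each input with its answer."""
    return {
        "inputs": inputs,
        "outputs": [{"id": unit["id"], "answer": answers[unit["id"]]} for unit in inputs],
    }


class TestAgentState(unittest.TestCase):
    """Feed fixed dummy inputs and compare the resulting state to a stored expectation."""

    def test_expected_state(self):
        dummy_inputs = [{"id": 0, "sentence": "hello"}, {"id": 1, "sentence": "bye"}]
        dummy_answers = {0: "greeting", 1: "farewell"}
        expected = {
            "inputs": dummy_inputs,
            "outputs": [
                {"id": 0, "answer": "greeting"},
                {"id": 1, "answer": "farewell"},
            ],
        }
        self.assertEqual(build_agent_state(dummy_inputs, dummy_answers), expected)
```

The point is that the expected state is written down once from a known-good run, so later changes to the backend that alter it get caught.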

jessicah25 · 2022-04-20T20:57:25Z

I've made the other edits, but I'm working on the tests now, and I had a few questions. Would this test be applicable to the two sections of DialCrowd which are not worker-facing?

jessicah25 · 2022-04-20T21:32:35Z

Just committed my changes! Let me know if I interpreted the tests correctly.

EricMichaelSmith · 2022-04-22T14:24:05Z

Cool, thanks a lot @jessicah25 for adding lots of user-friendly task description in the README. The test looks great! Yeah, this was what I was thinking of for a test, although maybe @JackUrb has other ideas as well.

Hmm it looks like a bunch of CI checks are broken, but it's unclear which of them are because of the changes in this PR, since this fork was split off from the facebookresearch:main branch a few months ago. Would you be able to merge from or rebase onto the HEAD of facebookresearch:main in order to see if the CI checks pass given the latest code in that branch?

jessicah25 · 2022-04-22T18:35:49Z

Yes, let me try to do that now!

EricMichaelSmith · 2022-04-22T19:30:59Z

> Yes, let me try to do that now!

Okay great! Hmm, looks like the unittests_38 CI check is failing now with the following error:

E           AssertionError: '__init__.py' not found in ['annotation1.png', 'annotation2.png', 'config1.png', 'config2.png', 'config3.png', 'config4.png', 'config5.png', 'config6.png', 'results1.png', 'results2.png', 'results3.png'] : parlai/crowdsourcing/tasks/dialcrowd/images does not contain __init__.py

However, this should be an easy fix (there just needs to be a __init__.py file added to that folder)
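For checking this locally before pushing, a sketch along these lines lists every folder under a package root that has no `__init__.py`. It only approximates what the CI assertion above appears to check; the function name is made up, and the real ParlAI test may apply exclusions not shown here.

```python
from pathlib import Path


def dirs_missing_init(root):
    """Return the sorted directories under root (root included) lacking __init__.py."""
    root = Path(root)
    candidates = [root, *(path for path in root.rglob("*") if path.is_dir())]
    return sorted(d for d in candidates if not (d / "__init__.py").is_file())
```

Calling `dirs_missing_init("parlai/crowdsourcing/tasks/dialcrowd")` from the repository root would be expected to include the images folder until the file is added.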

jessicah25 · 2022-04-22T21:28:28Z

It seems like there are other errors, but I'm not sure what is causing them; would you be able to give some insight into it?

EricMichaelSmith · 2022-04-25T14:54:17Z

@jessicah25 Okay, looking at the errors one-by-one:

- pre-commit, lint: can you try doing ./autoformat.sh in the base ParlAI folder? That will hopefully fix the linting issues that seem to be breaking these checks. From https://github.com/facebookresearch/ParlAI/runs/6135168591?check_suite_focus=true it's also complaining about some unused import statements, which should be able to be removed
- crowdsourcing_tests: @JackUrb do you know what might be causing this issue on the Dialcrowd test? I don't think I've seen this specific issue before (maybe the assertion is new to Mephisto 1.0?)
- long_gpu_tests, teacher_tests, unittests_gpu18: these look like issues that exist in main as well currently, so they're not related to this PR

jessicah25 · 2022-04-25T18:07:04Z

Looks like the remaining errors are the ones still existing on main (long_gpu_tests, teacher_tests, unittests_gpu18), and crowdsourcing_tests!

EricMichaelSmith · 2022-04-25T19:45:45Z

> Looks like the remaining errors are the ones still existing on main (long_gpu_tests, teacher_tests, unittests_gpu18), and crowdsourcing_tests!

Hmm it looks like most of the linting issues are resolved, but there's still one left (apparently I have to approve in order to run the linting CI check): the error message says it's on line 76 of tests/crowdsourcing/tasks/dialcrowd/test_dialcrowd.py. Can you do pip install black and run it on that script to try to fix that error? Black is sometimes very picky about minor linting issues

jessicah25 · 2022-04-25T20:19:28Z

Hm, I had already run pip install black and fixed one error in test_dialcrowd, but I'm not seeing an error when I run ./autoformat.sh right now. No files are changing (186 were left unchanged).

EricMichaelSmith · 2022-04-25T20:33:11Z

> Hm, I had already run pip install black and fixed one error in test_dialcrowd, but I'm not seeing an error when I run ./autoformat.sh right now. No files are changing (186 were left unchanged).

Hmm - what if you add a second empty line above except ImportError: in that script? I notice that there are two above that line in tests/crowdsourcing/tasks/turn_annotations_static/test_turn_annotations_static.py. That's the kind of thing that the linter would complain about

JackUrb · 2022-04-25T20:46:43Z

The crowdsourcing test failure is strange - it's failing to create the agents. Perhaps you need to manually import your specific blueprint type and ensure that the config you're feeding to the test includes this as well?

Also does this test pass locally? I'd expect you may find clearer debug output in your own console if not.

jessicah25 · 2022-04-27T20:51:18Z

Just did that!

EricMichaelSmith · 2022-04-27T21:09:21Z

> Just did that!

Great, thanks! Will ask about this now

EricMichaelSmith · 2022-04-27T21:11:04Z

> > Just did that!
>
> Great, thanks! Will ask about this now

Okay cool, asked the others - will see what they say

EricMichaelSmith · 2022-04-27T23:32:43Z

> > > Just did that!
> >
> > Great, thanks! Will ask about this now
>
> Okay cool, asked the others - will see what they say

Just heard back - apparently black (i.e. the pre-commit CI check) should be the source of truth on this. Can you try adding an exception to .flake8 to suppress the lint error for this here?

jessicah25 · 2022-04-28T17:27:04Z

I tried adding test_dialcrowd.py to the exclude files of .flake8, let me know if that was what you were referring to!

EricMichaelSmith · 2022-04-28T17:33:45Z

> I tried adding test_dialcrowd.py to the exclude files of .flake8, let me know if that was what you were referring to!

Ha, sorry - realize that wasn't clear =/ Hmm can you try looking up the lint code that corresponds to having 1 space and not 2, and then adding it to extend-ignore in that file? I think that should be a robust fix, because it'll prevent this issue from recurring

jessicah25 · 2022-04-28T18:47:23Z

This should be the error code we're looking for:

E305 | expected 2 blank lines after end of function or class

Hopefully this works! (Looks like I have to remove the space as well)
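For reference, this is the layout E305 asks for in the simple case: two blank lines between the end of a function (or class) and the next module-level statement. The names below are made up for the example, and it does not reproduce the try/except structure in test_dialcrowd.py.

```python
def greet(name):
    """Return a greeting for the given name."""
    return f"hello, {name}"


# Two blank lines separate the end of the function from this module-level
# statement; with only one, pycodestyle (via flake8) reports E305.
MESSAGE = greet("DialCrowd")
```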

EricMichaelSmith · 2022-04-28T19:04:41Z

> This should be the error code we're looking for:
>
> E305 | expected 2 blank lines after end of function or class
>
> Hopefully this works! (Looks like I have to remove the space as well)

Ugh this is ridiculous - lint check still fails with the same issue:

[
  {
    path: 'tests/crowdsourcing/tasks/dialcrowd/test_dialcrowd.py',
    start_line: 76,
    end_line: 76,
    start_column: 1,
    end_column: 1,
    annotation_level: 'failure',
    message: '[BLK100] Black would make changes.'
  }
]

Did you say that, when you ran black on your side, it did or didn't flag having only one space? If it did flag it, I wonder if it's possible to re-run it on your end to check that the error code that you added really suppresses this error?

jessicah25 · 2022-04-28T21:45:14Z

Just ran autoformat.sh; it keeps reformatting test_dialcrowd.py to 2 blank lines even with the E305 error suppressed. If I run autoformat.sh -f, which only runs flake8, on the version with only 1 blank line, the error doesn't seem to show up.

EricMichaelSmith · 2022-04-28T23:32:03Z

> Just ran autoformat.sh; it keeps reformatting test_dialcrowd.py to 2 blank lines even with the E305 error suppressed. If I run autoformat.sh -f, which only runs flake8, on the version with only 1 blank line, the error doesn't seem to show up.

Okay, yeah, I've reproduced the error with a new test PR at https://github.com/facebookresearch/ParlAI/pull/4519/files, and I see the same thing. I've tried various perturbations of this line to no avail, so I'm asking around to see if anyone else knows how to interpret that cryptic Black error

jessicah25 · 2022-04-28T23:35:30Z

Sounds good!

stephenroller · 2022-04-29T03:20:15Z

Thanks for so much patience @jessicah25. Seems like we're only nits away. Let's just land this and I'll clean up black next week.

stephenroller · 2022-04-29T03:20:51Z

(I will request a git merge main to see if it resolves all the broken tests)

EricMichaelSmith

Approving - thanks for the patience on this @jessicah25 !

jessicah25 · 2022-04-29T12:38:13Z

Thank you for helping me through all of this @EricMichaelSmith @JackUrb !

EricMichaelSmith · 2022-05-02T20:44:57Z

> (I will request a git merge main to see if it resolves all the broken tests)

@jessicah25 Hmm can you try doing a git merge main one last time to see how many tests are resolved before we merge this in? :)

jessicah25 · 2022-05-02T21:29:09Z

I just ran git merge main in my dialcrowd branch; is that what you meant? It's all up to date.

EricMichaelSmith · 2022-05-02T21:35:10Z

> I just ran git merge main in my dialcrowd branch; is that what you meant? It's all up to date.

Ah, can you try merging facebookresearch:main into your fork one last time, as you did a few days ago? I'm not sure that I can do it myself on your fork - for instance, I tried following this guide but I can't see "Fetch upstream", implying that I may not have write access to it

jessicah25 · 2022-05-02T22:19:56Z

just did that!

EricMichaelSmith · 2022-05-02T23:55:21Z

Can confirm that, as expected, the lint CI check is the only one that newly fails with this PR. Will merge now

EricMichaelSmith · 2022-07-27T13:41:04Z

Hi @jessicah25 ! I was wondering - do you have any interest in maintaining dialcrowd and keeping it up to date? @JackUrb suggested in #4677 that dialcrowd may not work correctly currently because the frontend uses a pre-v1.0 version of Mephisto.

Otherwise, one option is for us to eventually remove the code itself from the HEAD of main, but leave the dialcrowd folder present, with a README giving instructions to the user for how to use dialcrowd by rewinding main to a known-working commit.

What do you think? Either way would work for us :)

CC @klshuster

jessicah25 · 2022-08-18T04:36:43Z

Hi Eric! I would be willing to maintain it for a bit; should I open another pull request for that? I can look at the necessary updates next week.

EricMichaelSmith · 2022-08-18T13:15:33Z

Hi @jessicah25 ! Yes, you could open another pull request to make the changes. Thanks!

facebook-github-bot added the CLA Signed label Mar 1, 2022

JackUrb self-requested a review March 1, 2022 19:05

JackUrb reviewed Mar 1, 2022

EricMichaelSmith reviewed Mar 1, 2022

JackUrb reviewed Mar 2, 2022

EricMichaelSmith reviewed Mar 3, 2022

github-actions bot added the stale label Apr 3, 2022

github-actions bot closed this Apr 10, 2022

JackUrb reopened this Apr 18, 2022

EricMichaelSmith reviewed Apr 18, 2022

github-actions bot removed the stale label Apr 19, 2022

add line

c83a067

suppress lint error for test_dialcrowd.py due to issue with black lint check

5f2cfea

add exception for 'expected 2 blank lines after end of function or class'

dfd8085

remove space for linter in test_dialcrowd

af39d2b

EricMichaelSmith approved these changes Apr 29, 2022

Merge branch 'facebookresearch:main' into dialcrowd

d72ced5

EricMichaelSmith merged commit 98143d6 into facebookresearch:main May 2, 2022

jessicah25 mentioned this pull request Aug 26, 2022

update to mephisto 2.0.1 for DialCrowd #4773

Merged
