Hi @jessicah25, thanks for putting this forward! It's exciting to see what seems like a significant quality-of-life and ease-of-use improvement for dialogue tasks on Mephisto. At the moment, though, this PR is pretty monolithic and hard to parse. In particular, I'm not sure I follow what webapp-config and webapp-results are used for specifically, and I can't tell whether they attempt to integrate with mephisto's existing tooling for building out custom review flows (if not, I'm happy to help integrate!). It would be great to see some screenshots that break down your usage a little more clearly.
There are also a number of classes that would need to be renamed (primarily the contents of your parlai/crowdsourcing/tasks/dialcrowd/dialcrowd_blueprint.py file), as they're overwriting the namespace of classes we use elsewhere.
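To illustrate the naming concern, here is a minimal sketch. It assumes a hypothetical registry keyed by a type string; `REGISTRY`, `register`, and both class names are illustrative and are not Mephisto's or ParlAI's actual API. The point is only that a task-specific name keeps a new class from shadowing one that already exists.

```python
from typing import Dict, Type

# Hypothetical registry keyed by a type string; NOT Mephisto's real API.
REGISTRY: Dict[str, Type] = {}


def register(cls: Type) -> Type:
    """Register a class under its BLUEPRINT_TYPE, refusing duplicate names."""
    key = cls.BLUEPRINT_TYPE
    if key in REGISTRY and REGISTRY[key] is not cls:
        raise ValueError(
            f"{key!r} is already registered by {REGISTRY[key].__name__}"
        )
    REGISTRY[key] = cls
    return cls


@register
class ExistingStaticBlueprint:
    BLUEPRINT_TYPE = "existing_static_task"  # illustrative existing name


@register
class DialCrowdStaticBlueprint:
    BLUEPRINT_TYPE = "dialcrowd_static_task"  # distinct, task-specific name
```

With distinct names both classes coexist; reusing `"existing_static_task"` for the new class would raise instead of silently replacing the original.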
I can give a more detailed review on the whole PR after I get some more context back here.
I've started to take a pass through the code, but I'm having difficulty distinguishing required content from old material that exists only because you were building off of some of our directories as a template. There are some interesting things here in DialCrowd that I really feel ParlAI users would benefit from having, but it would be hard to maintain this codebase in its current state. I've left some early comments, but I'm going to hold off on a complete pass until things are cleaned up a bit.
"nUnit": "5",
"nUnitDuplicated": 1,
"nUnitGolden": 1,
"questionCategories": [
And then these are examples for workers to understand the different categories?
parlai/crowdsourcing/tasks/dialcrowd/webapp-config/src/components/core_components.jsx
parlai/crowdsourcing/tasks/dialcrowd/webapp-config/src/components/error_boundary.jsx
@@ -0,0 +1,18 @@
const express = require('express');
This server doesn't have any method to post a completed config. How does the end user save their config to use with their task? I suppose you're using saveAs to write it to someone's local machine, but what if you're routing through SSH with forwarded ports and the local machine is not the one you intend to run the task from?
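One way to address this would be for the server to accept the finished config and write it on the machine that hosts the task, rather than relying on a browser download. Below is a minimal sketch of that server-side step only, written in Python rather than the Express code under review. `save_config` and its `task_config_dir` argument are hypothetical names, and the `config.json` file name is taken from the testing steps in the PR description.

```python
import json
from pathlib import Path


def save_config(raw_body: bytes, task_config_dir: Path) -> Path:
    """Validate a posted JSON config and write it where the task will read it.

    Hypothetical helper: a real endpoint would call this with the request body.
    """
    config = json.loads(raw_body)  # raises ValueError on malformed JSON
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    task_config_dir.mkdir(parents=True, exist_ok=True)
    target = task_config_dir / "config.json"
    target.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return target
```

Because the file is written by the server process, it lands on the host that runs the task even when the browser is connected through a forwarded port.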
Just a few comments, some addressing @JackUrb's questions. But yeah, agreed with Jack, happy to take a deeper pass of this once we get a sense of whether there's a way to avoid this level of duplication of existing code from turn_annotations_static/ :)
parlai/crowdsourcing/tasks/dialcrowd/analysis/compile_results.py
This PR has not had activity in 30 days. Closing due to staleness.
Hi! Sorry for the delay, but I've cleaned up the code so it is hopefully clearer, and removed a lot of the unneeded material. I'm not sure how to reopen this since it got closed.
The cleanup changes here are hugely appreciated from my perspective, thanks for doing so @jessicah25! I still have a few unresolved questions above (mostly just to better understand the functionality), but one last thing that would be greatly appreciated is including the screenshots from walking through an example in your README. I find this helps set expectations and (short of having automated testing) makes it easier to know when something isn't quite working the way it was intended.
Oh, much cleaner, thanks so much for this :) A few more comments, but I think this is a massive step forward already
@JackUrb historically we've always had unit test coverage on everything in parlai/crowdsourcing/tasks/, but I think you said that maybe this isn't as crucial for now because a better solution for Mephisto E2E task testing is on its way? (Or do you think that it's still worth getting tests on this now?)
Definitely! Though we'd be hard pressed to know exactly what the baseline expected inputs and outputs are without a provided sample. Would definitely appreciate a minimal runtime inputs+outputs included in some
We were planning to link to our paper, which will be published soon and has more information about DialCrowd. I can definitely add more information to the README. To address the questions you had above @JackUrb, DialCrowd is split into 3 components which are run separately: the configuration page, the MTurk worker page, and the results page. When we save the configuration file from the configuration page, where the file is saved shouldn't be an issue, since it should be on whatever machine the user is accessing ParlAI from. The configuration page and the results page aren't seen by the workers and are only served locally.
What kinds of tests would you want to include?
Hmm take a look at the tests in
I've made the other edits, but I'm working on the tests now, and I had a few questions. Would this test be applicable to the two sections of DialCrowd which are not worker-facing?
Just committed my changes! Let me know if I interpreted the tests correctly.
Cool, thanks a lot @jessicah25 for adding lots of user-friendly task description in the README. The test looks great! Yeah, this was what I was thinking of for a test, although maybe @JackUrb has other ideas as well. Hmm it looks like a bunch of CI checks are broken, but it's unclear which of them are because of the changes in this PR, since this fork was split off from the
Yes, let me try to do that now!
Okay great! Hmm, looks like the unittests_38 CI check is failing now with the following error:
However, this should be an easy fix (there just needs to be a
It seems like there are other errors, but I'm not sure what is causing them; would you be able to give some insight into it?
@jessicah25 Okay, looking at the errors one-by-one:
Looks like the remaining errors are the ones still existing on main (long_gpu_tests, teacher_tests, unittests_gpu18), and crowdsourcing_tests!
Hmm it looks like most of the linting issues are resolved, but there's still one left (apparently I have to approve in order to run the linting CI check): the error message says it's on line 76 of
Hm, I had already run pip install black and fixed one error in test_dialcrowd, but I'm not seeing an error when I run ./autoformat.sh right now. No files are changing (186 were left unchanged).
Hmm - what if you add a second empty line above
The crowdsourcing test failure is strange - it's failing to create the agents. Perhaps you need to manually import your specific blueprint type and ensure that the config you're feeding to the test includes this as well? Also does this test pass locally? I'd expect you may find clearer debug output in your own console if not.
Just did that!
Great, thanks! Will ask about this now
Okay cool, asked the others - will see what they say
Just heard back - apparently black (i.e. the pre-commit CI check) should be the source of truth on this. Can you try adding an exception to .flake8 to suppress the lint error for this here?
I tried adding test_dialcrowd.py to the exclude files of .flake8, let me know if that was what you were referring to!
Ha, sorry - realize that wasn't clear =/ Hmm can you try looking up the lint code that corresponds to having 1 space and not 2, and then adding it to
This should be the error code we're looking for: E305 (expected 2 blank lines after end of function or class). Hopefully this works! (Looks like I have to remove the space as well)
Ugh this is ridiculous - lint check still fails with the same issue:
Did you say that, when you ran black on your side, it did or didn't flag having only one space? If it did flag it, I wonder if it's possible to re-run it on your end to check that the error code that you added really suppresses this error?
Just ran autoformat.sh; it keeps reformatting test_dialcrowd.py to the 2 spaces even with the E305 error suppressed. If I run autoformat.sh -f, which only runs flake8 (without only 1 space), the error doesn't seem to show up.
Okay, yeah, I've reproduced the error with a new test PR at https://github.com/facebookresearch/ParlAI/pull/4519/files, and I see the same thing. I've tried various perturbations of this line to no avail, so I'm asking around to see if anyone else knows how to interpret that cryptic Black error
Sounds good!
Thanks for so much patience @jessicah25. Seems like we're only nits away. Let's just land this and I'll clean up black next week.
(I will request a
Approving - thanks for the patience on this @jessicah25 !
Thank you for helping me through all of this @EricMichaelSmith @JackUrb !
@jessicah25 Hmm can you try doing a
I just ran git merge main in my dialcrowd branch; is that what you meant? It's all up to date.
Ah, can you try merging
Just did that!
Can confirm that, as expected, the lint CI check is the only one that newly fails with this PR. Will merge now
Hi @jessicah25 ! I was wondering - do you have any interest in maintaining Otherwise, one option is for us to eventually remove the code itself from the HEAD of What do you think? Either way would work for us :) CC @klshuster
Hi Eric! I would be willing to maintain it for a bit; should I open another pull request as I do so? I can look at the necessary updates next week.
Hi @jessicah25 ! Yes, you could open another pull request to make the changes. Thanks!
Patch description
DialCrowd is a dialogue crowdsourcing toolkit that helps requesters write clear HITs, view and analyze results, and obtain higher-quality data. This PR brings DialCrowd's requester, worker, and analysis interfaces into ParlAI so that requesters can use ParlAI's tools together with DialCrowd's.
Testing steps
The testing steps can be found in the README; there are 3 components: configuration page, annotation page, quality page. The configuration script should allow a download of a config.json file, to be manually put into the task_config folder. The annotation page should use the config.json file to load a HIT that a worker can fully do. The quality page should use the information pulled from Mephisto for the annotation page to display the results and analysis.
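As a rough illustration of the hand-off between the configuration page and the annotation page, the sketch below loads a config.json payload and normalizes the count fields. The field names come from the config excerpt quoted in the review thread; treating all of them as required, and the `load_config` helper itself, are assumptions rather than DialCrowd's actual loading code.

```python
import json
from typing import Any, Dict

# Field names taken from the config excerpt in the review thread.
COUNT_KEYS = ("nUnit", "nUnitDuplicated", "nUnitGolden")


def load_config(text: str) -> Dict[str, Any]:
    """Parse a config.json payload and coerce the unit counts to integers."""
    config = json.loads(text)
    for key in COUNT_KEYS:
        if key not in config:
            raise KeyError(f"missing field: {key}")
        # The quoted sample stores nUnit as the string "5", so coerce here.
        config[key] = int(config[key])
    if not isinstance(config.get("questionCategories"), list):
        raise TypeError("questionCategories must be a list")
    return config
```

A check like this, run when config.json is placed in the task_config folder, would surface a malformed download before a HIT is launched.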