Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Dialcrowd #4387

Merged
merged 16 commits into from
May 2, 2022
Merged

Dialcrowd #4387

merged 16 commits into from
May 2, 2022

Conversation

jessicah25
Copy link
Contributor

Patch description
DialCrowd is a dialogue crowdsourcing toolkit that helps requesters write clear HITs, view and analyze results, and obtain higher-quality data. This integration allows for the requester interface, worker interface, and analysis interface to be integrated into ParlAI so requesters can have access to ParlAI's tools with DialCrowd's tools.

Testing steps
The testing steps can be found in the README; there are 3 components: configuration page, annotation page, quality page. The configuration script should allow a download of a config.json file, to be manually put into the task_config folder. The annotation page should use the config.json file to load a HIT that a worker can fully do. The quality page should use the information pulled from Mephisto for the annotation page to display the results and analysis.

Copy link
Contributor

@JackUrb JackUrb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jessicah25, thanks for putting this forward! It's exciting to see what seems like a significant quality of life and ease of use improvement for dialogue tasks on Mephisto. At the moment though this PR is pretty monolithic and hard to parse. Particularly I'm not sure I follow what the use of the webapp-config and webapp-results are specifically, and can't tell if they're attempting to integrate with mephisto's existing tooling for building out custom review flows or not (if not, I'm happy to help integrate!). It would be great to see some screenshots that break down your usage a little more clearly.

There are also a number of classes that would need to be renamed (primarily contents in your parlai/crowdsourcing/tasks/dialcrowd/dialcrowd_blueprint.py file) as they're overwriting the namespace of classes we use elsewhere.

I can give a more detailed review on the whole PR after I get some more context back here.

Copy link
Contributor

@JackUrb JackUrb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started to take a pass through the code, but I'm having difficulty disambiguating between required content and old stuff that exists because you were building off of some of our directories as a template. There are some interesting things here in DialCrowd that I really feel like ParlAI users would benefit by having, but it would be hard to maintain this codebase in the current state. I've left some early comments, but I'm going to hold off on a complete pass until things are cleaned up a bit.

parlai/crowdsourcing/tasks/dialcrowd/run.py Outdated Show resolved Hide resolved
parlai/crowdsourcing/tasks/dialcrowd/run_in_flight_qa.py Outdated Show resolved Hide resolved
"nUnit": "5",
"nUnitDuplicated": 1,
"nUnitGolden": 1,
"questionCategories": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then these are examples for workers to understand the different categories?

@@ -0,0 +1,18 @@
const express = require('express');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This server doesn't have any method to post a completed config. How does the end-user save their config to use with their task? I suppose you're using saveAs to put to someone's local machine, but what if you're routing through ssh + forwarded ports and the local machine is not the one you intend to run a task from?

Copy link
Contributor

@EricMichaelSmith EricMichaelSmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments, some addressing @JackUrb's questions. But yeah, agreed with Jack, happy to take a deeper pass of this once we get a sense of whether there's a way to avoid this level of duplication of existing code from turn_annotations_static/ :)

parlai/crowdsourcing/tasks/dialcrowd/run.py Outdated Show resolved Hide resolved
parlai/crowdsourcing/tasks/dialcrowd/run.py Outdated Show resolved Hide resolved
parlai/crowdsourcing/tasks/dialcrowd/run_in_flight_qa.py Outdated Show resolved Hide resolved
@github-actions
Copy link

github-actions bot commented Apr 3, 2022

This PR has not had activity in 30 days. Closing due to staleness.

@github-actions github-actions bot added the stale label Apr 3, 2022
@github-actions github-actions bot closed this Apr 10, 2022
@jessicah25
Copy link
Contributor Author

Hi! Sorry for the delay; but I've cleaned up the code so it hopefully is more clear, and removed a lot of the unneeded materials. Not sure how to reopen this since it got closed.

@JackUrb JackUrb reopened this Apr 18, 2022
@JackUrb
Copy link
Contributor

JackUrb commented Apr 18, 2022

The cleanup changes here are hugely appreciated from my perspective, thanks for doing so @jessicah25! I still have a few unresolved questions above (mostly just to better understand the functionality), but overall one last thing that would be greatly appreciated would be including the screenshots from walking through an example into your readme - I find this helps set expectations and (short of having automated testing) makes it easier to know when something isn't quite working the way it was intended.

Copy link
Contributor

@EricMichaelSmith EricMichaelSmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, much cleaner, thanks so much for this :) A few more comments, but I think this is a massive step forward already

@JackUrb historically we've always had unit test coverage on everything in parlai/crowdsourcing/tasks/, but I think you said that maybe this isn't as crucial for now because a better solution for Mephisto E2E task testing its on its way? (Or do you think that it's still worth getting tests on this now?)

parlai/crowdsourcing/tasks/dialcrowd/README.md Outdated Show resolved Hide resolved
parlai/crowdsourcing/tasks/dialcrowd/run.py Outdated Show resolved Hide resolved
parlai/crowdsourcing/tasks/dialcrowd/util.py Outdated Show resolved Hide resolved
@JackUrb
Copy link
Contributor

JackUrb commented Apr 18, 2022

a better solution for Mephisto E2E task testing its on its way

Definitely! Though we'd be hard pressed to know exactly what the baseline expected inputs and outputs are without a provided sample. Would definitely appreciate a minimal runtime inputs+outputs included in some /tests/ folder, such that we can extend to automated testing in the future.

@github-actions github-actions bot removed the stale label Apr 19, 2022
@jessicah25
Copy link
Contributor Author

We were planning to link to our paper, which will be published soon, which has more information about DialCrowd. I can definitely add more information into the README. To address the questions you had above @JackUrb , DialCrowd is split into 3 components which are run separately: the configuration page, the MTurk worker page, and the results page. When we save the configuration file from the configuration page, it shouldn't be an issue of where the file is saved to since it should be on whatever machine the user is accessing ParlAI from. The configuration page and the results page aren't seen by the workers and are only served locally.

@jessicah25
Copy link
Contributor Author

What kinds of tests would you want to include?

@EricMichaelSmith
Copy link
Contributor

What kinds of tests would you want to include?

Hmm take a look at the tests in tests/crowdsourcing/tasks/, for instance tests/crowdsourcing/tasks/turn_annotations_static/test_turn_annotations_static.py - this is what we typically do. It's a test for the backend code to make sure that the right agent state is produced given sample dummy inputs (as mentioned by @JackUrb )

@jessicah25
Copy link
Contributor Author

I've made the other edits, but I'm working on the tests now, and I had a few questions. Would this test be applicable to the two sections of DialCrowd which are not worker-facing?

@jessicah25
Copy link
Contributor Author

Just committed my changes! Let me know if I interpreted the tests correctly.

@EricMichaelSmith
Copy link
Contributor

EricMichaelSmith commented Apr 22, 2022

Cool, thanks a lot @jessicah25 for adding lots of user-friendly task description in the README. The test looks great! Yeah, this was what I was thinking of for a test, although maybe @JackUrb has other ideas as well.

Hmm it looks like a bunch of CI checks are broken, but it's unclear which of them are because of the changes in this PR, since this fork was split off from the facebookresearch:main branch a few months ago. Would you be able to merge from or rebase onto the HEAD of facebookresearch:main in order to see if the CI checks pass given the latest code in that branch?

@jessicah25
Copy link
Contributor Author

Yes, let me try to do that now!

@EricMichaelSmith
Copy link
Contributor

Yes, let me try to do that now!

Okay great! Hmm, looks like the unittests_38 CI check is failing now with the following error:

E           AssertionError: '__init__.py' not found in ['annotation1.png', 'annotation2.png', 'config1.png', 'config2.png', 'config3.png', 'config4.png', 'config5.png', 'config6.png', 'results1.png', 'results2.png', 'results3.png'] : parlai/crowdsourcing/tasks/dialcrowd/images does not contain __init__.py

However, this should be an easy fix (there just needs to be a __init__.py file added to that folder)

@jessicah25
Copy link
Contributor Author

It seems like there are other errors, but I'm not sure what is causing them; would you be able to give some insight into it?

@EricMichaelSmith
Copy link
Contributor

@jessicah25 Okay, looking at the errors one-by-one:

  • pre-commit, lint: can you try doing ./autoformat.sh in the base ParlAI folder? That will hopefully fix the linting issues that seem to be breaking these checks. From https://github.com/facebookresearch/ParlAI/runs/6135168591?check_suite_focus=true it's also complaining about some unused import statements, which should be able to be removed
  • crowdsourcing_tests: @JackUrb do you know what might be causing this issue on the Dialcrowd test? I don't think I've seen this specific issue before (maybe the assertion is new to Mephisto 1.0?)
  • long_gpu_tests, teacher_tests, unittests_gpu18: these look like issues that exist in main as well currently, so they're not related to this PR

@jessicah25
Copy link
Contributor Author

Looks like the remaining errors are the ones still existing on main (long_gpu_tests, teacher_tests, unittests_gpu18), and crowdsourcing_tests!

@EricMichaelSmith
Copy link
Contributor

Looks like the remaining errors are the ones still existing on main (long_gpu_tests, teacher_tests, unittests_gpu18), and crowdsourcing_tests!

Hmm it looks like most of the linting issues are resolved, but there's still one left (apparently I have to approve in order to run the linting CI check): the error message says it's on line 76 of tests/crowdsourcing/tasks/dialcrowd/test_dialcrowd.py. Can you do pip install black and run it on that script to try to fix that error? Black is sometimes very picky about minor linting issues

@jessicah25
Copy link
Contributor Author

jessicah25 commented Apr 25, 2022

Hm, I had already run pip install black and fixed one error in test_dialcrowd, but I'm not seeing an error when I run ./autoformat.sh right now. No files are changing (186 were left unchanged).

@EricMichaelSmith
Copy link
Contributor

EricMichaelSmith commented Apr 25, 2022

Hm, I had already run pip install black and fixed one error in test_dialcrowd, but I'm not seeing an error when I run ./autoformat.sh right now. No files are changing (186 were left unchanged).

Hmm - what if you add a second empty line above except ImportError: in that script? I notice that there are two above that line in tests/crowdsourcing/tasks/turn_annotations_static/test_turn_annotations_static.py. That's the kind of thing that the linter would complain about

@JackUrb
Copy link
Contributor

JackUrb commented Apr 25, 2022

The crowdsourcing test failure is strange - it's failing to create the agents. Perhaps you need to manually import your specific blueprint type and ensure that the config you're feeding to the test includes this as well?

Also does this test pass locally? I'd expect you may find clearer debug output in your own console if not.

@jessicah25
Copy link
Contributor Author

Just did that!

@EricMichaelSmith
Copy link
Contributor

Just did that!

Great, thanks! Will ask about this now

@EricMichaelSmith
Copy link
Contributor

Just did that!

Great, thanks! Will ask about this now

Okay cool, asked the others - will see what they say

@EricMichaelSmith
Copy link
Contributor

Just did that!

Great, thanks! Will ask about this now

Okay cool, asked the others - will see what they say

Just heard back - apparently black (i.e. the pre-commit CI check) should be the source of truth on this. Can you try adding an exception to .flake8 to suppress the lint error for this here?

@jessicah25
Copy link
Contributor Author

I tried adding test_dialcrowd.py to the exclude files of .flake8, let me know if that was what you were referring to!

@EricMichaelSmith
Copy link
Contributor

I tried adding test_dialcrowd.py to the exclude files of .flake8, let me know if that was what you were referring to!

Ha, sorry - realize that wasn't clear =/ Hmm can you try looking up the lint code that corresponds to having 1 space and not 2, and then adding it to extend-ignore in that file? I think that should be a robust fix, because it'll prevent this issue from recurring

@jessicah25
Copy link
Contributor Author

jessicah25 commented Apr 28, 2022

This should be the error code we're looking for:

E305 | expected 2 blank lines after end of function or class

Hopefully this works! (Looks like I have to remove the space as well)

@EricMichaelSmith
Copy link
Contributor

This should be the error code we're looking for:

E305 | expected 2 blank lines after end of function or class

Hopefully this works! (Looks like I have to remove the space as well)

Ugh this is ridiculous - lint check still fails with the same issue:

[
  {
    path: 'tests/crowdsourcing/tasks/dialcrowd/test_dialcrowd.py',
    start_line: 76,
    end_line: 76,
    start_column: 1,
    end_column: 1,
    annotation_level: 'failure',
    message: '[BLK100] Black would make changes.'
  }
]

Did you say that, when you ran black on your side, it did or didn't flag having only one space? If it did flag it, I wonder if it's possible to re-run it on your end to check that the error code that you added really suppresses this error?

@jessicah25
Copy link
Contributor Author

jessicah25 commented Apr 28, 2022

Just ran autoformat.sh; it keeps reformatting test_dialcrowd.py to the 2 spaces even with the E305 error suppressed. If I run autoformat.sh -f, which only runs flake8 (without only 1 space), the error doesn't seem to show up.

@EricMichaelSmith
Copy link
Contributor

EricMichaelSmith commented Apr 28, 2022

Just ran autoformat.sh; it keeps reformatting test_dialcrowd.py to the 2 spaces even with the E305 error suppressed. If I run autoformat.sh -f, which only runs flake8 (without only 1 space), the error doesn't seem to show up.

Okay, yeah, I've reproduced the error with a new test PR at https://github.com/facebookresearch/ParlAI/pull/4519/files, and I see the same thing. I've tried various perturbations of this line to no avail, so I'm asking around to see if anyone else knows how to interpret that cryptic Black error

@jessicah25
Copy link
Contributor Author

Sounds good!

@stephenroller
Copy link
Contributor

Thanks for so much patience @jessicah25. Seems like we're only nits away. Let's just land this and I'll clean up black next week.

@stephenroller
Copy link
Contributor

(I will request a git merge main to see if it resolves all the broken tests)

Copy link
Contributor

@EricMichaelSmith EricMichaelSmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving - thanks for the patience on this @jessicah25 !

@jessicah25
Copy link
Contributor Author

Thank you for helping me through all of this @EricMichaelSmith @JackUrb !

@EricMichaelSmith
Copy link
Contributor

(I will request a git merge main to see if it resolves all the broken tests)

@jessicah25 Hmm can you try doing a git merge main one last time to see how many tests are resolved before we merge this in? :)

@jessicah25
Copy link
Contributor Author

I just ran git merge main in my dialcrowd branch; is that what you meant? It's all up to date.

@EricMichaelSmith
Copy link
Contributor

I just ran git merge main in my dialcrowd branch; is that what you meant? It's all up to date.

Ah, can you try merging facebookresearch:main into your fork once last time, as you did a few days ago? I'm not sure that I can do it myself on your fork - for instance, I tried following this guide but I can't see "Fetch upstream", implying that I may not have write access to it

@jessicah25
Copy link
Contributor Author

just did that!

@EricMichaelSmith
Copy link
Contributor

Can confirm that, as expected, the lint CI check is the only one that newly fails with this PR. Will merge now

@EricMichaelSmith EricMichaelSmith merged commit 98143d6 into facebookresearch:main May 2, 2022
@EricMichaelSmith
Copy link
Contributor

Hi @jessicah25 ! I was wondering - do you have any interest in maintaining dialcrowd and keeping it up to date? @JackUrb suggested in #4677 that dialcrowd may not work correctly currently because the frontend uses a pre-v1.0 version of Mephisto.

Otherwise, one option is for us to eventually remove the code itself from the HEAD of main, but leave the dialcrowd folder present, with a README giving instructions to the user for how to use dialcrowd by rewinding main to a known-working commit.

What do you think? Either way would work for us :)

CC @klshuster

@jessicah25
Copy link
Contributor Author

Hi Eric! I would be willing to upkeep it for a bit; would I reopen another pull request as I'm doing so? I can look at the necessary updates next week.

@EricMichaelSmith
Copy link
Contributor

EricMichaelSmith commented Aug 18, 2022

Hi @jessicah25 ! Yes, you could open another pull request to make the changes. Thanks!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants