
QA Data Collection Task #268

Closed
vaibhavad opened this issue Sep 26, 2020 · 15 comments

Comments

@vaibhavad
Contributor

Hi,

I want to port QA data collection task from ParlAI to Mephisto.

Here, a single MTurk worker will write question-answer pairs relevant to a given paragraph. Will this require a different Mephisto blueprint?
In ParlAI, there were data-loading tasks such as SQuAD and Wikipedia. Are there any guidelines on how to include the same functionality in Mephisto?

@JackUrb
Contributor

JackUrb commented Sep 26, 2020

Hi @vaibhavad - you should certainly be able to do this using the existing blueprint. The example's README has some details about what this looks like.

In short, you'll be copying over the old worlds.py file and adding the required meta functions that Mephisto uses to build the world (including the step of loading the QA agent from shared in make_world). Then you'll make a run file and a config file that set the relevant parameters (specifying the task description, loading the model, passing the shared params to world_opt to be used in make_world, etc.).
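A minimal sketch of the worlds-file shape described above, assuming the meta-function names from the parlai_chat_task_demo example (`make_world`, plus the demo's `get_world_params` hook). To keep it runnable on its own, `QADataCollectionWorld`, `StubQACollector`, `EchoWorker`, and the `"qa_collector"` key in `opt` are hypothetical stand-ins, not ParlAI classes; in a real port the world would build on ParlAI's world classes and the collector would come from the shared state that the run file passes through world_opt.

```python
from typing import Any, Dict, List


class StubQACollector:
    """Hypothetical stand-in for the agent that supplies a paragraph and a prompt."""

    def __init__(self, paragraph: str):
        self.paragraph = paragraph

    def act(self) -> Dict[str, Any]:
        return {
            "id": "QACollector",
            "text": f"{self.paragraph}\nPlease write a question about this paragraph.",
            "episode_done": False,
        }


class EchoWorker:
    """Hypothetical stand-in for the MTurk worker agent."""

    def __init__(self):
        self.inbox: List[Dict[str, Any]] = []

    def observe(self, msg: Dict[str, Any]) -> None:
        self.inbox.append(msg)

    def act(self) -> Dict[str, Any]:
        return {"id": "Worker", "text": "What is this paragraph about?", "episode_done": True}


class QADataCollectionWorld:
    def __init__(self, opt: Dict[str, Any], agents: List[Any]):
        self.opt = opt
        self.worker = agents[0]
        # Assumed: the run file placed a ready-made collector into world_opt.
        self.collector = opt["qa_collector"]
        self.episode_over = False
        self.collected: List[Dict[str, Any]] = []

    def parley(self) -> None:
        # Send the paragraph + prompt once per turn, then record the worker's reply.
        prompt = self.collector.act()
        self.worker.observe(prompt)
        reply = self.worker.act()
        self.collected.append(reply)
        self.episode_over = reply.get("episode_done", False)

    def episode_done(self) -> bool:
        return self.episode_over

    def shutdown(self) -> None:
        pass


def make_world(opt: Dict[str, Any], agents: List[Any]) -> QADataCollectionWorld:
    return QADataCollectionWorld(opt, agents)


def get_world_params() -> Dict[str, Any]:
    # One worker per QA-collection episode.
    return {"agent_count": 1}


if __name__ == "__main__":
    world = make_world({"qa_collector": StubQACollector("A short paragraph.")}, [EchoWorker()])
    while not world.episode_done():
        world.parley()
    print(world.collected)
```

Keeping collector construction out of the world (and in the shared state) means the same world can be reused with different data sources.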

All of this should likely live in a folder in ParlAI/parlai/crowdsourcing. In there should be a worlds file, a run file, and a configuration .yaml file in a conf folder.
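As a rough illustration of that layout, the helper below creates the three pieces (worlds file, run file, and a .yaml config inside conf/). It places the folder under parlai/crowdsourcing/tasks, the location suggested later in the thread; the task name, `example.yaml`, the placeholder contents, and `scaffold_task` itself are hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical file names; only the worlds / run / conf split comes from the thread.
TASK_FILES = {
    "worlds.py": "# make_world() and the task world go here\n",
    "run.py": "# builds shared state and world_opt, then launches the task\n",
    "conf/example.yaml": "# task description and other run parameters\n",
}


def scaffold_task(root: Path, task_name: str) -> Path:
    """Create <root>/parlai/crowdsourcing/tasks/<task_name> with placeholder files."""
    task_dir = root / "parlai" / "crowdsourcing" / "tasks" / task_name
    for rel_path, placeholder in TASK_FILES.items():
        path = task_dir / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite existing work
            path.write_text(placeholder)
    return task_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        task_dir = scaffold_task(Path(tmp), "qa_data_collection")
        for p in sorted(task_dir.rglob("*")):
            print(p.relative_to(task_dir).as_posix())
```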

@vaibhavad
Contributor Author

vaibhavad commented Sep 26, 2020

Hi @JackUrb - Thanks a lot for the quick reply!

Following your advice, I have set up the example in my fork. I copied everything from the parlai_chat_task_demo example and changed conf/ and demo_worlds.py according to the QA data collection task.

For some reason, the first message from QACollector (the agent that provides the paragraph and prompts to the turker) is displayed twice. I have verified that the world's parley function is called only once. Any idea where this error might be coming from?

@JackUrb
Contributor

JackUrb commented Sep 26, 2020

Ah, I thought we had worked this bug out already! I'll address it on Monday, but it's just a minor display bug on our end. If you refresh, you'll likely see just one message.

@vaibhavad
Contributor Author

Thanks a lot!

@vaibhavad
Contributor Author

vaibhavad commented Sep 28, 2020

Strangely the issue went away on its own! :D

I have implemented the QA Collection Task (ignore the README.md) by defining a new dataloader configuration and using it as an agent in the worlds.py file. I have defined configurations for Wikipedia and SQuAD.
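One way to picture "a dataloader configuration used as an agent" is sketched below, under the assumption that each configuration simply knows how to load paragraphs and the agent serves them one at a time with a prompt. `DataLoaderConfig`, `DATALOADERS`, this `QACollector` class, and the stub loaders are all hypothetical; the real Wikipedia and SQuAD loaders would read the corresponding ParlAI task data instead of returning placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass
class DataLoaderConfig:
    """Hypothetical description of where QACollector draws paragraphs from."""
    name: str
    load_paragraphs: Callable[[], List[str]]


def _wikipedia_stub() -> List[str]:
    # Placeholder: a real loader would read Wikipedia paragraphs.
    return ["<wikipedia paragraph 1>", "<wikipedia paragraph 2>"]


def _squad_stub() -> List[str]:
    # Placeholder: a real loader would read SQuAD contexts.
    return ["<squad context 1>"]


DATALOADERS: Dict[str, DataLoaderConfig] = {
    "wikipedia": DataLoaderConfig("wikipedia", _wikipedia_stub),
    "squad": DataLoaderConfig("squad", _squad_stub),
}


class QACollector:
    """Agent that hands the worker one paragraph at a time with a QA-writing prompt."""

    def __init__(self, config: DataLoaderConfig, prompt: str):
        self.config = config
        self.prompt = prompt
        self._paragraphs: Iterator[str] = iter(config.load_paragraphs())

    def act(self) -> dict:
        paragraph = next(self._paragraphs, None)
        if paragraph is None:
            # Data source exhausted: signal the end of the episode.
            return {"id": "QACollector", "text": "", "episode_done": True}
        return {
            "id": "QACollector",
            "text": f"{paragraph}\n\n{self.prompt}",
            "episode_done": False,
        }
```

Selecting a data source then becomes a config value (for example a key into `DATALOADERS`), so adding a new corpus does not require touching the world.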

Let me know if you want to include this as one of the examples. I'll then document the code and create a README.

@JackUrb
Contributor

JackUrb commented Sep 28, 2020

The double-message issue is intermittent, though I haven't been able to track down why it happens sometimes (even though I have a good idea on how to prevent it).

But besides that, this implementation is mostly good to go! There's some code I can suggest improvements for, though I think the most appropriate place to submit this is the ParlAI repo (as it will eventually replace the version of the task currently there). If you open a PR adding this as a new task to parlai/crowdsourcing/tasks in the ParlAI repo, I'd be happy to leave comments (and note some things that can be trimmed) over there!

@JackUrb
Contributor

JackUrb commented Nov 10, 2020

Hi @vaibhavad - are you still interested in merging this in?

@vaibhavad
Contributor Author

Hi @JackUrb,

Thank you for all your help. I'll open a PR adding this to parlai/crowdsourcing/tasks later, as you suggested. Has the double-message issue been fixed? If not, can you outline how to prevent it?

@JackUrb
Contributor

JackUrb commented Nov 10, 2020

Indeed - it was a regression, and I'm currently cleaning up a fix on the resolve-double-message branch. The fix will prevent the double send on the Mephisto backend side, so you shouldn't need to change anything.
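The thread does not describe how the resolve-double-message fix works, and no task-side change is needed. Purely to illustrate the general idea of guarding against a double send, a sender can remember which message ids it has already forwarded and drop repeats. `DedupSender` and the `message_id` field are hypothetical and are not Mephisto's implementation.

```python
from typing import Any, Callable, Dict, Set


class DedupSender:
    """Hypothetical wrapper that forwards each message id at most once."""

    def __init__(self, send: Callable[[Dict[str, Any]], None]):
        self._send = send
        self._sent_ids: Set[str] = set()

    def send(self, message: Dict[str, Any]) -> bool:
        msg_id = message["message_id"]
        if msg_id in self._sent_ids:
            return False  # already delivered; drop the duplicate
        self._sent_ids.add(msg_id)
        self._send(message)
        return True
```

The check is idempotent per id, so a message that is accidentally pushed twice (for example, once on connect and once on a state sync) still reaches the worker only once.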

@vaibhavad
Contributor Author

Thanks a lot, I'll close this!

@EricMichaelSmith
Contributor

Hi @vaibhavad - do you still have your code for this around somewhere? We're working to finish porting this task over to Mephisto now, and it'd be super useful to have your contribution :)

@vaibhavad
Contributor Author

vaibhavad commented Dec 18, 2020

Hi @EricMichaelSmith,

Yes, I'll be very happy to contribute. Will this go in parlai/crowdsourcing/tasks or mephisto/examples? Currently, I have opened a pull request (#344) for Mephisto. Let me know if I should open a PR in ParlAI instead.

@EricMichaelSmith
Contributor

Hi @vaibhavad - that's great to hear. I think it'd make more sense for it to go in parlai/crowdsourcing/tasks, yeah - we're trying to consolidate all of our ParlAI-related tasks there. If you open your PR in that repo, I'd be happy to review it!

@vaibhavad
Contributor Author

Opened a PR here

@EricMichaelSmith
Contributor

Great, thanks! Will take a look there


3 participants