Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Tests for the static turn annotations tasks #3254

Merged
merged 23 commits into from
Dec 2, 2020

Conversation

EricMichaelSmith
Copy link
Contributor

Patch description

  • Add end-to-end tests for the static turn annotation task, in the following cases: (1) without in-flight QA; (2) with in-flight QA; and (3) with in-flight QA and with a file of the specific indices of conversations to annotate
  • Streamline the testing framework by converting the helper-function mixin into a full-fledged abstract test class, subclassed by all end-to-end tests of parlai.crowdsourcing
  • Make a separate abstract test class, AbstractOneTurnCrowdsourcingTest, for all tests of Mephisto tasks in which all of the worker's input is submitted at once
  • Import fix for tests/crowdsourcing/tasks/test_chat_demo.py

Testing steps
python tests/crowdsourcing/tasks/turn_annotations_static/test_turn_annotations_static.py

@EricMichaelSmith
Copy link
Contributor Author

@stephenroller @JackUrb @mwillwork @jxmsML It'd be great to have some eyes on this PR so I could merge it in and increase our test coverage :) Thanks!

mephisto_repo_folder = os.path.dirname(
os.path.dirname(os.path.abspath(mephisto.__file__))
)
sys.path.insert(1, mephisto_repo_folder)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a huge red flag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but not sure about a short-term workaround given @JackUrb 's comment below. Maybe I can just add a TODO here for the time being?

Copy link
Contributor Author

@EricMichaelSmith EricMichaelSmith Nov 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I've reverted this change in this PR, just because this issue doesn't need to be resolved right now. Without this change, this test will be silently skipped - I've fixed this problem in a better way and gotten this test to pass again in #3262

@@ -0,0 +1,190 @@
{"inputs": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like redundancy from the pytest regressions I'm adding

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - but do you have a sense that the pytest regressions work that you're doing on ParlAI tasks could also be applied to Mephisto tasks, which have a much different structure? For instance, there isn't a concept of looping over examples of a dataset with a Mephisto task like there is with a ParlAI task

# Make agent act
self.server.send_agent_act(
agent_id,
{"MEPHISTO_is_submit": True, "task_data": expected_state['outputs']},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nit: would this test break if my ouputs have any time-related values such as timestamp? is the "outputs" where task_start task_end field logged there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, yes, if you have a timestamp in your output, this would break - lemme know if that's a common use case for you and we can find a workaround. The task_start and task_end fields get logged outside "outputs", in their own fields

receive annotations.
"""
overrides = [
f'+mephisto.blueprint.annotation_indices_jsonl={TASK_DIRECTORY}/task_config/annotation_indices_example.jsonl',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is there a place for doing sanity check on the indices jsonl and can raise index out of range error before the actual mephisto job is alive or otherwise it would output error at the front end?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that should happen in TurnAnnotationsStaticBlueprint.assert_task_args (causing the system to fail on invalid input before Mephisto creates anything at all), however this method has not been implemented yet in TurnAnnotationsStaticBlueprint. It will still fail before a task is launched to workers in __init__ though, as this method is called before launching tasks and it relies on annotation_indices_jsonl being valid.

Copy link
Contributor

@JackUrb JackUrb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally Mephisto stuff seems alright, not sure what's really best for the import from our examples folder right now though. Eventually I'll want to move the ParlAI blueprint here, but not before we've finished work on getting bootstrap-chat to a good place.

tests/crowdsourcing/tasks/test_acute_eval.py Show resolved Hide resolved
@stephenroller
Copy link
Contributor

@JackUrb where is the mephisto code where this path insert is abused? maybe that can be improved with some importlib magic?

Nonetheless, it feels like a huge red flag to me that this is being done user side. Perhaps there should be a helper function in mephisto register_task_directory or something. The user can call that with the path. Under the hood, it can do the path insert. (Ideally though, it would insert only in the context it needed it though...)

@JackUrb
Copy link
Contributor

JackUrb commented Nov 19, 2020

@stephenroller It isn't ever used like this in Mephisto. Eric wants to import something from one of the demos, which isn't made available from within the main package at the moment (something like how I think projects works here?).

@stephenroller
Copy link
Contributor

Then the solution seems to me to deploy the demo as an importable package, not modify the sys path.

@JackUrb
Copy link
Contributor

JackUrb commented Nov 19, 2020

deploy the demo as an importable package

Would be happy to, but I'm still unclear how to do this if we don't want examples to generally be part of the package. Or are you suggesting we include them?

@EricMichaelSmith
Copy link
Contributor Author

deploy the demo as an importable package

Would be happy to, but I'm still unclear how to do this if we don't want examples to generally be part of the package. Or are you suggesting we include them?

Yeah, I think it might be good to if there's the expectation that downstream code (like ParlAI) might use them

@EricMichaelSmith
Copy link
Contributor Author

Hi @stephenroller @JackUrb @mwillwork @jxmsML can someone take a look at this PR again? This PR refactors the testing code for all of parlai.crowdsourcing, and so getting this code would remove messy conflicts from future parlai.crowdsourcing PRs :)

Copy link
Contributor

@JackUrb JackUrb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I'll add the mephisto examples module as we're prepping for our PyPI release in the coming days.

@EricMichaelSmith EricMichaelSmith merged commit e1474f1 into master Dec 2, 2020
@EricMichaelSmith EricMichaelSmith deleted the turn_annotations_static_testing branch December 2, 2020 13:43
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants