Tests for the static turn annotations tasks #3254

EricMichaelSmith · 2020-11-09T22:58:49Z

Patch description

Add end-to-end tests for the static turn annotation task, in the following cases: (1) without in-flight QA; (2) with in-flight QA; and (3) with in-flight QA and with a file of the specific indices of conversations to annotate
Streamline the testing framework by converting the helper-function mixin into a full-fledged abstract test class, subclassed by all end-to-end tests of parlai.crowdsourcing
Make a separate abstract test class, AbstractOneTurnCrowdsourcingTest, for all tests of Mephisto tasks in which all of the worker's input is submitted at once
Import fix for tests/crowdsourcing/tasks/test_chat_demo.py
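
The mixin-to-abstract-class refactor described above can be pictured with a minimal sketch. Only the name AbstractOneTurnCrowdsourcingTest comes from this PR; the base class name, the helper methods, and the stand-in "server" are hypothetical placeholders, not the actual code in parlai/crowdsourcing/utils/tests.py.

```python
import unittest


class AbstractCrowdsourcingTest(unittest.TestCase):
    """Hypothetical base class: shared setup/teardown for all end-to-end tests."""

    def setUp(self):
        # Stand-in for creating a temporary database, operator, and mock server.
        self.resources = {"server": "mock"}

    def tearDown(self):
        # Stand-in for shutting everything down again.
        self.resources.clear()


class AbstractOneTurnCrowdsourcingTest(AbstractCrowdsourcingTest):
    """Base for tasks in which all of the worker's input is submitted at once."""

    def _submit(self, task_data):
        # Hypothetical stand-in for sending one submit act and reading back state.
        return {"outputs": task_data}

    def _test_agent_state(self, expected_state):
        actual_state = self._submit(expected_state["outputs"])
        self.assertEqual(actual_state["outputs"], expected_state["outputs"])


class TestExampleTask(AbstractOneTurnCrowdsourcingTest):
    """A concrete test only supplies the expected state for its task."""

    def test_submit(self):
        self._test_agent_state({"outputs": {"final_data": {"rating": 3}}})
```

Because the abstract classes define no test_* methods themselves, test discovery only runs the concrete subclasses.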

Testing steps
python tests/crowdsourcing/tasks/turn_annotations_static/test_turn_annotations_static.py

EricMichaelSmith · 2020-11-17T17:58:34Z

@stephenroller @JackUrb @mwillwork @jxmsML It'd be great to have some eyes on this PR so I could merge it in and increase our test coverage :) Thanks!

stephenroller · 2020-11-17T18:06:39Z

tests/crowdsourcing/tasks/test_chat_demo.py

+    mephisto_repo_folder = os.path.dirname(
+        os.path.dirname(os.path.abspath(mephisto.__file__))
+    )
+    sys.path.insert(1, mephisto_repo_folder)


This is a huge red flag

EricMichaelSmith · 2020-11-17

Agreed, but I'm not sure there's a short-term workaround given @JackUrb's comment below. Maybe I can just add a TODO here for the time being?

EricMichaelSmith · 2020-11-20 (edited)

So, I've reverted this change in this PR, since this issue doesn't need to be resolved right now. Without this change, this test will be silently skipped. I've fixed this problem in a better way and gotten this test to pass again in #3262.

stephenroller · 2020-11-17T18:07:42Z

tests/crowdsourcing/tasks/turn_annotations_static/expected_states/in_flight_qa.json

@@ -0,0 +1,190 @@
+{"inputs": [


This feels like redundancy from the pytest regressions I'm adding

EricMichaelSmith · 2020-11-20

Hmm, but do you have a sense that the pytest regressions work that you're doing on ParlAI tasks could also be applied to Mephisto tasks, which have a much different structure? For instance, a Mephisto task has no concept of looping over the examples of a dataset the way a ParlAI task does.

jxmsML · 2020-11-17T18:20:09Z

parlai/crowdsourcing/utils/tests.py

+        # Make agent act
+        self.server.send_agent_act(
+            agent_id,
+            {"MEPHISTO_is_submit": True, "task_data": expected_state['outputs']},


very nit: would this test break if my outputs have any time-related values such as a timestamp? Is "outputs" where the task_start and task_end fields are logged?

EricMichaelSmith · 2020-11-20

Hmmm, yes, if you have a timestamp in your output, this would break. Let me know if that's a common use case for you and we can find a workaround. The task_start and task_end fields get logged outside "outputs", in their own fields.
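
One possible workaround for time-dependent outputs, sketched here as an assumption rather than anything this PR implements: strip the volatile keys from both the expected and the actual state before comparing them. The function name and the "timestamp" key are hypothetical.

```python
def strip_volatile_fields(state, volatile_keys=("timestamp",)):
    """Return a copy of `state` with volatile keys removed at every nesting level."""
    if isinstance(state, dict):
        return {
            key: strip_volatile_fields(value, volatile_keys)
            for key, value in state.items()
            if key not in volatile_keys
        }
    if isinstance(state, list):
        return [strip_volatile_fields(item, volatile_keys) for item in state]
    return state


# Two states that differ only in a (hypothetical) timestamp field.
expected = {"outputs": {"final_data": {"rating": 3, "timestamp": 1605636000.0}}}
actual = {"outputs": {"final_data": {"rating": 3, "timestamp": 1605639999.5}}}

# The raw states differ, but they match once the volatile field is removed.
assert actual != expected
assert strip_volatile_fields(actual) == strip_volatile_fields(expected)
```

The trade-off is that stripped fields are no longer checked at all, so the set of volatile keys should stay as small as possible.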

jxmsML · 2020-11-17T18:23:49Z

tests/crowdsourcing/tasks/turn_annotations_static/test_turn_annotations_static.py

+            receive annotations.
+            """
+            overrides = [
+                f'+mephisto.blueprint.annotation_indices_jsonl={TASK_DIRECTORY}/task_config/annotation_indices_example.jsonl',


nit: Is there a place to sanity-check the indices JSONL so that an index-out-of-range error is raised before the actual Mephisto job is live? Otherwise the error would only show up at the front end.

JackUrb · 2020-11-17

This is something that should happen in TurnAnnotationsStaticBlueprint.assert_task_args (causing the system to fail on invalid input before Mephisto creates anything at all); however, this method has not been implemented yet in TurnAnnotationsStaticBlueprint. It will still fail in __init__ before a task is launched to workers, though, since __init__ is called before launching tasks and relies on annotation_indices_jsonl being valid.
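
A rough sketch of the kind of up-front check being discussed. This is not the blueprint's actual code: the function name is hypothetical, and it assumes that line i of the JSONL file is a JSON list of integer turn indices for conversation i, which may not match the real file format.

```python
import json


def validate_annotation_indices(indices_lines, conversation_lengths):
    """Raise ValueError if any index falls outside its conversation.

    Assumption: line i is a JSON list of turn indices for conversation i.
    """
    parsed = [json.loads(line) for line in indices_lines if line.strip()]
    if len(parsed) != len(conversation_lengths):
        raise ValueError(
            f"Got {len(parsed)} index lines for "
            f"{len(conversation_lengths)} conversations"
        )
    for convo_idx, (indices, num_turns) in enumerate(
        zip(parsed, conversation_lengths)
    ):
        for turn_idx in indices:
            if not 0 <= turn_idx < num_turns:
                raise ValueError(
                    f"Conversation {convo_idx}: index {turn_idx} is out of "
                    f"range for {num_turns} turns"
                )


# Passes silently: every index is within its conversation.
validate_annotation_indices(["[0, 2]", "[1]"], conversation_lengths=[3, 2])
```

Calling a check like this from an argument-validation hook would surface a bad indices file before any task is created, instead of at the front end.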

JackUrb

Generally Mephisto stuff seems alright, not sure what's really best for the import from our examples folder right now though. Eventually I'll want to move the ParlAI blueprint here, but not before we've finished work on getting bootstrap-chat to a good place.

tests/crowdsourcing/tasks/test_acute_eval.py

stephenroller · 2020-11-19T05:24:09Z

@JackUrb where is the mephisto code where this path insert is abused? maybe that can be improved with some importlib magic?

Nonetheless, it feels like a huge red flag to me that this is being done user side. Perhaps there should be a helper function in mephisto, register_task_directory or something, that the user can call with the path. Under the hood, it can do the path insert. (Ideally, though, it would insert the path only in the context where it's needed...)
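
The "insert only in the context it needed" idea could look roughly like the context manager below. This is an illustrative sketch only: temporary_sys_path is a hypothetical name, not an existing Mephisto or ParlAI helper.

```python
import contextlib
import sys


@contextlib.contextmanager
def temporary_sys_path(directory):
    """Prepend `directory` to sys.path only for the duration of the block."""
    sys.path.insert(0, directory)
    try:
        yield
    finally:
        # Remove the entry we added, even if the import inside the block failed.
        try:
            sys.path.remove(directory)
        except ValueError:
            pass


# Usage sketch (the path and module name are placeholders):
#
#     with temporary_sys_path("/path/to/repo"):
#         import some_module_in_repo
```

Modules imported inside the block stay cached in sys.modules afterwards, so the imported code remains usable while sys.path itself is restored.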

JackUrb · 2020-11-19T15:04:27Z

@stephenroller It isn't ever used like this in Mephisto. Eric wants to import something from one of the demos, which isn't made available from within the main package at the moment (something like how I think projects works here?).

stephenroller · 2020-11-19T16:07:42Z

Then the solution seems to me to deploy the demo as an importable package, not modify the sys path.

JackUrb · 2020-11-19T16:12:55Z

> deploy the demo as an importable package

Would be happy to, but I'm still unclear how to do this if we don't want examples to generally be part of the package. Or are you suggesting we include them?

EricMichaelSmith · 2020-11-19T18:09:37Z

> > deploy the demo as an importable package
>
> Would be happy to, but I'm still unclear how to do this if we don't want examples to generally be part of the package. Or are you suggesting we include them?

Yeah, I think it might be good to include them if there's the expectation that downstream code (like ParlAI) might use them.

EricMichaelSmith · 2020-12-01T13:58:31Z

Hi @stephenroller @JackUrb @mwillwork @jxmsML, can someone take a look at this PR again? This PR refactors the testing code for all of parlai.crowdsourcing, so getting this code in would avoid messy conflicts in future parlai.crowdsourcing PRs :)

JackUrb

LGTM - I'll add the mephisto examples module as we're prepping for our PyPI release in the coming days.

EricMichaelSmith added 18 commits November 6, 2020 18:56

Dump in what I have so far

69bc29d

Starting work on Meph tests

ff30337

Minor

aa7dfc7

Remove samples

e0d1315

Work on static turn annotations unit test

b467aed

Formatting file differently

c787729

Minor

3346dbf

Fixes

ba3a3de

Minor

09f6f1c

Don't check onboarding for now

510ec9d

Update convos

b822289

Fixes

d5dd75d

Don't have test be mixin

8a08589

Abstract away 1-turn tests

8c14ca8

Fixes

c5d83c0

More tests

1b2670a

Fixes to 3 tasks

b4efd7b

Make tests cleaner

c33f45f

EricMichaelSmith requested review from stephenroller, JackUrb and mwillwork November 9, 2020 22:58

facebook-github-bot added the CLA Signed label Nov 9, 2020

Remember to build the task

53d8c34

EricMichaelSmith requested a review from jxmsML November 12, 2020 14:00

stephenroller reviewed Nov 17, 2020

jxmsML reviewed Nov 17, 2020

JackUrb reviewed Nov 17, 2020

EricMichaelSmith added 3 commits November 20, 2020 17:44

Test reversion to test

ea6be0d

Revert config.yml

345d080

Merge branch 'master' into turn_annotations_static_testing

36fc211

EricMichaelSmith requested review from jxmsML, JackUrb and stephenroller November 20, 2020 23:15

Update import

7748efd

JackUrb approved these changes Dec 1, 2020

EricMichaelSmith merged commit e1474f1 into master Dec 2, 2020

EricMichaelSmith deleted the turn_annotations_static_testing branch December 2, 2020 13:43
