Tests for the static turn annotations tasks #3254
Conversation
@stephenroller @JackUrb @mwillwork @jxmsML It'd be great to have some eyes on this PR so I could merge it in and increase our test coverage :) Thanks!
mephisto_repo_folder = os.path.dirname(
    os.path.dirname(os.path.abspath(mephisto.__file__))
)
sys.path.insert(1, mephisto_repo_folder)
This is a huge red flag
Agreed, but not sure about a short-term workaround given @JackUrb 's comment below. Maybe I can just add a TODO here for the time being?
So, I've reverted this change in this PR, just because this issue doesn't need to be resolved right now. Without this change, this test will be silently skipped - I've fixed this problem in a better way and gotten this test to pass again in #3262
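For reference, a minimal sketch of one way to make a skip explicit rather than silent (illustrative only, not necessarily how #3262 handles it): pytest.importorskip reports the missing dependency in the test summary instead of quietly passing over the test.

    import pytest

    # Skip this test module with an explicit, visible reason if Mephisto
    # cannot be imported, instead of the test silently not running.
    mephisto = pytest.importorskip("mephisto")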
@@ -0,0 +1,190 @@
{"inputs": [
This feels redundant with the pytest regressions I'm adding
Hmm - but do you have a sense of whether the pytest regressions work that you're doing on ParlAI tasks could also be applied to Mephisto tasks, which have a very different structure? For instance, there isn't a concept of looping over examples of a dataset with a Mephisto task like there is with a ParlAI task.
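For context, a minimal sketch of the kind of check the pytest-regressions plugin provides (the test name and the task_output dict below are made up for illustration): the data_regression fixture serializes the dict to a YAML file on the first run and compares against that stored file on later runs.

    # Requires the pytest-regressions plugin, which supplies the
    # data_regression fixture.
    def test_static_task_output(data_regression):
        # Hypothetical output of a single crowdsourcing task unit.
        task_output = {"inputs": {"dialogue_id": 0}, "outputs": {"ratings": [1, 0, 1]}}
        # Serialized and compared against the stored regression file.
        data_regression.check(task_output)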
# Make agent act
self.server.send_agent_act(
    agent_id,
    {"MEPHISTO_is_submit": True, "task_data": expected_state['outputs']},
very nit: would this test break if my outputs have any time-related values, such as a timestamp? Also, is "outputs" where the task_start and task_end fields get logged?
Hmmm, yes, if you have a timestamp in your output, this would break - lemme know if that's a common use case for you and we can find a workaround. The task_start and task_end fields get logged outside "outputs", in their own fields
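If that does become a common use case, one possible workaround (not part of this PR; the field names below are hypothetical) is to strip volatile fields before comparing actual and expected state:

    def _strip_volatile_fields(state, fields=("timestamp", "response_time")):
        # Drop fields whose values change between runs so the comparison
        # only covers stable task data.
        return {k: v for k, v in state.items() if k not in fields}

    # e.g. compare _strip_volatile_fields(actual["outputs"]) against
    # _strip_volatile_fields(expected_state["outputs"]) in the test.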
receive annotations.
"""
overrides = [
    f'+mephisto.blueprint.annotation_indices_jsonl={TASK_DIRECTORY}/task_config/annotation_indices_example.jsonl',
nit: Is there a place to do a sanity check on the indices JSONL that can raise an index-out-of-range error before the actual Mephisto job is live? Otherwise the error would only show up on the front end.
This is something that should happen in TurnAnnotationsStaticBlueprint.assert_task_args (causing the system to fail on invalid input before Mephisto creates anything at all); however, this method has not been implemented yet in TurnAnnotationsStaticBlueprint. It will still fail before a task is launched to workers, though, in __init__, because that method is called before launching tasks and it relies on annotation_indices_jsonl being valid.
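To illustrate, a rough sketch of the kind of check an assert_task_args implementation could perform (the function name, arguments, and JSONL format assumptions are illustrative, not code from this PR or from Mephisto):

    import json

    def validate_annotation_indices(annotation_indices_path, conversation_lengths):
        # Fail fast if any requested annotation index is out of range for its
        # conversation, before any Mephisto job is launched.
        with open(annotation_indices_path) as f:
            for line_num, line in enumerate(f):
                indices = json.loads(line)
                num_turns = conversation_lengths[line_num]
                for idx in indices:
                    if idx >= num_turns:
                        raise ValueError(
                            f"Line {line_num}: annotation index {idx} is out of "
                            f"range for a conversation with {num_turns} turns"
                        )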
Generally the Mephisto stuff seems alright; not sure what's really best for the import from our examples folder right now, though. Eventually I'll want to move the ParlAI blueprint here, but not before we've finished work on getting bootstrap-chat to a good place.
@JackUrb where is the Mephisto code where this path insert is abused? Maybe that can be improved somehow. Nonetheless, it feels like a huge red flag to me that this is being done user-side. Perhaps there should be a helper function in Mephisto.
@stephenroller It isn't ever used like this in Mephisto. Eric wants to import something from one of the demos, which isn't made available from within the main package at the moment (something like how I think …).
Then the solution seems to me to deploy the demo as an importable package, not modify the sys path.
Would be happy to, but I'm still unclear how to do this if we don't want examples to generally be part of the package. Or are you suggesting we include them?
Yeah, I think it might be good to if there's the expectation that downstream code (like ParlAI) might use them.
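As a rough illustration of that option (a sketch only; the actual Mephisto packaging layout may differ, and this assumes the examples directories carry __init__.py files), the examples could be declared as packages in setup.py so downstream code imports them instead of mutating sys.path:

    # setup.py (sketch)
    from setuptools import setup, find_packages

    setup(
        name="mephisto",
        # Include the examples tree as importable subpackages alongside the
        # main package.
        packages=find_packages(include=["mephisto", "mephisto.*", "examples", "examples.*"]),
    )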
Hi @stephenroller @JackUrb @mwillwork @jxmsML can someone take a look at this PR again? This PR refactors the testing code for all of parlai.crowdsourcing.
LGTM - I'll add the mephisto examples module as we're prepping for our PyPI release in the coming days.
Patch description
- Refactors the testing code for all of parlai.crowdsourcing
- Adds AbstractOneTurnCrowdsourcingTest, for all tests of Mephisto tasks in which all of the worker's input is submitted at once
- tests/crowdsourcing/tasks/test_chat_demo.py
Testing steps
python tests/crowdsourcing/tasks/turn_annotations_static/test_turn_annotations_static.py