-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Conversation
parlai/crowdsourcing/utils/tests.py
Outdated
max_num_tries = 6 | ||
mock_worker_registration_name = f"MOCK_WORKER_{idx:d}" | ||
mock_worker_name = f"{mock_worker_registration_name}_sandbox" | ||
max_num_tries = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, retries should no longer be necessary with assert_sandbox_worker_created
and await_channel_requests
which run the async loop until pending things are processed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just removed this retry loop without breaking the tests
@JackUrb Most of the crowdsourcing tests don't seem to be running currently due to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR and for doing these refactors - yeah, it looks like it's useful to have a few pieces of boilerplate code abstracted away, and not needing to do retries would be useful. Am trying to get all 43 crowdsourcing tests to run to get a better sense of what needs to be done with this
Okay, I got all 43 crowdsourcing checks to run so that we can debug them. @JackUrb 27 of them, the Fast-Acute ones, are currently failing with a "No live runs present" error due to no |
Hadn't noticed that the blueprint was broken under-the-hood, leading to a blueprint launch error (and thus no running tasks), can make a quick change for this. (running locally identified the issue in logs) |
Great, thanks! Hmm, now I'm seeing an |
That would imply to me that the unit was still running when it was shutdown, and thus the shutdown waited for the timeout. You may need to examine this locally to see what was coming through the agent and what wasn't, as I'm unclear why this happened from the given info. |
Hmm, on my devfair, it looks like this issue with the unit being left hanging came from a
So I suppose the question now is (1) whether the unit got saved in the data browser correctly, and if so, (2) why it's not being loaded back in |
My bet is still not have been marked as completed, which would happen in another thread (and I imagine if this script is launched before the unit is completed, you won't get result data). I expect this to more likely be (1) than (2). You'd likely want to dig into the Actually this is likely it, we've changed the semantics for live
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All crowdsourcing tests seem to be passing now. No remaining issues that I can see
Patch description
First set of steps that get
crowdsourcing
tests running (no longer breaking the newer mephisto conventions), but still not passing. More work is to be done there, leaving this open as a starting point for others to make comments and take over.(@EricMichaelSmith : all crowdsourcing tests are passing now as of March 30th)
Testing steps