
Fix flaky model chat test #3442

Merged: EricMichaelSmith merged 5 commits into master from flaky-model-chat-test on Feb 10, 2021

Conversation

EricMichaelSmith (Contributor)

Patch description
Fix one source of flakiness in the model-chat-analysis crowdsourcing CI check: a progress bar sometimes, and not reproducibly, prints an update into the stdout that we check against the desired stdout. (See https://app.circleci.com/pipelines/github/facebookresearch/ParlAI/8467/workflows/5621488b-c27a-46e8-a76e-3993dee8dd59/jobs/68923 for an example of this.) The fix is to check simply that each line of the desired stdout is present in the actual stdout, rather than requiring the two stdouts to be exactly the same.

As part of refactoring the CI check code, the analysis checks for the model-chat and static-turn-annotations tasks are brought into closer alignment, since the code for these similar checks was previously written in quite different ways.

Testing steps
The following checks were modified:

  • pytest tests/crowdsourcing/tasks/turn_annotations_static/test_turn_annotations_static_analysis.py
  • pytest tests/crowdsourcing/tasks/model_chat/test_model_chat_analysis.py

```python
actual_stdout_lines = actual_stdout.split('\n')
with open(expected_stdout_path) as f:
    expected_stdout = f.read()
for expected_line in expected_stdout.split('\n'):
```
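Taken together with the `check_stdout` signature excerpted later in this thread, the full helper plausibly looks something like the following sketch; the assertion body and its message are assumed here, not copied from the diff:

```python
def check_stdout(actual_stdout: str, expected_stdout_path: str):
    """Check that each line of the expected stdout appears in the actual stdout."""
    actual_stdout_lines = actual_stdout.split('\n')
    with open(expected_stdout_path) as f:
        expected_stdout = f.read()
    for expected_line in expected_stdout.split('\n'):
        # Require only membership, not exact equality, so that extra lines
        # (e.g. progress-bar updates) in the actual stdout don't fail the test.
        assert expected_line in actual_stdout_lines, (
            f'Expected line not found in actual stdout: {expected_line!r}'
        )
```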
Contributor:

In your tests, have you found this op to take a while? It seems inefficient complexity-wise, but I don't have a suggestion for anything better.

EricMichaelSmith (Contributor, Author):

Good point about the inefficiency. The desired output is only a few dozen lines, though, and in practice the quicker of the two tests that uses this takes roughly a quarter of a second in total to complete.
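If the expected output ever grew well beyond a few dozen lines, one cheap option would be to build a set of the actual lines once, making each membership check O(1) instead of a list scan. A hypothetical variant, not part of this PR (a set drops ordering and duplicate counts, but the check only tests membership anyway):

```python
def check_stdout_via_set(actual_stdout: str, expected_stdout_path: str):
    # Hypothetical variant, not in this PR: a set makes each membership
    # check O(1) instead of scanning a list of lines.
    actual_stdout_lines = set(actual_stdout.split('\n'))
    with open(expected_stdout_path) as f:
        expected_stdout = f.read()
    for expected_line in expected_stdout.split('\n'):
        assert expected_line in actual_stdout_lines, (
            f'Expected line not found in actual stdout: {expected_line!r}'
        )
```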

```python
@@ -408,3 +408,26 @@ def _send_agent_message(
            "episode_done": False,
        }
        self.server.send_agent_act(agent_id=agent_id, act_content=act_content)


def check_stdout(actual_stdout: str, expected_stdout_path: str):
```
Contributor:

I would strongly advise we use a pytest regression
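For context, the reviewer is presumably referring to the pytest-regressions plugin, where the test writes its output through a fixture and the plugin diffs it against a stored reference file. A rough sketch (`run_analysis` is a hypothetical stand-in for the code under test):

```python
def test_analysis_stdout(file_regression):
    actual_stdout = run_analysis()  # hypothetical helper producing the stdout
    # pytest-regressions compares the contents against a stored .txt file and
    # fails on any difference, which is exactly what unpredictable
    # progress-bar lines would break, as the author notes below.
    file_regression.check(actual_stdout, extension='.txt')
```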

EricMichaelSmith (Contributor, Author):

Yes, normally I would use a pytest regression for this. However, this is a special case: we want to check the stdout against some reference stdout, but the actual stdout will often, though not always, include strings representing progress indicators - for instance, see the "40%|#### | 4/10 [00:00<00:00, 15.52it/s]" string at https://app.circleci.com/pipelines/github/facebookresearch/ParlAI/8467/workflows/5621488b-c27a-46e8-a76e-3993dee8dd59/jobs/68923 . Thus, this PR checks the actual stdout against the reference stdout in a way that is tolerant of these extra strings, and as a result doesn't use pytest regressions.
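Concretely, the failure mode might look like this. The stdout contents below are made up for illustration; only the progress-bar line is copied from the linked CI log:

```python
expected_stdout = 'Worker stats:\nDone.'
# A progress bar sometimes interleaves an extra line into the real output:
actual_stdout = (
    'Worker stats:\n'
    '40%|####      | 4/10 [00:00<00:00, 15.52it/s]\n'
    'Done.'
)

# An exact comparison (effectively what a pytest regression does) fails:
assert actual_stdout != expected_stdout

# The tolerant check passes, since every expected line is present:
actual_stdout_lines = actual_stdout.split('\n')
assert all(
    expected_line in actual_stdout_lines
    for expected_line in expected_stdout.split('\n')
)
```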

@EricMichaelSmith EricMichaelSmith merged commit 9bb93c3 into master Feb 10, 2021
@EricMichaelSmith EricMichaelSmith deleted the flaky-model-chat-test branch February 10, 2021 13:37
stephenroller pushed a commit that referenced this pull request Feb 11, 2021
* Start refactoring test

* Consolidate stdout-checking code

* Fixes

* Revert dependency

* Fixes