
Fix flaky model chat test #3442

Merged: EricMichaelSmith merged 5 commits into master from flaky-model-chat-test on Feb 10, 2021

Conversation

EricMichaelSmith (Contributor)

Patch description
Fix one source of flakiness in the model-chat-analysis crowdsourcing CI check: a progress bar sometimes, and not reproducibly, prints an update into the stdout that we check against the desired stdout. (See https://app.circleci.com/pipelines/github/facebookresearch/ParlAI/8467/workflows/5621488b-c27a-46e8-a76e-3993dee8dd59/jobs/68923 for an example of this.) The fix is to check simply that each line of the desired stdout is present in the actual stdout, rather than requiring the two stdouts to be exactly the same.

As part of refactoring the CI check code, the analysis checks for the model-chat and static-turn-annotations tasks are brought into closer alignment, since the code for these similar checks was previously written in quite different ways.

Testing steps
The following checks were modified:

  • pytest tests/crowdsourcing/tasks/turn_annotations_static/test_turn_annotations_static_analysis.py
  • pytest tests/crowdsourcing/tasks/model_chat/test_model_chat_analysis.py

```python
actual_stdout_lines = actual_stdout.split('\n')
with open(expected_stdout_path) as f:
    expected_stdout = f.read()
for expected_line in expected_stdout.split('\n'):
```
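Taken together with the `check_stdout` signature excerpted later in this thread, the full helper plausibly looks something like the following sketch; the assertion body and its message are assumed here, not copied from the diff:

```python
def check_stdout(actual_stdout: str, expected_stdout_path: str):
    """Check that each line of the expected stdout appears in the actual stdout."""
    actual_stdout_lines = actual_stdout.split('\n')
    with open(expected_stdout_path) as f:
        expected_stdout = f.read()
    for expected_line in expected_stdout.split('\n'):
        # Require only membership, not exact equality, so that extra lines
        # (e.g. progress-bar updates) in the actual stdout don't fail the test.
        assert expected_line in actual_stdout_lines, (
            f'Expected line not found in actual stdout: {expected_line!r}'
        )
```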
Contributor:

In your tests, have you found this op to take a while? It seems inefficient complexity-wise, but I don't have a suggestion for anything better.

EricMichaelSmith (Contributor, Author):

Good point about the inefficiency. The desired output is only a few dozen lines, though, and in practice the quicker of the two tests that uses this takes roughly a quarter of a second in total to complete.
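If the expected output ever grew well beyond a few dozen lines, one cheap option would be to build a set of the actual lines once, making each membership check O(1) instead of a list scan. A hypothetical variant, not part of this PR (a set drops ordering and duplicate counts, but the check only tests membership anyway):

```python
def check_stdout_via_set(actual_stdout: str, expected_stdout_path: str):
    # Hypothetical variant, not in this PR: a set makes each membership
    # check O(1) instead of scanning a list of lines.
    actual_stdout_lines = set(actual_stdout.split('\n'))
    with open(expected_stdout_path) as f:
        expected_stdout = f.read()
    for expected_line in expected_stdout.split('\n'):
        assert expected_line in actual_stdout_lines, (
            f'Expected line not found in actual stdout: {expected_line!r}'
        )
```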

```python
@@ -408,3 +408,26 @@ def _send_agent_message(
            "episode_done": False,
        }
        self.server.send_agent_act(agent_id=agent_id, act_content=act_content)


def check_stdout(actual_stdout: str, expected_stdout_path: str):
```
Contributor:

I would strongly advise we use a pytest regression
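For context, the reviewer is presumably referring to the pytest-regressions plugin, where the test writes its output through a fixture and the plugin diffs it against a stored reference file. A rough sketch (`run_analysis` is a hypothetical stand-in for the code under test):

```python
def test_analysis_stdout(file_regression):
    actual_stdout = run_analysis()  # hypothetical helper producing the stdout
    # pytest-regressions compares the contents against a stored .txt file and
    # fails on any difference, which is exactly what unpredictable
    # progress-bar lines would break, as the author notes below.
    file_regression.check(actual_stdout, extension='.txt')
```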

EricMichaelSmith (Contributor, Author):

Yes, normally I would use a pytest regression for this. However, this is a special case: we want to check the stdout against some reference stdout, but the actual stdout will often, though not always, include strings representing progress indicators - for instance, see the "40%|#### | 4/10 [00:00<00:00, 15.52it/s]" string at https://app.circleci.com/pipelines/github/facebookresearch/ParlAI/8467/workflows/5621488b-c27a-46e8-a76e-3993dee8dd59/jobs/68923 . Thus, this PR checks the actual stdout against the reference stdout in a way that is tolerant of these extra strings, and as a result doesn't use pytest regressions.
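Concretely, the failure mode might look like this. The stdout contents below are made up for illustration; only the progress-bar line is copied from the linked CI log:

```python
expected_stdout = 'Worker stats:\nDone.'
# A progress bar sometimes interleaves an extra line into the real output:
actual_stdout = (
    'Worker stats:\n'
    '40%|####      | 4/10 [00:00<00:00, 15.52it/s]\n'
    'Done.'
)

# An exact comparison (effectively what a pytest regression does) fails:
assert actual_stdout != expected_stdout

# The tolerant check passes, since every expected line is present:
actual_stdout_lines = actual_stdout.split('\n')
assert all(
    expected_line in actual_stdout_lines
    for expected_line in expected_stdout.split('\n')
)
```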

@EricMichaelSmith EricMichaelSmith merged commit 9bb93c3 into master Feb 10, 2021
@EricMichaelSmith EricMichaelSmith deleted the flaky-model-chat-test branch February 10, 2021 13:37
stephenroller pushed a commit that referenced this pull request Feb 11, 2021
* Start refactoring test

* Consolidate stdout-checking code

* Fixes

* Revert dependency

* Fixes