
add --repeat-until-failure to mix test #13398

Merged: 7 commits merged into elixir-lang:main on Mar 8, 2024

Conversation

@SteffenDE (Contributor)

The world isn't perfect, and neither are our tests. When debugging flaky tests I found myself wanting to run mix test repeatedly until a test fails. Of course you can write a loop in bash, fish, etc., but then a lot of time is spent starting the BEAM and the application on every run.

This is where this PR comes in. It proposes a new --repeat-until-failure flag that re-runs the whole test suite until at least one test fails.

@josevalim (Member)

Shall we make it an integer, so we can set a maximum number of repeats?

@SteffenDE (Contributor, Author)

I changed it to be an integer. It works fine, but I tried writing some unit tests for it and that did not go well...
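
For illustration, the integer form of the flag would be invoked along these lines (the count of 1000 is just an example value, not a default):

    # re-run the suite up to 1000 times, stopping as soon as a test fails
    mix test --repeat-until-failure 1000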

@josevalim (Member)

Thank you!

To be honest, I am not very happy that we now need to keep a copy of all test cases in the runner. You can use ExUnit to run tests programmatically, and there this would become a memory leak.

I can think of two other options:

  1. Re-run the whole module, rather than the whole suite, if it succeeds

  2. Instead of a CLI flag, have a @tag you can annotate tests with, such as @tag repeat_until_failure: 10, and read this tag in the runner

Thoughts?
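
As a rough sketch (not code from this PR), the @tag approach in option 2 could look something like this on an individual test; the module and assertion below are made up for illustration:

    defmodule MyApp.FlakyTest do
      use ExUnit.Case, async: true

      # hypothetical usage of option 2: the runner would re-run this test
      # until it fails, at most 10 times
      @tag repeat_until_failure: 10
      test "eventually consistent lookup" do
        assert MyApp.Cache.get(:key) == :value
      end
    end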

@SteffenDE (Contributor, Author)

I don't think option 1 is a good solution, because non-flaky modules could prevent others from running?

Concerning option 2: maybe such a repeat tag would be nice as a complement to this flag in general?

I adapted the code to only store the modules when --repeat-until-failure is used. Wdyt?

@SteffenDE force-pushed the exunit_repeat_until_failure branch from 5e83893 to b3e8c4b on March 7, 2024.
Review thread on this snippet:

      end

      defp maybe_add_persistent_sync_modules(state, name) do
        if Application.fetch_env!(:ex_unit, :repeat_until_failure) > 0 do
@josevalim (Member)

Sorry, but I am worried about depending on global state here. For example, ExUnit.configure can be called at any time and change the meaning of this. Ideally we would keep most state management in the runner. I can think of two other options:

  1. Pass a flag when we call take_modules in the runner to also make a copy of the modules we are taking
  2. Have the runner collect all modules as it runs them and explicitly push them again on restore_modules

I think 2 is the cleanest and requires only a small change to async_loop. There is no need for additional state and we mostly rely on existing server APIs. Thoughts?
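
A simplified sketch of the idea in option 2, with illustrative names and a made-up signature (ExUnit's real async_loop, take_modules, and restore_modules differ from this): the loop remembers every module it has run and pushes the whole list back onto the server once it is done, so a later repetition can take them again.

    # illustrative fragment, not ExUnit's actual runner code
    defp run_loop(server, ran_modules) do
      case take_modules(server) do
        [] ->
          # nothing left to run: push everything we ran back onto the
          # server so the next repetition can pick the modules up again
          restore_modules(server, Enum.reverse(ran_modules))

        modules ->
          Enum.each(modules, &run_module/1)
          run_loop(server, Enum.reverse(modules, ran_modules))
      end
    end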

@SteffenDE (Contributor, Author)

Thank you for all the feedback; I appreciate it very much. I changed the code to use your suggestion 2.

@SteffenDE (Contributor, Author)

Maybe it would be nice to use a new seed for every repeat if it was not explicitly configured with --seed?

@josevalim (Member)

> Maybe it would be nice to use a new seed for every repeat if it was not explicitly configured with --seed?

Sounds like a solid idea to me. :) Maybe we move the loop a bit higher in the stack and run the whole configuration again?
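
A rough sketch of how that higher-level loop might look (illustrative names only; run_suite/0 and the seed_explicit? option are made up, and this is not the code that was eventually merged): regenerate the seed on each repetition unless one was passed explicitly, reconfigure, and run again while everything keeps passing.

    defp repeat_loop(opts, remaining) do
      # keep an explicit --seed stable, otherwise pick a fresh one per run
      seed =
        if opts[:seed_explicit?],
          do: opts[:seed],
          else: :erlang.unique_integer([:positive])

      ExUnit.configure(seed: seed)

      case run_suite() do
        %{failures: 0} when remaining > 1 -> repeat_loop(opts, remaining - 1)
        result -> result
      end
    end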

@viniciusmuller (Contributor)

This feature will be very useful, thanks for implementing it @SteffenDE 😃

@SteffenDE (Contributor, Author)

> Maybe we move the loop a bit higher in the stack and run the whole configuration again?

Sounds good. I wonder how we should handle the seed, because we store the generated seed in the env and then we cannot easily say if the seed was explicitly set or generated:

    defp persist_defaults(config) do
      config |> Keyword.take([:max_cases, :seed, :trace]) |> configure()
      config
    end

What would happen if we don't persist the seed? (the tests still pass if I exclude the seed)
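
In other words, the question is whether :seed could simply be dropped from the persisted keys, i.e. something along these lines (a sketch of the question being asked, not a proposed change):

    defp persist_defaults(config) do
      # :seed intentionally left out, so each repetition could regenerate it
      config |> Keyword.take([:max_cases, :trace]) |> configure()
      config
    end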

@SteffenDE (Contributor, Author)

Hmm yep. Not persisting the seed is not compatible with #12442...

Comment on lines +523 to +524:

    # we want to change the seed when using with `repeat_until_failure`
    Application.put_env(:ex_unit, :seed_generated, true)
@SteffenDE (Contributor, Author)

Not pretty, but I'm not sure what would be better.

@SteffenDE (Contributor, Author)

So, one important difference with the seed when using --repeat-until-failure is that modules are not recompiled in the order given by the new seed, because they are already compiled after the first run. The changed seed therefore only affects the order of the test runs themselves.

I think that's reasonable.

@josevalim (Member)

I will go ahead and merge it and explore some ideas. :) Thank you!

@josevalim marked this pull request as ready for review on March 8, 2024.
@josevalim merged commit d75930b into elixir-lang:main on Mar 8, 2024; 8 checks passed.
@josevalim (Member)

💚 💙 💜 💛 ❤️

@saveman71 (Contributor)

Landing on this PR as I am debugging a flaky test at the moment 🥲

20 minutes pass as I upgrade the codebase to 1.17 to try out that shiny new tool...

Another 20 minutes to debug the test, which ended up being an issue with multiple DateTime.now() calls, and this caught it (it needed more than 100 runs, but it's nice that you can focus on a single test).

And done, thanks! That helped a lot! One thing that could also help would be the ability to run tests with a concurrency of, say, 10 or 100, since my issue would have been more prominent the slower things were, or to have some other way of "slowing things down".
