
add --repeat-until-failure to mix test #13398

Merged: 7 commits merged into elixir-lang:main on Mar 8, 2024

Conversation

@SteffenDE (Contributor)

The world isn't perfect, and neither are our tests. When debugging flaky tests I found myself wanting to run mix test repeatedly until a test fails. Of course you can write a loop in bash, fish, etc., but then a lot of time is spent starting the BEAM and the application on every run.

This is where this PR comes in. It proposes a new --repeat-until-failure flag that re-runs the whole test suite until at least one test fails.

@josevalim (Member)

Shall we make it an integer, so we can set a maximum number of repeats?

@SteffenDE (Contributor, Author)

I changed it to be an integer. It works fine, but I tried writing some unit tests for it and that did not go well...
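
For illustration, the integer form of the flag would be invoked along these lines (the count of 1000 is just an example value, not a default):

    # re-run the suite up to 1000 times, stopping as soon as a test fails
    mix test --repeat-until-failure 1000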

@josevalim (Member)

Thank you!

To be honest, I am not very happy that we now need to keep a copy of all test cases in the runner. You can use ExUnit to run tests programmatically, and there this would become a memory leak.

I can think of two other options:

  1. Re-run the whole module, rather than the whole suite, if it succeeds

  2. Instead of a CLI flag, have a @tag you can annotate tests with, such as @tag repeat_until_failure: 10, and read this tag in the runner

Thoughts?
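
As a rough sketch (not code from this PR), the @tag approach in option 2 could look something like this on an individual test; the module and assertion below are made up for illustration:

    defmodule MyApp.FlakyTest do
      use ExUnit.Case, async: true

      # hypothetical usage of option 2: the runner would re-run this test
      # until it fails, at most 10 times
      @tag repeat_until_failure: 10
      test "eventually consistent lookup" do
        assert MyApp.Cache.get(:key) == :value
      end
    end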

@SteffenDE (Contributor, Author)

I don't think option 1 is a good solution, because non-flaky modules could prevent others from running?

Concerning option 2: maybe such a repeat tag would be nice as a complement to this flag in general?

I adapted the code to only store the modules when --repeat-until-failure is used. Wdyt?

@SteffenDE force-pushed the exunit_repeat_until_failure branch from 5e83893 to b3e8c4b on March 7, 2024.
Review thread on this snippet:

      end

      defp maybe_add_persistent_sync_modules(state, name) do
        if Application.fetch_env!(:ex_unit, :repeat_until_failure) > 0 do
@josevalim (Member)

Sorry, but I am worried about depending on global state here. For example, ExUnit.configure can be called at any time and change the meaning of this. Ideally we would keep most state management in the runner. I can think of two other options:

  1. Pass a flag when we call take_modules in the runner to also make a copy of the modules we are taking
  2. Have the runner collect all modules as it runs them and explicitly push them again on restore_modules

I think 2 is the cleanest and requires only a small change to async_loop. There is no need for additional state and we mostly rely on existing server APIs. Thoughts?
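
A simplified sketch of the idea in option 2, with illustrative names and a made-up signature (ExUnit's real async_loop, take_modules, and restore_modules differ from this): the loop remembers every module it has run and pushes the whole list back onto the server once it is done, so a later repetition can take them again.

    # illustrative fragment, not ExUnit's actual runner code
    defp run_loop(server, ran_modules) do
      case take_modules(server) do
        [] ->
          # nothing left to run: push everything we ran back onto the
          # server so the next repetition can pick the modules up again
          restore_modules(server, Enum.reverse(ran_modules))

        modules ->
          Enum.each(modules, &run_module/1)
          run_loop(server, Enum.reverse(modules, ran_modules))
      end
    end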

@SteffenDE (Contributor, Author)

Thank you for all the feedback; I appreciate it very much. I changed the code to use your suggestion 2.

@SteffenDE (Contributor, Author)

Maybe it would be nice to use a new seed for every repeat if it was not explicitly configured with --seed?

@josevalim (Member)

> Maybe it would be nice to use a new seed for every repeat if it was not explicitly configured with --seed?

Sounds like a solid idea to me. :) Maybe we move the loop a bit higher in the stack and run the whole configuration again?
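
A rough sketch of how that higher-level loop might look (illustrative names only; run_suite/0 and the seed_explicit? option are made up, and this is not the code that was eventually merged): regenerate the seed on each repetition unless one was passed explicitly, reconfigure, and run again while everything keeps passing.

    defp repeat_loop(opts, remaining) do
      # keep an explicit --seed stable, otherwise pick a fresh one per run
      seed =
        if opts[:seed_explicit?],
          do: opts[:seed],
          else: :erlang.unique_integer([:positive])

      ExUnit.configure(seed: seed)

      case run_suite() do
        %{failures: 0} when remaining > 1 -> repeat_loop(opts, remaining - 1)
        result -> result
      end
    end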

@viniciusmuller (Contributor)

This feature will be very useful, thanks for implementing it @SteffenDE 😃

@SteffenDE (Contributor, Author)

> Maybe we move the loop a bit higher in the stack and run the whole configuration again?

Sounds good. I wonder how we should handle the seed, because we store the generated seed in the env and then we cannot easily say if the seed was explicitly set or generated:

    defp persist_defaults(config) do
      config |> Keyword.take([:max_cases, :seed, :trace]) |> configure()
      config
    end

What would happen if we don't persist the seed? (the tests still pass if I exclude the seed)
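
In other words, the question is whether :seed could simply be dropped from the persisted keys, i.e. something along these lines (a sketch of the question being asked, not a proposed change):

    defp persist_defaults(config) do
      # :seed intentionally left out, so each repetition could regenerate it
      config |> Keyword.take([:max_cases, :trace]) |> configure()
      config
    end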

@SteffenDE (Contributor, Author)

Hmm yep. Not persisting the seed is not compatible with #12442...

Comment on lines +523 to +524:

    # we want to change the seed when using with `repeat_until_failure`
    Application.put_env(:ex_unit, :seed_generated, true)
@SteffenDE (Contributor, Author)

Not pretty, but I'm not sure what would be better.

@SteffenDE (Contributor, Author)

So, one important difference with the seed when using --repeat-until-failure is that modules are not recompiled in the order given by the new seed, because they are already compiled after the first run. The changed seed therefore only affects the order of the test runs themselves.

I think that's reasonable.

@josevalim (Member)

I will go ahead and merge it and explore some ideas. :) Thank you!

@josevalim marked this pull request as ready for review on March 8, 2024.
@josevalim merged commit d75930b into elixir-lang:main on Mar 8, 2024; 8 checks passed.
@josevalim (Member)

💚 💙 💜 💛 ❤️

@saveman71 (Contributor)

Landing on this PR as I am debugging a flaky test at the moment 🥲

20 minutes pass as I upgrade the codebase to 1.17 to try out that shiny new tool...

Another 20 minutes to debug the test, which ended up being an issue with multiple DateTime.now() calls, and this caught it (it needed more than 100 runs, but it's nice that you can focus on a single test).

And done, thanks! That helped a lot! One thing that could also help would be the ability to run tests with a concurrency of, say, 10 or 100, since my issue would have been more prominent the slower things were, or to have some other way of "slowing things down".
