
Allow customization of step runner in run_state_machine_as_test #1709

Closed
touilleMan opened this issue Dec 19, 2018 · 9 comments

touilleMan (Member) commented Dec 19, 2018

Hi,

Release 3.84 broke hypothesis-trio: it used to monkey-patch StateMachineRunner.run, but StateMachineRunner is gone now.

The patching is needed because we must start a Trio loop to run the TrioGenericStateMachine (given that its execute_step, check_invariants and teardown methods are asynchronous).

I think we could clean up this hack by having run_state_machine_as_test check whether the given state machine provides a custom step runner:

def run_state_machine_as_test(state_machine_factory, settings=None):
    if settings is None:
        try:
            settings = state_machine_factory.TestCase.settings
            check_type(Settings, settings, "state_machine_factory.TestCase.settings")
        except AttributeError:
            settings = Settings(
                timeout=unlimited,
                deadline=None,
                suppress_health_check=HealthCheck.all(),
            )
    check_type(Settings, settings, "settings")

    @settings
    @given(st.data())
    def run_state_machine(factory, data):
        machine = factory()
        check_type(GenericStateMachine, machine, "state_machine_factory()")
        data.conjecture_data.hypothesis_runner = machine

        n_steps = settings.stateful_step_count
        should_continue = cu.many(
            data.conjecture_data, min_size=1, max_size=n_steps, average_size=n_steps
        )

        print_steps = (
            current_build_context().is_final or current_verbosity() >= Verbosity.debug
        )

        # =================== Changes ================
        def _default_step_runner(data, print_steps, should_continue):
            try:
                if print_steps:
                    machine.print_start()
                machine.check_invariants()

                while should_continue.more():
                    value = data.conjecture_data.draw(machine.steps())
                    if print_steps:
                        machine.print_step(value)
                    machine.execute_step(value)
                    machine.check_invariants()
            finally:
                if print_steps:
                    machine.print_end()
                machine.teardown()

        step_runner = getattr(machine, 'step_runner', _default_step_runner)
        step_runner(data, print_steps, should_continue)
        # =================== End of changes ================

    # Use a machine digest to identify stateful tests in the example database
    run_state_machine.hypothesis.inner_test._hypothesis_internal_add_digest = function_digest(
        state_machine_factory
    )
    # Copy some attributes so @seed and @reproduce_failure "just work"
    run_state_machine._hypothesis_internal_use_seed = getattr(
        state_machine_factory, "_hypothesis_internal_use_seed", None
    )
    run_state_machine._hypothesis_internal_use_reproduce_failure = getattr(
        state_machine_factory, "_hypothesis_internal_use_reproduce_failure", None
    )

    run_state_machine(state_machine_factory)
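For illustration, here is a hedged sketch of how hypothesis-trio might supply its own step runner through such a hook. asyncio.run stands in for trio.run to keep the sketch stdlib-only, and draw_step/should_continue are simplified stand-ins for the data/should_continue machinery above; all names are assumptions, not hypothesis-trio API.

```python
import asyncio


class AsyncStateMachineSketch:
    """Stand-in for a TrioGenericStateMachine with async step methods."""

    def __init__(self):
        self.log = []

    async def check_invariants(self):
        self.log.append("invariant")

    async def execute_step(self, value):
        self.log.append(("step", value))

    async def teardown(self):
        self.log.append("teardown")

    def step_runner(self, draw_step, should_continue):
        # The sync entry point run_state_machine_as_test would call:
        # drive the whole async step loop to completion on one event loop
        # (hypothesis-trio would use trio.run here instead of asyncio.run).
        async def _run():
            try:
                await self.check_invariants()
                while should_continue():
                    value = draw_step()
                    await self.execute_step(value)
                    await self.check_invariants()
            finally:
                await self.teardown()

        asyncio.run(_run())


machine = AsyncStateMachineSketch()
pending = [1, 2]
machine.step_runner(
    draw_step=lambda: pending.pop(0),
    should_continue=lambda: bool(pending),
)
# machine.log now records the interleaved invariant checks, steps and teardown
```

The real runner would also thread through print_steps and error reporting; this only shows the control flow of the hook.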

What do you think? 😃

Zac-HD (Member) commented Dec 20, 2018

I think async stateful testing is harder than you think 😨. This code doesn't really work...

try:
    if print_steps:
        self.print_start()
    await self.check_invariants()
    while should_continue.more():
        value = data.draw(self.steps())
        if print_steps:
            self.print_step(value)
        await self.execute_step(value)
        await self.check_invariants()
finally:
    if print_steps:
        self.print_end()
    await self.teardown()
    self._nursery.cancel_scope.cancel()

...because the order in which functions execute is really important and Hypothesis can't control or even see it. Trio randomises task order on its event loop, which is fine. Unfortunately it also uses an independent random.Random instance per Runner, so this remains nondeterministic even when Hypothesis seeds the global random module.

(to clarify: Hypothesis can't shrink or even reliably report non-deterministic bugs, because they tend to be so flaky, and that includes everything on top of the non-deterministic scheduler)

So we have two potential solutions, which might end up as two distinct StateMachine types:

  • Define the execute_step and check_invariants methods as sync functions, which schedule async rule or invariant methods on a class-level event loop and run them to completion.
    Pros: reproducible, clean, compatible with older versions of both Trio and Hypothesis.
    Cons: manually handling the Trio event loop (!?), not very async at all.

  • Provide a hook somewhere in Hypothesis to insert an async-to-sync shim in run_state_machine_as_test, as we do for direct @given tests. This would also require some changes to Trio so the scheduler can be made deterministic for testing. Ideally the design would be extensible to injecting delays and maybe other faults, as described in python-trio/trio#239 ("Tools for finding scheduler-dependent heisenbugs"), as a future change.
    Pros: really does interleave tasks from the various async rule methods.
    Cons: a difficult and unproven approach that may not even work. Still less deterministic. Invariants may all trigger at the beginning and not be checked after later rules.
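The first option can be sketched roughly as follows, with asyncio standing in for a Trio loop to keep the example self-contained; the class and method names here are illustrative only, not proposed API.

```python
import asyncio


class SyncShimMachine:
    """Sync execute_step/teardown drive async rules to completion
    on a single class-managed event loop (the first option above)."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self.history = []

    async def _rule_append(self, value):
        # An async "rule" body; real code would await Trio/async APIs here.
        self.history.append(value)

    def execute_step(self, value):
        # Hypothesis calls this synchronously; the async rule runs to
        # completion before control returns, so execution order stays
        # deterministic from Hypothesis's point of view.
        self._loop.run_until_complete(self._rule_append(value))

    def teardown(self):
        self._loop.close()


m = SyncShimMachine()
m.execute_step("a")
m.execute_step("b")
m.teardown()
```

The price, as noted above, is that nothing actually runs concurrently: each rule is awaited to completion before the next one starts.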

Zac-HD (Member) commented Dec 20, 2018

I would also remove TrioGenericStateMachine in favor of st.data() (see #1693), so we only need to fix (or reimplement) TrioRuleBasedStateMachine.

njsmith commented Dec 20, 2018

I think it would be problematic for Trio to use the global random instance (discussed here: python-trio/trio#239 (comment)), but it would be easy to provide some way to explicitly seed Trio's random state (e.g. by passing an argument to trio.run). Would that be enough?

I guess the question is how to glue things together appropriately. Hypothesis has built-in knowledge of random's seed and numpy.random's seed, so I guess it could have built-in knowledge of Trio's seed too, but I'm not sure that's the best approach. pytest-trio knows when it's handling a Hypothesis test, so it could potentially take care of the seeding too. Of course, that would only work when using pytest-trio + @given on an async test, which covers a lot of the cases, but I don't know if it covers all of them. In particular, I don't know if it helps with the stateful testing stuff :-)

Alternatively: random.Random() pulls entropy from the operating system. Maybe while hypothesis is running, it should monkeypatch that constructor so it can control the seed of the new Random object? That wouldn't actually help for Trio right now because we construct a single global Random object at import time, but it would be trivial to delay that until the call to trio.run.

Zac-HD (Member) commented Dec 20, 2018

I think the general problem of "the Trio scheduler is non-deterministic, but we want deterministic or at least reproducible tests" should probably be solved in trio.testing! In particular it affects users who are not using Hypothesis, or even pytest-trio.

right now ... we [Trio] construct a single global Random object at import time, but it would be trivial to delay that until the call to trio.run.

Nope, each call to trio.run creates a new Random object! Using a global object which can be monkey-patched by pytest-trio and hypothesis-trio would significantly improve the reproducibility of scheduler-related heisenbugs.

Long-term, trio may well adopt a deterministic scheduling policy so I'd rather not over-engineer anything right now 😉. Truly deterministic tests also require that the completion order of concurrent operations (hence tasks available to schedule) be deterministic too. However I'm happy to leave controlling IO delays and faults for another day!

Zac-HD added the "question" label (not sure it's a bug? questions welcome) Dec 20, 2018
njsmith commented Dec 20, 2018

Nope, each call to trio.run creates a new Random object!

Oh, we're both right. There's a global Random object (used to give the system clock a random offset) and a per-run Random object (used for random scheduling). But the system clock is effectively non-deterministic anyway, so I guess that doesn't matter.

I think the general problem of "the Trio scheduler is non-deterministic, but we want deterministic or at least reproducible tests" should probably be solved in trio.testing! In particular it affects users who are not using Hypothesis, or even pytest-trio.

But if you're not using hypothesis, then you should use the random scheduling, because it's a kind of poor-man's-hypothesis, to make sure your code doesn't depend on scheduling order :-). If the randomness affects your tests then that means you have a bug, and a non-deterministic failure is better than a deterministic pass that hides the bug until it's too late.

Of course when you're using hypothesis that's different: you still want random scheduling order, it's just that you want the randomness to be under hypothesis's control. (And ideally with a custom strategy that picks "weird" schedules to flush out bugs, but we can ignore that for now.)

Zac-HD (Member) commented Dec 21, 2018

If the randomness affects your tests then that means you have a bug, and a non-deterministic failure is better than a deterministic pass that hides the bug until it's too late.

Agreed! You usually need a pretty good trace or replay capability to understand and fix this kind of bug, but Trio is about as good for that as you can get 😄

Oh, we're both right. [...]

Concretely then, would you be happy with a PR against Trio to make the scheduler use the global Random, and then one against pytest-trio to monkey-patch that while Hypothesis tests are running?

Note that this would make the scheduler deterministic so that Hypothesis can minimise examples which exhibit observable scheduler-related behaviour, at the cost of hiding the relatively rare scheduling-related bugs that would only be exposed by a different seed.

DRMacIver (Member) commented:

BTW, what we do in Hypothesis core, which would also work here, is to create new Random objects explicitly seeded by random.getrandbits() from the global PRNG. This retains the desired locality of randomness, but means that when someone calls random.seed() (as Hypothesis does) the system becomes deterministic.
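A minimal illustration of that pattern (the fresh_random helper name is made up for this sketch):

```python
import random


def fresh_random():
    # A new, local Random instance whose seed is drawn from the global
    # PRNG: independent state, but deterministic once random.seed() has
    # been called, exactly as described above.
    return random.Random(random.getrandbits(128))


random.seed(42)
a = fresh_random().random()
random.seed(42)
b = fresh_random().random()
# a == b: reseeding the global PRNG reproduces the "independent" stream
```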

njsmith commented Dec 24, 2018

@Zac-HD

Concretely then, would you be happy with a PR against Trio to make the scheduler use the global Random, and then one against pytest-trio to monkey-patch that while Hypothesis tests are running?

An argument to trio.run might be cleaner than reaching into trio._core._run to monkeypatch stuff? But either would probably be fine, really.

@DRMacIver

BTW, what we do in Hypothesis core, which would also work here, is to create new Random objects explicitly seeded by random.getrandbits() from the global PRNG. This retains the desired locality of randomness, but means that when someone calls random.seed() (as Hypothesis does) the system becomes deterministic.

This still perturbs the global PRNG though, so if someone has some script where they use random.seed, then importing Trio could suddenly change that script's behavior. Possibly my time working with newbie scientists has made me over-paranoid about this :-).

Zac-HD (Member) commented Jan 17, 2019

Alright, I think there are two parts to this issue:

First, how the Trio scheduler can be made deterministic for Hypothesis: #1714 and python-trio/pytest-trio#73 between them manage it.

hypothesis-trio will also need to manage the seed (by the same method), since not all users will also be using pytest-trio. If they are, it's zero-cost, as registering the same random.Random instance a second time is a no-op. Deeper control of the scheduler is also possible, but beyond the scope of this issue.

Second, I think hypothesis-trio will need an independent implementation of run_state_machine_as_test - async is just too different from sync to share this very short function. I'd also recommend removing the idea of a generic state machine, as we're planning to in #1693, which should simplify the rule-based option a bit. You'd then have an async inner function that gets passed to trio.run, instead of a sync inner function that gets called.

So under my proposal there's not much more to be done in upstream Hypothesis - though I remain happy to help with integrations on the Trio end too 😄
