
Allow customization of step runner in run_state_machine_as_test #1709

Closed
touilleMan opened this issue Dec 19, 2018 · 9 comments

touilleMan (Member) commented Dec 19, 2018

Hi,

Release 3.84 broke hypothesis-trio: it used to monkey-patch StateMachineRunner.run, but StateMachineRunner is gone now.

The patching is needed because we must start a Trio loop to run the TrioGenericStateMachine (given that its execute_step, check_invariants and teardown methods are asynchronous).

I think we could clean up this hack by having run_state_machine_as_test check whether the given state machine provides a custom step runner:

def run_state_machine_as_test(state_machine_factory, settings=None):
    if settings is None:
        try:
            settings = state_machine_factory.TestCase.settings
            check_type(Settings, settings, "state_machine_factory.TestCase.settings")
        except AttributeError:
            settings = Settings(
                timeout=unlimited,
                deadline=None,
                suppress_health_check=HealthCheck.all(),
            )
    check_type(Settings, settings, "settings")

    @settings
    @given(st.data())
    def run_state_machine(factory, data):
        machine = factory()
        check_type(GenericStateMachine, machine, "state_machine_factory()")
        data.conjecture_data.hypothesis_runner = machine

        n_steps = settings.stateful_step_count
        should_continue = cu.many(
            data.conjecture_data, min_size=1, max_size=n_steps, average_size=n_steps
        )

        print_steps = (
            current_build_context().is_final or current_verbosity() >= Verbosity.debug
        )

        # =================== Changes ================
        def _default_step_runner(data, print_steps, should_continue):
            try:
                if print_steps:
                    machine.print_start()
                machine.check_invariants()

                while should_continue.more():
                    value = data.conjecture_data.draw(machine.steps())
                    if print_steps:
                        machine.print_step(value)
                    machine.execute_step(value)
                    machine.check_invariants()
            finally:
                if print_steps:
                    machine.print_end()
                machine.teardown()

        step_runner = getattr(machine, 'step_runner', _default_step_runner)
        step_runner(data, print_steps, should_continue)
        # =================== End of changes ================

    # Use a machine digest to identify stateful tests in the example database
    run_state_machine.hypothesis.inner_test._hypothesis_internal_add_digest = function_digest(
        state_machine_factory
    )
    # Copy some attributes so @seed and @reproduce_failure "just work"
    run_state_machine._hypothesis_internal_use_seed = getattr(
        state_machine_factory, "_hypothesis_internal_use_seed", None
    )
    run_state_machine._hypothesis_internal_use_reproduce_failure = getattr(
        state_machine_factory, "_hypothesis_internal_use_reproduce_failure", None
    )

    run_state_machine(state_machine_factory)
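For illustration, here is a hedged sketch of how hypothesis-trio might supply its own step runner through such a hook. asyncio.run stands in for trio.run to keep the sketch stdlib-only, and draw_step/should_continue are simplified stand-ins for the data/should_continue machinery above; all names are assumptions, not hypothesis-trio API.

```python
import asyncio


class AsyncStateMachineSketch:
    """Stand-in for a TrioGenericStateMachine with async step methods."""

    def __init__(self):
        self.log = []

    async def check_invariants(self):
        self.log.append("invariant")

    async def execute_step(self, value):
        self.log.append(("step", value))

    async def teardown(self):
        self.log.append("teardown")

    def step_runner(self, draw_step, should_continue):
        # The sync entry point run_state_machine_as_test would call:
        # drive the whole async step loop to completion on one event loop
        # (hypothesis-trio would use trio.run here instead of asyncio.run).
        async def _run():
            try:
                await self.check_invariants()
                while should_continue():
                    value = draw_step()
                    await self.execute_step(value)
                    await self.check_invariants()
            finally:
                await self.teardown()

        asyncio.run(_run())


machine = AsyncStateMachineSketch()
pending = [1, 2]
machine.step_runner(
    draw_step=lambda: pending.pop(0),
    should_continue=lambda: bool(pending),
)
# machine.log now records the interleaved invariant checks, steps and teardown
```

The real runner would also thread through print_steps and error reporting; this only shows the control flow of the hook.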

What do you think? 😃

Zac-HD (Member) commented Dec 20, 2018

I think async stateful testing is harder than you think 😨. This code doesn't really work...

try:
    if print_steps:
        self.print_start()
    await self.check_invariants()
    while should_continue.more():
        value = data.draw(self.steps())
        if print_steps:
            self.print_step(value)
        await self.execute_step(value)
        await self.check_invariants()
finally:
    if print_steps:
        self.print_end()
    await self.teardown()
    self._nursery.cancel_scope.cancel()

...because the order in which functions execute is really important and Hypothesis can't control or even see it. Trio randomises task order on its event loop, which is fine. Unfortunately it also uses an independent random.Random instance per Runner, so this remains nondeterministic even when Hypothesis seeds the global random module.

(to clarify: Hypothesis can't shrink or even reliably report non-deterministic bugs, because they tend to be so flaky, and that includes everything on top of the non-deterministic scheduler)

So we have two potential solutions, which might end up as two distinct StateMachine types:

  • Define the execute_step and check_invariants methods as sync functions, which schedule async rule or invariant methods on a class-level event loop and run them to completion.
    Pros: reproducible, clean, compatible with older versions of both Trio and Hypothesis.
    Cons: manually handling the Trio event loop (!?), not very async at all.

  • Provide a hook somewhere in Hypothesis to insert an async-to-sync shim in run_state_machine_as_test, as we do for direct @given tests. This would also require some changes to Trio so the scheduler can be made deterministic for testing. Ideally the design would be extensible to injecting delays and maybe other faults, as described in python-trio/trio#239 ("Tools for finding scheduler-dependent heisenbugs"), as a future change.
    Pros: really does interleave tasks from the various async rule methods.
    Cons: a difficult and unproven approach that may not even work. Still less deterministic. Invariants may all trigger at the beginning and not be checked after later rules.
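The first option can be sketched roughly as follows, with asyncio standing in for a Trio loop to keep the example self-contained; the class and method names here are illustrative only, not proposed API.

```python
import asyncio


class SyncShimMachine:
    """Sync execute_step/teardown drive async rules to completion
    on a single class-managed event loop (the first option above)."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self.history = []

    async def _rule_append(self, value):
        # An async "rule" body; real code would await Trio/async APIs here.
        self.history.append(value)

    def execute_step(self, value):
        # Hypothesis calls this synchronously; the async rule runs to
        # completion before control returns, so execution order stays
        # deterministic from Hypothesis's point of view.
        self._loop.run_until_complete(self._rule_append(value))

    def teardown(self):
        self._loop.close()


m = SyncShimMachine()
m.execute_step("a")
m.execute_step("b")
m.teardown()
```

The price, as noted above, is that nothing actually runs concurrently: each rule is awaited to completion before the next one starts.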

Zac-HD (Member) commented Dec 20, 2018

I would also remove TrioGenericStateMachine in favor of st.data() (see #1693), so we only need to fix (or reimplement) TrioRuleBasedStateMachine.

njsmith commented Dec 20, 2018

I think it would be problematic for Trio to use the global random instance (discussed here: python-trio/trio#239 (comment)), but it would be easy to provide some way to explicitly seed Trio's random state (e.g. by passing an argument to trio.run). Would that be enough?

I guess the question is how to glue things together appropriately. Hypothesis has built-in knowledge of random's seed and numpy.random's seed, so I guess it could have built-in knowledge of Trio's seed too, but I'm not sure that's the best approach. pytest-trio knows when it's handling a Hypothesis test, so it could potentially take care of the seeding too. Of course, that would only work when using pytest-trio + @given on an async test, which covers a lot of the cases, but I don't know if it covers all of them. In particular, I don't know if it helps with the stateful testing stuff :-)

Alternatively: random.Random() pulls entropy from the operating system. Maybe while hypothesis is running, it should monkeypatch that constructor so it can control the seed of the new Random object? That wouldn't actually help for Trio right now because we construct a single global Random object at import time, but it would be trivial to delay that until the call to trio.run.

Zac-HD (Member) commented Dec 20, 2018

I think the general problem of "the Trio scheduler is non-deterministic, but we want deterministic or at least reproducible tests" should probably be solved in trio.testing! In particular it affects users who are not using Hypothesis, or even pytest-trio.

right now ... we [Trio] construct a single global Random object at import time, but it would be trivial to delay that until the call to trio.run.

Nope, each call to trio.run creates a new Random object! Using a global object which can be monkey-patched by pytest-trio and hypothesis-trio would significantly improve the reproducibility of scheduler-related heisenbugs.

Long-term, trio may well adopt a deterministic scheduling policy so I'd rather not over-engineer anything right now 😉. Truly deterministic tests also require that the completion order of concurrent operations (hence tasks available to schedule) be deterministic too. However I'm happy to leave controlling IO delays and faults for another day!

Zac-HD added the "question" label (not sure it's a bug? questions welcome) Dec 20, 2018
njsmith commented Dec 20, 2018

Nope, each call to trio.run creates a new Random object!

Oh, we're both right. There's a global Random object (used to give the system clock a random offset) and a per-run Random object (used for random scheduling). But the system clock is effectively non-deterministic anyway, so I guess that doesn't matter.

I think the general problem of "the Trio scheduler is non-deterministic, but we want deterministic or at least reproducible tests" should probably be solved in trio.testing! In particular it affects users who are not using Hypothesis, or even pytest-trio.

But if you're not using hypothesis, then you should use the random scheduling, because it's a kind of poor-man's-hypothesis, to make sure your code doesn't depend on scheduling order :-). If the randomness affects your tests then that means you have a bug, and a non-deterministic failure is better than a deterministic pass that hides the bug until it's too late.

Of course when you're using hypothesis that's different: you still want random scheduling order, it's just that you want the randomness to be under hypothesis's control. (And ideally with a custom strategy that picks "weird" schedules to flush out bugs, but we can ignore that for now.)

Zac-HD (Member) commented Dec 21, 2018

If the randomness affects your tests then that means you have a bug, and a non-deterministic failure is better than a deterministic pass that hides the bug until it's too late.

Agreed! You usually need a pretty good trace or replay capability to understand and fix this kind of bug, but Trio is about as good for that as you can get 😄

Oh, we're both right. [...]

Concretely then, would you be happy with a PR against Trio to make the scheduler use the global Random, and then one against pytest-trio to monkey-patch that while Hypothesis tests are running?

Note that this would make the scheduler deterministic so that Hypothesis can minimise examples which exhibit observable scheduler-related behaviour, at the cost of hiding the relatively rare scheduling-related bugs that would only be exposed by a different seed.

DRMacIver (Member) commented:

BTW, what we do in Hypothesis core, which would also work here, is to create new Random objects explicitly seeded by random.getrandbits() from the global PRNG. This retains the desired locality of randomness, but means that when someone calls random.seed() (as Hypothesis does) the system becomes deterministic.
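A minimal illustration of that pattern (the fresh_random helper name is made up for this sketch):

```python
import random


def fresh_random():
    # A new, local Random instance whose seed is drawn from the global
    # PRNG: independent state, but deterministic once random.seed() has
    # been called, exactly as described above.
    return random.Random(random.getrandbits(128))


random.seed(42)
a = fresh_random().random()
random.seed(42)
b = fresh_random().random()
# a == b: reseeding the global PRNG reproduces the "independent" stream
```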

njsmith commented Dec 24, 2018

@Zac-HD

Concretely then, would you be happy with a PR against Trio to make the scheduler use the global Random, and then one against pytest-trio to monkey-patch that while Hypothesis tests are running?

An argument to trio.run might be cleaner than reaching into trio._core._run to monkeypatch stuff? But either would probably be fine, really.

@DRMacIver

BTW, what we do in Hypothesis core, which would also work here, is to create new Random objects explicitly seeded by random.getrandbits() from the global PRNG. This retains the desired locality of randomness, but means that when someone calls random.seed() (as Hypothesis does) the system becomes deterministic.

This still perturbs the global PRNG though, so if someone has some script where they use random.seed, then importing Trio could suddenly change that script's behavior. Possibly my time working with newbie scientists has made me over-paranoid about this :-).

Zac-HD (Member) commented Jan 17, 2019

Alright, I think there are two parts to this issue:

First, how the Trio scheduler can be made deterministic for Hypothesis: #1714 and python-trio/pytest-trio#73 between them manage it.

hypothesis-trio will also need to manage the seed (by the same method), since not all users will also be using pytest-trio. If they are, it's zero-cost, as registering the same random.Random instance a second time is a no-op. Deeper control of the scheduler is also possible, but beyond the scope of this issue.

Second, I think hypothesis-trio will need an independent implementation of run_state_machine_as_test - async is just too different from sync to share this very short function. I'd also recommend removing the idea of a generic state machine, as we're planning to in #1693, which should simplify the rule-based option a bit. You'd then have an async inner function that gets passed to trio.run, instead of a sync inner function that gets called.

So under my proposal there's not much more to be done in upstream Hypothesis - though I remain happy to help with integrations on the Trio end too 😄
