
Add min_leaves: int = 0 argument to st.recursive() #4205

Open · helanhalvan opened this issue Dec 17, 2024 · 6 comments

Labels: enhancement ("it's not broken, but we want it to be better")

helanhalvan commented Dec 17, 2024

Hello,

It is possible this is somewhere in the documentation, but I could not find it. I have a recursive strategy, and I would like to generate very large recursive structures with a small number of examples. However, there doesn't seem to be a way to force Hypothesis to start generating large values.

Another framework I have used has a max_size parameter, where sizes grow from 0 to max_size over the course of the run, so if we want faster growth it can be set to a larger number as needed.

There are a couple of options here: we could allow a size parameter to be specified in the environment (via @settings, for example), there could be strategies for scaling other strategies (like scale in that other framework), or a min_leaves argument could be added to recursive.
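
To make the last option concrete, here is a purely hypothetical sketch of what a min_leaves argument could look like at the call site. Note that min_leaves does not exist in Hypothesis today; it is exactly what this issue proposes:

from hypothesis import strategies as st

# Hypothetical API: min_leaves= is the proposed argument, not a real one.
trees = st.recursive(
    base=st.integers(),
    extend=st.lists,  # wrap children in lists to build nested structure
    min_leaves=50,    # proposed: every generated tree has at least 50 leaves
    max_leaves=5000,
)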

Also, just to be clear, I much prefer Hypothesis over the other framework; I only use it as a reference because it allows manipulating the underlying size concept more directly, which would simplify my use-case a good bit.

My use-case is essentially that there is a significant overhead to running a single example, while the added cost of a large input is insignificant by comparison.

EDIT: also, this is a question/enhancement; I could not figure out how to set labels.

tybug added the enhancement ("it's not broken, but we want it to be better") label Dec 17, 2024
tybug (Member) commented Dec 17, 2024

I'll defer to @Zac-HD, but I'm not opposed to a min_leaves parameter in recursive!

Since automatically varying global size is one of the strengths of PBT, we try not to expose too much control over this distribution or growth rate to the user, who might dramatically weaken their strategy by setting it (as in your other suggestions).

helanhalvan (Author) commented

Are there any methods currently available for changing size? It is a very sharp tool, and not something that should go at the top of a tutorial, but making it available somehow would be good for some cases.

I agree that it can weaken strategies, but it can also strengthen them. One problem with min_leaves is that it inhibits shrinking, so the minimal examples will always be large.

Also, there seems to be something strange about how distributions are affected by changing max_leaves. I made this minimal test setup:

from dataclasses import dataclass
from typing import Any

from hypothesis import given, event, settings
from hypothesis import strategies as st

@dataclass
class BinTreeNode:
    lhs: Any  # Ast
    rhs: Any  # Ast

def tree_extend(child_gen):
    # Build an internal node from two independently drawn children.
    return st.builds(BinTreeNode, child_gen, child_gen)

def leaves(T):
    match T:
        case BinTreeNode():
            return leaves(T.lhs) + leaves(T.rhs)
        case _:
            return 1

@given(L=st.recursive(base=st.integers(), extend=tree_extend, max_leaves=int(1e100)))
@settings(max_examples=100, deadline=None)
def test_bin_trees(L):
    event(str(leaves(L)))  # event() expects a string label

This peaks at 555 leaves over 100 examples. Now, I would expect the speed at which examples grow in size to be proportional to max_leaves; however, 555 is a lot less than 1e100, and the difference between max_leaves=1000 and max_leaves=1e100 seems negligible.

Zac-HD (Member) commented Dec 23, 2024

As a deliberate design choice, Hypothesis offers user control over the domain of generated values, but not over the distribution. Adding a min_leaves= argument would tune the domain, and so I'm happy to include it.

So... why not allow control of the distribution?

  • As you note, it's a very sharp tool, and easy for beginners (or even experts) to get into trouble or make their tests less effective by using it; I therefore prefer to nudge people towards e.g. hypofuzz or hypothesis-crosshair on the basis that tuning distributions is really a job for computers rather than humans.
    • You might also want to try hypothesis.target(size_of(thing)) to hill-climb on that; see the sketch after this list.
  • It would add a substantially new chunk of complexity to our API, which benefits a few but costs everyone who learns about PBT or reads through our documentation or function signatures.
  • We pretty regularly change the distribution of generated values, both by adding new heuristics to hit particular kinds of edge case, and by making more general changes to our internals. Trying to maintain a sensible notion of distributions without breaking user code would make this much much harder.
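
To illustrate the target() suggestion above, here is a minimal runnable sketch; the list-of-integers strategy and the len() size metric are stand-ins for your real structure:

from hypothesis import given, settings, target
from hypothesis import strategies as st

@given(xs=st.lists(st.integers()))
@settings(max_examples=100)
def test_hill_climbs_to_large_lists(xs):
    # target() makes Hypothesis hill-climb towards inputs that maximize
    # this score, so later examples tend to be larger lists.
    target(len(xs), label="number of elements")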

Finally, Hypothesis doesn't necessarily try to hit the maximum size you specify, and we have an internal limit on the number of choices we make during generation, which usually nets out to a few thousand elements at most. This limit is also arbitrary, but in practice it gives a good balance between performance and diversity, which tends to maximize bug-finding power.

helanhalvan (Author) commented

Thanks for the detailed reply! I'll have a look at hypothesis.target. Do you have any docs on this internal size limit? Are there parameters to increase it?

For my use-case, Hypothesis (or even Python) runtime is insignificant; the test itself runs CLI tools that are very slow relative to generation. It's essentially integration testing of CLI tools, with python/pytest/hypothesis as a test-running framework. There are also good reasons to believe we will need to hit some (unknown) size threshold to trigger a number of important code paths (attempts at compression, etc.).

With that context, I don't think fuzzing is really an option at this point, which is why I really want to tell Hypothesis to make some large collections and then shrink until there is a minimal example.

While I understand the concerns about API complexity and backwards compatibility, there should be a way to give some control to the user. If the documentation is structured so that it is clear these are very sharp tools, I don't think their mere existence is a problem.

Then, some limitations:

  1. Only for collections: the collection size needed to trigger problems varies wildly, while if you need specific primitive values you should probably build them in the test (e.g. generate {x, y}, compute z = x << y, and pass z to your test).
  2. Only linear scaling: if the regular collection would have x elements, you can get x*n elements for whatever n you want. There are already ways to achieve this (or a similar effect) for collections, by generating multiple collections and joining them (see the sketch after this list); on the surface this seems to "just work", but it likely makes shrinking less efficient, since the framework no longer really knows what is being generated.

That said, there is little point if we will just hit the internal size limit and that cannot be changed.
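
For reference, a minimal sketch of the "generate multiple collections and join them" workaround mentioned in point 2; the factor of five and the element strategy are arbitrary choices for illustration:

from hypothesis import given
from hypothesis import strategies as st

# Draw five independent lists and concatenate them, roughly multiplying
# the typical size by five; the joined structure is opaque to the shrinker.
big_lists = st.tuples(*(st.lists(st.integers()) for _ in range(5))).map(
    lambda parts: [x for part in parts for x in part]
)

@given(xs=big_lists)
def test_with_scaled_lists(xs):
    assert isinstance(xs, list)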

Zac-HD (Member) commented Jan 5, 2025

No docs, because it's explicitly an internal implementation detail, and likely to change over the next few months as we work on #3921.

I think this is a case where monkeypatching is probably the way to go; you can see the average-size calculations here and patch the inherent size limits here; you may also want or need to mess with the many() helper for performance reasons. I strongly recommend writing a little benchmark for this before you start. Note also that this will probably break various other things like database interoperability, and is generally at your own risk.
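
A heavily hedged sketch of what such a monkeypatch might look like; the module path and attribute below are assumptions about version-dependent internals (inspect the source of your installed Hypothesis before copying this), and it is entirely unsupported:

import hypothesis.internal.conjecture.engine as engine

# ASSUMPTION: an internal constant like this caps how much data one
# example may draw; its name and location change between versions, and
# patching it may break shrinking and database interoperability.
engine.BUFFER_SIZE *= 10  # hypothetical: raise the per-example budget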

helanhalvan (Author) commented

#3921 seems like a good project.

I started experimenting with hypothesis.target and found an unfortunate thing: there is re-drawing happening in order to generate keys for a dictionary, which in turn makes target mostly ineffective. I'm not sure of the exact connection, but in this example you get roughly 3x larger target values when using characters from a limited range:

from hypothesis import given, event, settings, target
from hypothesis import strategies as st

def total_value_size(Raw):
    # Sum the sizes of all bytes values in the dictionary.
    total_size = 0
    for value in Raw.values():
        match value:
            case bytes():
                total_size += len(value)
    return total_size

@given(Raw=st.dictionaries(keys=st.characters(), values=st.binary(), min_size=1))
@settings(max_examples=100)
def test_large(Raw):
    total_size = total_value_size(Raw)
    target(total_size)
    event(str(total_size))

@given(Raw=st.dictionaries(
    keys=st.characters(min_codepoint=97, max_codepoint=122),
    values=st.binary(),
    min_size=1,
))
@settings(max_examples=100)
def test_small(Raw):
    total_size = total_value_size(Raw)
    target(total_size)
    event(str(total_size))

I could file this as a separate bug report if that makes sense. Also, going from 100 to 1000 examples does not increase the target value much, so it seems I'm hitting some of those internal limits. I can work around this by generating vectors instead of dicts, etc., as sketched below.
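
For completeness, a minimal sketch of that workaround, assuming the test only needs dict-like data: draw a flat list of (key, value) pairs and build the dictionary inside the test, so target() only has to grow a single collection:

from hypothesis import given, settings, target
from hypothesis import strategies as st

pairs = st.lists(
    st.tuples(st.characters(), st.binary()),
    min_size=1,
)

@given(items=pairs)
@settings(max_examples=100)
def test_large_via_list(items):
    Raw = dict(items)  # duplicate keys collapse, which may be acceptable
    total_size = sum(len(v) for v in Raw.values())
    target(total_size)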

Zac-HD changed the title from "Size options for recursive strategies" to "Add min_leaves: int = 0 argument to st.recursive()" Jan 24, 2025