Custom strategies beyond composites #1381

pckroon · 2018-07-03T15:03:42Z

Hi all,

First off, thanks! Hypothesis has helped me find a lot of bugs in my code. I'm lagging behind in fixing them though (maybe a next tool? ;))

The code I have works on networkx graphs (https://networkx.github.io/documentation/stable/reference/classes/graph.html#networkx.Graph). I have made a composite strategy for making graphs (see below), and that works. My only problem with it is that is doesn't shrink very nice, and the construction can probably also be improved. Is there any documentation or examples available for creating such advanced strategies? Is there any advice from strategy experts here on how to approach the building and shrinking? And how would I go about testing my strategy? Once finished I intend to publish it under hypothesis-networkx, but that's still some distance in the future.

@st.composite
def graph_builder(draw,
                  node_data=st.fixed_dictionaries({}),
                  edge_data=st.fixed_dictionaries({}),
                  node_keys=st.integers(),
                  min_nodes=0, max_nodes=50,
                  graph_type=nx.Graph, 
                  add_edges_untill=lambda x: True, self_loops=False):
    if min_nodes < 0:
        raise ValueError('min_nodes can not be negative')

    g = graph_type()
    nodes = sorted(draw(st.sets(node_keys, min_size=min_nodes, max_size=max_nodes)))
    for node in nodes:
        g.add_node(node, **draw(node_data))

    while g and not add_edges_untill(g)):
        n1 = draw(st.sampled_from(nodes))
        n2 = draw(st.sampled_from(nodes))
        if not self_loops:
            st.assume(n1 != n2)
        g.add_edge(n1, n2, **draw(edge_data))

    return g

The text was updated successfully, but these errors were encountered:

Zac-HD · 2018-07-04T04:55:45Z

Hi Peter - it's great to hear that Hypothesis has been useful (or at least found bugs 😉) for you!

Unfortunately the answers to pretty much all of your questions are "it depends" - but I can start with some things to try, and happy to keep discussing especially if you can tell me more about what you want to achieve and what's currently happening.

The node_keys may not be orderable (if someone e.g. mixed ints and strings) - instead of sorted(draw(st.sets(...))), I'd use draw(st.lists(..., unique=True)).
Shrink quality is always tricky, but I can offer some general advice - keeping draws of related data together usually helps. I'd experiment with getting a list of node keys and attributes at the same time, with draw(st.lists(st.tuples(node_keys, node_data), ..., unique_by=lambda t: t[0])).
The trickiest bit to get working as you want is probably the edges, and I don't understand what you want well enough to give useful advice there, sorry 😞.
General tips on edges though, unique-by-construction is usually more efficient than filtering - unlikely to be worth more complex code to distinguish n1 and n2, but it might be useful to exclude generating the same edge twice to improve shrinking quality for small examples.

So that's a pile of things to try which may or may not help you - let me know!

DRMacIver · 2018-07-04T08:43:18Z

My only problem with it is that is doesn't shrink very nice,

Could you give some examples of it shrinking badly?

The general Hypothesis philosophy is that Shrinking Is Magic (TM). The underlying model is supposed to handle shrinking for you without you having to worry about it too much. Obviously this is more true in theory than in practice! (Though it's true in practice surprisingly often). As such we're always interested in examples of bad shrinking.

~~That being said, I don't currently understand where your strategy actually ever stops adding edges. By default isn't this an infinite loop?~~ Edit: Sorry I misunderstood the condition. Instead my question is... why does this only add one edge by default?

In terms of more advanced features... composite is pretty much it. There are a few internal things we could expose that might be helpful, but composite is pretty close to the actual underlying model of how data is generated in Hypothesis with some of the book-keeping handled for you.

pckroon · 2018-07-04T08:48:05Z

Thanks for the replies :)

Could you give some examples of it shrinking badly?

I have some cases where the actual problem is in the data of one of the nodes, but it also gives me 10 other nodes (with edges) that are actually irrelevant to the issue. Or a case where more nodes could be removed. Do you want the exact details?

That being said, I don't currently understand where your strategy actually ever stops adding edges. By default isn't this an infinite loop?

Correct, my bad. In practice I always give it a function that returns True once the graph is connected.

I'll play around with the tips from Zac; I'll keep you posted.

DRMacIver · 2018-07-04T10:16:32Z

Correct, my bad. In practice I always give it a function that returns True once the graph is connected.

Ah, that would be the problem.

Basically what's happening here is that when Hypothesis tries to remove a node during shrinking, this changes the edges you add later - they might point to different nodes or get discarded as invalid because they now point past the end of the node list, etc (basically an edge is being represented as a pair of indices into the node list). This means that trying to remove a node is often going to result in the graph becoming disconnected, which results in Hypothesis trying to draw more edges to complete it.

In general this kind of "generate stuff until some condition is met" approach is unlikely to shrink well, and you're better off trying to construct the object so that the condition is always satisfied as you go. I'll have a think about how one might do that well for connected graphs - maybe iterate over the node list and add an edge between each node and one sampled from the previous nodes? That will guarantee that there is a single connected component, and you can then add a bunch of edges in after. (I can't promise this will shrink well either, but it has a better chance of working)

pckroon · 2018-07-04T12:53:50Z

Hmmm, tough cookie. Thanks for your insights.

This is one of the reasons I was hoping I could inherit from a Strategy class and define grow and shrink methods (or something), to have a bit more control on how the problem can be reduced (i.e. contract an edge, or if you remove a node add an edge between it's neighbours).

Within the current paradigm, every time I add a node, I can/should connect it to an already existing node? I'm not sure this will fix the actual problem: whenever you remove an arbitrary node you run the risk of ending up with a disconnected graph. The only alternative I see now is to assume by graph is connected, but I fear generation will suffer over this. I'll give it a whirl, and see what comes out.

For comparison, here's what I came up with based on Zac's comments. It behaves a little better it seems, but still not quite optimal.

@st.composite
def graph_builder(draw,
                  node_data=st.fixed_dictionaries({}),
                  edge_data=st.fixed_dictionaries({}),
                  node_keys=st.integers(),
                  min_nodes=0, max_nodes=50,
                  min_edges=0, max_edges=None,
                  graph_type=nx.Graph,
                  add_edges_until=nx.is_connected,
                  self_loops=False):
    if min_nodes < 0:
        raise ValueError('min_nodes can not be negative')

    g = graph_type()
    nodes = draw(st.lists(st.tuples(node_keys, node_data), min_size=min_nodes,
                          max_size=max_nodes, unique_by=lambda t: t[0]))
    for node, data in nodes:
        g.add_node(node, **data)

    if max_edges is None:
        max_edges = len(nodes) * (len(nodes) - 1)

    node_pair = st.lists(st.sampled_from([n[0] for n in nodes]),
                         min_size=2, max_size=2,
                         unique=not self_loops)
    edge = st.tuples(node_pair, edge_data)
    edges = draw(st.lists(edge, min_size=min_edges, max_size=max_edges,
                          unique_by=lambda t: tuple(t[0])))

    for (n1, n2), data in edges:
        g.add_edge(n1, n2, **data)
    while g and not add_edges_until(g):
        (n1, n2), data = draw(edge)
        g.add_edge(n1, n2, **data)

    return g

DRMacIver · 2018-07-04T13:39:00Z

Within the current paradigm, every time I add a node, I can/should connect it to an already existing node? I'm not sure this will fix the actual problem: whenever you remove an arbitrary node you run the risk of ending up with a disconnected graph

Right, but removing a node doesn't just delete the edge pointing to it, it causes that edge to point to something else (unless it's the last node, in which case it does get removed).

This definitely won't be perfect, but I think you might be pleasantly surprised by how well it shrinks. Here's some simplified code I put together to check out how well it works and it certainly seems to do all the basics right:

@st.composite
def connected_graphs(draw):
    g = networkx.Graph()
    nodes = draw(st.lists(st.integers(), unique=True))

    for n in nodes:
        g.add_node(n)

    for i in range(1, len(nodes)):
        j = draw(st.integers(0, i - 1))
        g.add_edge(nodes[i], nodes[j])

    for n1, n2 in draw(
        st.lists(st.tuples(st.sampled_from(nodes), st.sampled_from(nodes)))
    ):
        g.add_edge(n1, n2)

    return g

This is one of the reasons I was hoping I could inherit from a Strategy class and define grow and shrink methods (or something), to have a bit more control on how the problem can be reduced (i.e. contract an edge, or if you remove a node add an edge between it's neighbours).

It's a fairly fundamental design feature of Hypothesis that you can't do that I'm afraid. The problem is that as soon as you introduce that sort of custom shrinking, everything fails to compose properly: In Hypothesis you only have to specify how to be able to construct an object, so things like @composite can work nicely by just chaining construction together, but as soon as you try to add custom shrinking you need to be able to specify how to deconstruct it, which can't be chained through the definition in the same way.

pckroon · 2018-07-05T09:13:38Z

Thanks for your input, I'll definitely give it a try!

This is one of the reasons I was hoping I could inherit from a Strategy class and define grow and shrink methods (or something), to have a bit more control on how the problem can be reduced (i.e. contract an edge, or if you remove a node add an edge between it's neighbours).

It's a fairly fundamental design feature of Hypothesis that you can't do that I'm afraid. The problem is that as soon as you introduce that sort of custom shrinking, everything fails to compose properly: In Hypothesis you only have to specify how to be able to construct an object, so things like @composite can work nicely by just chaining construction together, but as soon as you try to add custom shrinking you need to be able to specify how to deconstruct it, which can't be chained through the definition in the same way.

And this is exactly why I needed your expert insights :) I don't think I grasp all the subtleties, so I'll take your word for it.

pckroon · 2018-07-10T09:06:50Z

To wrap this thread up. I ended up with the following. It has a few more bells and whistles than the code @DRMacIver provided, but it does essentially the same. I'll probably clean it a bit, and create a hypothesis-networkx somewhere. Thanks again for your invaluable help!

@st.composite
def graph_builder(draw,
                  node_data=st.fixed_dictionaries({}),
                  edge_data=st.fixed_dictionaries({}),
                  node_keys=st.integers(),
                  min_nodes=0, max_nodes=50,
                  min_edges=0, max_edges=None,
                  graph_type=nx.Graph,
                  self_loops=False):
    if min_nodes < 0:
        raise ValueError('min_nodes can not be negative')

    g = graph_type()
    nodes = draw(st.lists(st.tuples(node_keys, node_data), min_size=min_nodes,
                          max_size=max_nodes, unique_by=lambda t: t[0]))
    g.add_node(nodes[0][0], **nodes[0][1])
    for node, data in nodes[1:]:
        edge_to = draw(st.sampled_from(sorted(g.nodes)))
        g.add_node(node, **data)
        g.add_edge(node, edge_to, **draw(edge_data))

    if max_edges is None:
        max_edges = len(nodes) * (len(nodes) - 1)

    min_edges = max(0, min_edges - len(g.edges))
    max_edges = min(len(nodes) * (len(nodes) - 1), max_edges - len(g.edges))

    node_pair = st.lists(st.sampled_from([n[0] for n in nodes]),
                         min_size=2, max_size=2,
                         unique=not self_loops)
    edge = st.tuples(node_pair, edge_data)
    edges = draw(st.lists(edge, min_size=min_edges, max_size=max_edges,
                          unique_by=lambda t: tuple(t[0])))

    for (n1, n2), data in edges:
        g.add_edge(n1, n2, **data)

    return g

Zac-HD added the question not sure it's a bug? questions welcome label Jul 4, 2018

Zac-HD closed this as completed Jul 4, 2018

Zac-HD mentioned this issue Dec 18, 2018

We should add a guide to writing shrinker-friendly strategies #1705

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Custom strategies beyond composites #1381

Custom strategies beyond composites #1381

pckroon commented Jul 3, 2018

Zac-HD commented Jul 4, 2018

DRMacIver commented Jul 4, 2018 •

edited

Loading

pckroon commented Jul 4, 2018 •

edited

Loading

DRMacIver commented Jul 4, 2018

pckroon commented Jul 4, 2018

DRMacIver commented Jul 4, 2018

pckroon commented Jul 5, 2018

pckroon commented Jul 10, 2018

Custom strategies beyond composites #1381

Custom strategies beyond composites #1381

Comments

pckroon commented Jul 3, 2018

Zac-HD commented Jul 4, 2018

DRMacIver commented Jul 4, 2018 • edited Loading

pckroon commented Jul 4, 2018 • edited Loading

DRMacIver commented Jul 4, 2018

pckroon commented Jul 4, 2018

DRMacIver commented Jul 4, 2018

pckroon commented Jul 5, 2018

pckroon commented Jul 10, 2018

DRMacIver commented Jul 4, 2018 •

edited

Loading

pckroon commented Jul 4, 2018 •

edited

Loading