Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom strategies beyond composites #1381

Closed
pckroon opened this issue Jul 3, 2018 · 8 comments
Closed

Custom strategies beyond composites #1381

pckroon opened this issue Jul 3, 2018 · 8 comments
Labels
question not sure it's a bug? questions welcome

Comments

@pckroon
Copy link
Contributor

pckroon commented Jul 3, 2018

Hi all,

First off, thanks! Hypothesis has helped me find a lot of bugs in my code. I'm lagging behind in fixing them though (maybe a next tool? ;))

The code I have works on networkx graphs (https://networkx.github.io/documentation/stable/reference/classes/graph.html#networkx.Graph). I have made a composite strategy for making graphs (see below), and that works. My only problem with it is that is doesn't shrink very nice, and the construction can probably also be improved. Is there any documentation or examples available for creating such advanced strategies? Is there any advice from strategy experts here on how to approach the building and shrinking? And how would I go about testing my strategy? Once finished I intend to publish it under hypothesis-networkx, but that's still some distance in the future.

@st.composite
def graph_builder(draw,
                  node_data=st.fixed_dictionaries({}),
                  edge_data=st.fixed_dictionaries({}),
                  node_keys=st.integers(),
                  min_nodes=0, max_nodes=50,
                  graph_type=nx.Graph, 
                  add_edges_untill=lambda x: True, self_loops=False):
    if min_nodes < 0:
        raise ValueError('min_nodes can not be negative')

    g = graph_type()
    nodes = sorted(draw(st.sets(node_keys, min_size=min_nodes, max_size=max_nodes)))
    for node in nodes:
        g.add_node(node, **draw(node_data))

    while g and not add_edges_untill(g)):
        n1 = draw(st.sampled_from(nodes))
        n2 = draw(st.sampled_from(nodes))
        if not self_loops:
            st.assume(n1 != n2)
        g.add_edge(n1, n2, **draw(edge_data))

    return g
@Zac-HD Zac-HD added the question not sure it's a bug? questions welcome label Jul 4, 2018
@Zac-HD
Copy link
Member

Zac-HD commented Jul 4, 2018

Hi Peter - it's great to hear that Hypothesis has been useful (or at least found bugs 😉) for you!

Unfortunately the answers to pretty much all of your questions are "it depends" - but I can start with some things to try, and happy to keep discussing especially if you can tell me more about what you want to achieve and what's currently happening.

  • The node_keys may not be orderable (if someone e.g. mixed ints and strings) - instead of sorted(draw(st.sets(...))), I'd use draw(st.lists(..., unique=True)).
  • Shrink quality is always tricky, but I can offer some general advice - keeping draws of related data together usually helps. I'd experiment with getting a list of node keys and attributes at the same time, with draw(st.lists(st.tuples(node_keys, node_data), ..., unique_by=lambda t: t[0])).
  • The trickiest bit to get working as you want is probably the edges, and I don't understand what you want well enough to give useful advice there, sorry 😞.
  • General tips on edges though, unique-by-construction is usually more efficient than filtering - unlikely to be worth more complex code to distinguish n1 and n2, but it might be useful to exclude generating the same edge twice to improve shrinking quality for small examples.

So that's a pile of things to try which may or may not help you - let me know!

@Zac-HD Zac-HD closed this as completed Jul 4, 2018
@DRMacIver
Copy link
Member

DRMacIver commented Jul 4, 2018

My only problem with it is that is doesn't shrink very nice,

Could you give some examples of it shrinking badly?

The general Hypothesis philosophy is that Shrinking Is Magic (TM). The underlying model is supposed to handle shrinking for you without you having to worry about it too much. Obviously this is more true in theory than in practice! (Though it's true in practice surprisingly often). As such we're always interested in examples of bad shrinking.

That being said, I don't currently understand where your strategy actually ever stops adding edges. By default isn't this an infinite loop? Edit: Sorry I misunderstood the condition. Instead my question is... why does this only add one edge by default?

In terms of more advanced features... composite is pretty much it. There are a few internal things we could expose that might be helpful, but composite is pretty close to the actual underlying model of how data is generated in Hypothesis with some of the book-keeping handled for you.

@pckroon
Copy link
Contributor Author

pckroon commented Jul 4, 2018

Thanks for the replies :)

Could you give some examples of it shrinking badly?

I have some cases where the actual problem is in the data of one of the nodes, but it also gives me 10 other nodes (with edges) that are actually irrelevant to the issue. Or a case where more nodes could be removed. Do you want the exact details?

That being said, I don't currently understand where your strategy actually ever stops adding edges. By default isn't this an infinite loop?

Correct, my bad. In practice I always give it a function that returns True once the graph is connected.

I'll play around with the tips from Zac; I'll keep you posted.

@DRMacIver
Copy link
Member

Correct, my bad. In practice I always give it a function that returns True once the graph is connected.

Ah, that would be the problem.

Basically what's happening here is that when Hypothesis tries to remove a node during shrinking, this changes the edges you add later - they might point to different nodes or get discarded as invalid because they now point past the end of the node list, etc (basically an edge is being represented as a pair of indices into the node list). This means that trying to remove a node is often going to result in the graph becoming disconnected, which results in Hypothesis trying to draw more edges to complete it.

In general this kind of "generate stuff until some condition is met" approach is unlikely to shrink well, and you're better off trying to construct the object so that the condition is always satisfied as you go. I'll have a think about how one might do that well for connected graphs - maybe iterate over the node list and add an edge between each node and one sampled from the previous nodes? That will guarantee that there is a single connected component, and you can then add a bunch of edges in after. (I can't promise this will shrink well either, but it has a better chance of working)

@pckroon
Copy link
Contributor Author

pckroon commented Jul 4, 2018

Hmmm, tough cookie. Thanks for your insights.

This is one of the reasons I was hoping I could inherit from a Strategy class and define grow and shrink methods (or something), to have a bit more control on how the problem can be reduced (i.e. contract an edge, or if you remove a node add an edge between it's neighbours).

Within the current paradigm, every time I add a node, I can/should connect it to an already existing node? I'm not sure this will fix the actual problem: whenever you remove an arbitrary node you run the risk of ending up with a disconnected graph. The only alternative I see now is to assume by graph is connected, but I fear generation will suffer over this. I'll give it a whirl, and see what comes out.

For comparison, here's what I came up with based on Zac's comments. It behaves a little better it seems, but still not quite optimal.

@st.composite
def graph_builder(draw,
                  node_data=st.fixed_dictionaries({}),
                  edge_data=st.fixed_dictionaries({}),
                  node_keys=st.integers(),
                  min_nodes=0, max_nodes=50,
                  min_edges=0, max_edges=None,
                  graph_type=nx.Graph,
                  add_edges_until=nx.is_connected,
                  self_loops=False):
    if min_nodes < 0:
        raise ValueError('min_nodes can not be negative')

    g = graph_type()
    nodes = draw(st.lists(st.tuples(node_keys, node_data), min_size=min_nodes,
                          max_size=max_nodes, unique_by=lambda t: t[0]))
    for node, data in nodes:
        g.add_node(node, **data)

    if max_edges is None:
        max_edges = len(nodes) * (len(nodes) - 1)

    node_pair = st.lists(st.sampled_from([n[0] for n in nodes]),
                         min_size=2, max_size=2,
                         unique=not self_loops)
    edge = st.tuples(node_pair, edge_data)
    edges = draw(st.lists(edge, min_size=min_edges, max_size=max_edges,
                          unique_by=lambda t: tuple(t[0])))

    for (n1, n2), data in edges:
        g.add_edge(n1, n2, **data)
    while g and not add_edges_until(g):
        (n1, n2), data = draw(edge)
        g.add_edge(n1, n2, **data)

    return g

@DRMacIver
Copy link
Member

Within the current paradigm, every time I add a node, I can/should connect it to an already existing node? I'm not sure this will fix the actual problem: whenever you remove an arbitrary node you run the risk of ending up with a disconnected graph

Right, but removing a node doesn't just delete the edge pointing to it, it causes that edge to point to something else (unless it's the last node, in which case it does get removed).

This definitely won't be perfect, but I think you might be pleasantly surprised by how well it shrinks. Here's some simplified code I put together to check out how well it works and it certainly seems to do all the basics right:

@st.composite
def connected_graphs(draw):
    g = networkx.Graph()
    nodes = draw(st.lists(st.integers(), unique=True))

    for n in nodes:
        g.add_node(n)

    for i in range(1, len(nodes)):
        j = draw(st.integers(0, i - 1))
        g.add_edge(nodes[i], nodes[j])

    for n1, n2 in draw(
        st.lists(st.tuples(st.sampled_from(nodes), st.sampled_from(nodes)))
    ):
        g.add_edge(n1, n2)

    return g

This is one of the reasons I was hoping I could inherit from a Strategy class and define grow and shrink methods (or something), to have a bit more control on how the problem can be reduced (i.e. contract an edge, or if you remove a node add an edge between it's neighbours).

It's a fairly fundamental design feature of Hypothesis that you can't do that I'm afraid. The problem is that as soon as you introduce that sort of custom shrinking, everything fails to compose properly: In Hypothesis you only have to specify how to be able to construct an object, so things like @composite can work nicely by just chaining construction together, but as soon as you try to add custom shrinking you need to be able to specify how to deconstruct it, which can't be chained through the definition in the same way.

@pckroon
Copy link
Contributor Author

pckroon commented Jul 5, 2018

Thanks for your input, I'll definitely give it a try!

This is one of the reasons I was hoping I could inherit from a Strategy class and define grow and shrink methods (or something), to have a bit more control on how the problem can be reduced (i.e. contract an edge, or if you remove a node add an edge between it's neighbours).

It's a fairly fundamental design feature of Hypothesis that you can't do that I'm afraid. The problem is that as soon as you introduce that sort of custom shrinking, everything fails to compose properly: In Hypothesis you only have to specify how to be able to construct an object, so things like @composite can work nicely by just chaining construction together, but as soon as you try to add custom shrinking you need to be able to specify how to deconstruct it, which can't be chained through the definition in the same way.

And this is exactly why I needed your expert insights :) I don't think I grasp all the subtleties, so I'll take your word for it.

@pckroon
Copy link
Contributor Author

pckroon commented Jul 10, 2018

To wrap this thread up. I ended up with the following. It has a few more bells and whistles than the code @DRMacIver provided, but it does essentially the same. I'll probably clean it a bit, and create a hypothesis-networkx somewhere. Thanks again for your invaluable help!

@st.composite
def graph_builder(draw,
                  node_data=st.fixed_dictionaries({}),
                  edge_data=st.fixed_dictionaries({}),
                  node_keys=st.integers(),
                  min_nodes=0, max_nodes=50,
                  min_edges=0, max_edges=None,
                  graph_type=nx.Graph,
                  self_loops=False):
    if min_nodes < 0:
        raise ValueError('min_nodes can not be negative')

    g = graph_type()
    nodes = draw(st.lists(st.tuples(node_keys, node_data), min_size=min_nodes,
                          max_size=max_nodes, unique_by=lambda t: t[0]))
    g.add_node(nodes[0][0], **nodes[0][1])
    for node, data in nodes[1:]:
        edge_to = draw(st.sampled_from(sorted(g.nodes)))
        g.add_node(node, **data)
        g.add_edge(node, edge_to, **draw(edge_data))

    if max_edges is None:
        max_edges = len(nodes) * (len(nodes) - 1)

    min_edges = max(0, min_edges - len(g.edges))
    max_edges = min(len(nodes) * (len(nodes) - 1), max_edges - len(g.edges))

    node_pair = st.lists(st.sampled_from([n[0] for n in nodes]),
                         min_size=2, max_size=2,
                         unique=not self_loops)
    edge = st.tuples(node_pair, edge_data)
    edges = draw(st.lists(edge, min_size=min_edges, max_size=max_edges,
                          unique_by=lambda t: tuple(t[0])))

    for (n1, n2), data in edges:
        g.add_edge(n1, n2, **data)

    return g

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question not sure it's a bug? questions welcome
Projects
None yet
Development

No branches or pull requests

3 participants