-
Notifications
You must be signed in to change notification settings - Fork 587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Custom strategies beyond composites #1381
Comments
Hi Peter - it's great to hear that Hypothesis has been useful (or at least found bugs 😉) for you! Unfortunately the answers to pretty much all of your questions are "it depends" - but I can start with some things to try, and happy to keep discussing especially if you can tell me more about what you want to achieve and what's currently happening.
So that's a pile of things to try which may or may not help you - let me know! |
Could you give some examples of it shrinking badly? The general Hypothesis philosophy is that Shrinking Is Magic (TM). The underlying model is supposed to handle shrinking for you without you having to worry about it too much. Obviously this is more true in theory than in practice! (Though it's true in practice surprisingly often). As such we're always interested in examples of bad shrinking.
In terms of more advanced features... composite is pretty much it. There are a few internal things we could expose that might be helpful, but composite is pretty close to the actual underlying model of how data is generated in Hypothesis with some of the book-keeping handled for you. |
Thanks for the replies :)
I have some cases where the actual problem is in the data of one of the nodes, but it also gives me 10 other nodes (with edges) that are actually irrelevant to the issue. Or a case where more nodes could be removed. Do you want the exact details?
Correct, my bad. In practice I always give it a function that returns True once the graph is connected. I'll play around with the tips from Zac; I'll keep you posted. |
Ah, that would be the problem. Basically what's happening here is that when Hypothesis tries to remove a node during shrinking, this changes the edges you add later - they might point to different nodes or get discarded as invalid because they now point past the end of the node list, etc (basically an edge is being represented as a pair of indices into the node list). This means that trying to remove a node is often going to result in the graph becoming disconnected, which results in Hypothesis trying to draw more edges to complete it. In general this kind of "generate stuff until some condition is met" approach is unlikely to shrink well, and you're better off trying to construct the object so that the condition is always satisfied as you go. I'll have a think about how one might do that well for connected graphs - maybe iterate over the node list and add an edge between each node and one sampled from the previous nodes? That will guarantee that there is a single connected component, and you can then add a bunch of edges in after. (I can't promise this will shrink well either, but it has a better chance of working) |
Hmmm, tough cookie. Thanks for your insights. This is one of the reasons I was hoping I could inherit from a Strategy class and define grow and shrink methods (or something), to have a bit more control on how the problem can be reduced (i.e. contract an edge, or if you remove a node add an edge between it's neighbours). Within the current paradigm, every time I add a node, I can/should connect it to an already existing node? I'm not sure this will fix the actual problem: whenever you remove an arbitrary node you run the risk of ending up with a disconnected graph. The only alternative I see now is to For comparison, here's what I came up with based on Zac's comments. It behaves a little better it seems, but still not quite optimal. @st.composite
def graph_builder(draw,
node_data=st.fixed_dictionaries({}),
edge_data=st.fixed_dictionaries({}),
node_keys=st.integers(),
min_nodes=0, max_nodes=50,
min_edges=0, max_edges=None,
graph_type=nx.Graph,
add_edges_until=nx.is_connected,
self_loops=False):
if min_nodes < 0:
raise ValueError('min_nodes can not be negative')
g = graph_type()
nodes = draw(st.lists(st.tuples(node_keys, node_data), min_size=min_nodes,
max_size=max_nodes, unique_by=lambda t: t[0]))
for node, data in nodes:
g.add_node(node, **data)
if max_edges is None:
max_edges = len(nodes) * (len(nodes) - 1)
node_pair = st.lists(st.sampled_from([n[0] for n in nodes]),
min_size=2, max_size=2,
unique=not self_loops)
edge = st.tuples(node_pair, edge_data)
edges = draw(st.lists(edge, min_size=min_edges, max_size=max_edges,
unique_by=lambda t: tuple(t[0])))
for (n1, n2), data in edges:
g.add_edge(n1, n2, **data)
while g and not add_edges_until(g):
(n1, n2), data = draw(edge)
g.add_edge(n1, n2, **data)
return g |
Right, but removing a node doesn't just delete the edge pointing to it, it causes that edge to point to something else (unless it's the last node, in which case it does get removed). This definitely won't be perfect, but I think you might be pleasantly surprised by how well it shrinks. Here's some simplified code I put together to check out how well it works and it certainly seems to do all the basics right: @st.composite
def connected_graphs(draw):
g = networkx.Graph()
nodes = draw(st.lists(st.integers(), unique=True))
for n in nodes:
g.add_node(n)
for i in range(1, len(nodes)):
j = draw(st.integers(0, i - 1))
g.add_edge(nodes[i], nodes[j])
for n1, n2 in draw(
st.lists(st.tuples(st.sampled_from(nodes), st.sampled_from(nodes)))
):
g.add_edge(n1, n2)
return g
It's a fairly fundamental design feature of Hypothesis that you can't do that I'm afraid. The problem is that as soon as you introduce that sort of custom shrinking, everything fails to compose properly: In Hypothesis you only have to specify how to be able to construct an object, so things like |
Thanks for your input, I'll definitely give it a try!
And this is exactly why I needed your expert insights :) I don't think I grasp all the subtleties, so I'll take your word for it. |
To wrap this thread up. I ended up with the following. It has a few more bells and whistles than the code @DRMacIver provided, but it does essentially the same. I'll probably clean it a bit, and create a @st.composite
def graph_builder(draw,
node_data=st.fixed_dictionaries({}),
edge_data=st.fixed_dictionaries({}),
node_keys=st.integers(),
min_nodes=0, max_nodes=50,
min_edges=0, max_edges=None,
graph_type=nx.Graph,
self_loops=False):
if min_nodes < 0:
raise ValueError('min_nodes can not be negative')
g = graph_type()
nodes = draw(st.lists(st.tuples(node_keys, node_data), min_size=min_nodes,
max_size=max_nodes, unique_by=lambda t: t[0]))
g.add_node(nodes[0][0], **nodes[0][1])
for node, data in nodes[1:]:
edge_to = draw(st.sampled_from(sorted(g.nodes)))
g.add_node(node, **data)
g.add_edge(node, edge_to, **draw(edge_data))
if max_edges is None:
max_edges = len(nodes) * (len(nodes) - 1)
min_edges = max(0, min_edges - len(g.edges))
max_edges = min(len(nodes) * (len(nodes) - 1), max_edges - len(g.edges))
node_pair = st.lists(st.sampled_from([n[0] for n in nodes]),
min_size=2, max_size=2,
unique=not self_loops)
edge = st.tuples(node_pair, edge_data)
edges = draw(st.lists(edge, min_size=min_edges, max_size=max_edges,
unique_by=lambda t: tuple(t[0])))
for (n1, n2), data in edges:
g.add_edge(n1, n2, **data)
return g |
Hi all,
First off, thanks! Hypothesis has helped me find a lot of bugs in my code. I'm lagging behind in fixing them though (maybe a next tool? ;))
The code I have works on networkx graphs (https://networkx.github.io/documentation/stable/reference/classes/graph.html#networkx.Graph). I have made a composite strategy for making graphs (see below), and that works. My only problem with it is that is doesn't shrink very nice, and the construction can probably also be improved. Is there any documentation or examples available for creating such advanced strategies? Is there any advice from strategy experts here on how to approach the building and shrinking? And how would I go about testing my strategy? Once finished I intend to publish it under
hypothesis-networkx
, but that's still some distance in the future.The text was updated successfully, but these errors were encountered: