Try to fix flaky tests by waiting for subscriptions & mesh to be ready #203

aarshkshah1992 · 2019-10-04T13:41:17Z

In spite of many repeated runs, I was unable to reproduce the flakiness. However, I can intuitively see that the mesh overlay is not guaranteed to be in a steady state if we just sleep & hope for the best.

The two main changes, made to all 3 tests mentioned in the issue, are:

1. Poll until the incoming subscription messages have been processed on all peers, so that PubSub.topics reflects the correct state.
2. Poll until the overlay mesh has at least Dlow peers, rather than sleeping for 2 seconds & hoping the corresponding heartbeats get processed in the meantime.
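The polling idea described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `pollUntil` and `meshReady` are hypothetical helpers, `dlo` stands in for the Dlow threshold, and `meshSizes` stands in for per-peer mesh sizes that a real test would have to read from the router in a race-free way.

```go
package main

import (
	"errors"
	"time"
)

// pollUntil is a hypothetical helper (not part of go-libp2p-pubsub).
// It calls cond repeatedly, sleeping for interval between attempts,
// until cond returns true or timeout has elapsed.
func pollUntil(cond func() bool, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("condition not met before timeout")
		}
		time.Sleep(interval)
	}
}

// meshReady reports whether every peer's mesh has at least dlo members.
// meshSizes is an assumed stand-in for the per-peer mesh sizes.
func meshReady(meshSizes []int, dlo int) bool {
	for _, n := range meshSizes {
		if n < dlo {
			return false
		}
	}
	return true
}
```

Compared with a fixed sleep, a bounded poll returns as soon as the condition holds and fails with an explicit error when it never does, which makes a timing problem easier to tell apart from a logic problem.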

Let me know what you think. If this approach sounds reasonable, we can do something similar for the other tests.

aarshkshah1992 · 2019-10-04T13:57:45Z

gossipsub_test.go

@@ -23,6 +23,26 @@ func getGossipsubs(ctx context.Context, hs []host.Host, opts ...Option) []*PubSu
 	return psubs
 }

+func waitForMeshConstruction(topic string, psubs []*PubSub) {


Races with the heartbeat thread for GossipsubRouter.mesh[T]. Let me think of something.

Yep, if your function is very computationally inexpensive you can generally use something along the lines of this:

    done := make(chan resultType, 1)
    p.eval <- func() { done <- myFn() }
    return <-done

Note: the channel closure is only going to be needed if your function needs to be synchronous. Also, you may want to add in select statements that check for an expired context if you want to reuse these functions a bunch.
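A self-contained sketch of this pattern, including the context checks mentioned in the note, might look like the following. It models the event loop with a hypothetical `loop` type rather than the real PubSub struct; `newLoop`, `evalSync`, `runExample`, and the `topics` map are illustrative names and are not taken from the library.

```go
package main

import "context"

// loop is a simplified stand-in for a type whose state is owned by a
// single event-loop goroutine. The real PubSub type is far larger.
type loop struct {
	eval   chan func()
	topics map[string]int // only touched inside the event loop
}

// newLoop starts the event loop, which runs submitted functions one at
// a time until ctx is cancelled.
func newLoop(ctx context.Context) *loop {
	l := &loop{eval: make(chan func()), topics: map[string]int{}}
	go func() {
		for {
			select {
			case fn := <-l.eval:
				fn()
			case <-ctx.Done():
				return
			}
		}
	}()
	return l
}

// evalSync runs fn inside the event loop and waits for its result.
// Both the submission and the wait give up if ctx expires.
func (l *loop) evalSync(ctx context.Context, fn func() int) (int, error) {
	done := make(chan int, 1) // buffered so the loop never blocks on send
	select {
	case l.eval <- func() { done <- fn() }:
	case <-ctx.Done():
		return 0, ctx.Err()
	}
	select {
	case v := <-done:
		return v, nil
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}

// runExample mutates and then reads loop-owned state without a mutex.
func runExample() (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	l := newLoop(ctx)
	for i := 0; i < 2; i++ {
		add := func() int { l.topics["foobar"]++; return l.topics["foobar"] }
		if _, err := l.evalSync(ctx, add); err != nil {
			return 0, err
		}
	}
	return l.evalSync(ctx, func() int { return l.topics["foobar"] })
}
```

Because every access to `topics` happens inside a function executed by the loop goroutine, a test helper written this way should not race with other work done in that loop.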

aarshkshah1992 · 2019-10-04T13:59:17Z

gossipsub_test.go

+
+// wait for all incoming subscription messages to be processed
+// this method should ONLY be called if all connected peers subscribe to the same topic
+func waitForSubscriptionProcessing(topic string, psubs []*PubSub) {


Races with PubSub.handleIncomingRPC when it tries to update PubSub.topics[T].

See above: PubSub is currently set up to basically have a few long-running goroutines with event loops in them instead of using mutexes. There's an internal p.eval channel that will run arbitrary functions for you inside of the event loop so you don't run into races.

Aha... that is brilliant. Let me fix this.

aarshkshah1992 · 2019-10-04T14:19:38Z

gossipsub_test.go

@@ -704,7 +726,7 @@ func TestGossipsubGraftPruneRetry(t *testing.T) {
 }

 func TestGossipsubControlPiggyback(t *testing.T) {
-	t.Skip("travis regularly fails on this test")
+	//t.Skip("travis regularly fails on this test")


@aschmahmann I'm unable to understand the last lines of this test & of TestGossipsubGossip:

    // and wait for some gossip flushing
    time.Sleep(time.Second * 2)

Can you please explain what this is for? Why is it important to flush the remaining gossip/control messages once the test is complete?

Not really sure, perhaps it's just supposed to help close things down when running consecutive tests, even though that should be covered by cancelling the context 🤷‍♂.

Hopefully either @vyzo can shed some light or perhaps there's an answer to be found in the first gossipsub PR #67.

vyzo · 2020-03-23T10:32:41Z

closing as it doesn't fix the tests and needs quite a bit of work.

fix flaky tests by waiting for subscriptions & mesh to be ready

ff1334c

fixed race conditions in test by using Pubsub.Eval

d288510

aarshkshah1992 changed the title ~~Fix flaky tests by waiting for subscriptions & mesh to be ready~~ Try to fix flaky tests by waiting for subscriptions & mesh to be ready Oct 6, 2019

aarshkshah1992 mentioned this pull request Oct 14, 2019

Fix flaky tests #202

Open

Stebalien requested a review from aschmahmann November 12, 2019 00:46

vyzo closed this Mar 23, 2020
