-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Network test sync logic #2748
refactor: Network test sync logic #2748
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #2748 +/- ##
===========================================
+ Coverage 79.36% 79.38% +0.02%
===========================================
Files 323 323
Lines 24661 24691 +30
===========================================
+ Hits 19572 19600 +28
+ Misses 3692 3686 -6
- Partials 1397 1405 +8
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 18 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
tests/integration/p2p.go
Outdated
for i := 0; i < len(s.nodes); i++ { | ||
if nodeID.HasValue() && nodeID.Value() != i { | ||
continue // node is not selected | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: The documentation says waits for all selected nodes
but the function only receives one. You can probably do away with the loop and just get the node directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If nodeID
has no value it will wait for all nodes.
tests/integration/p2p.go
Outdated
// update the expected document heads of connected nodes | ||
for id := range s.nodeConnections[i] { | ||
if _, ok := s.actualDocHeads[id][evt.DocID]; ok { | ||
s.expectedDocHeads[id][evt.DocID] = evt.Cid | ||
} | ||
} | ||
// update the expected document heads of replicator sources | ||
for id := range s.nodeReplicatorTargets[i] { | ||
if _, ok := s.actualDocHeads[id][evt.DocID]; ok { | ||
s.expectedDocHeads[id][evt.DocID] = evt.Cid | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thougth: It's odd that we would be updating the expected heads only if already present in the actual heads. Shouldn't it be updated regardless?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the code is correct based on what we discussed regarding replicator target nodes sending updates of a document back to the replicator source node.
tests/integration/state.go
Outdated
// eventSubs is a list of all event subscriptions | ||
eventSubs []*event.Subscription | ||
// nodeUpdateSubs is a list of all update event subscriptions | ||
nodeUpdateSubs []*event.Subscription |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Can we wrap these up in a dedicated type? They are consuming a disproportionately large amount of attention on the state struct compared to their visibility needs.
something like:
type p2pState struct {
nodeConnections []map[int]struct{}
...
}
type state struct {
...
p2pState p2pState
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/integration/utils2.go
Outdated
@@ -1205,6 +1205,10 @@ func createDoc( | |||
s.documents = append(s.documents, make([][]*client.Document, action.CollectionID-len(s.documents)+1)...) | |||
} | |||
s.documents[action.CollectionID] = append(s.documents[action.CollectionID], doc) | |||
|
|||
if action.ExpectedError == "" { | |||
waitForUpdateEvents(s, action.NodeID, action.CollectionID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: This is a significant change in the tests - and one that looks like it removes our ability to test concurrent P2P stuff (something that we very much want-to/need-to/did test)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name is a bit misleading, but Update
events are published immediately after a document is saved and shouldn't really change how we test things concurrently.
We could also merge waitForUpdateEvents
into waitForSync
since the events are buffered if you would prefer that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see what you mean - it is not waiting for the sync, just the event publishing to the current node's event queue?
Why are you calling it then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see what you mean - it is not waiting for the sync, just the event publishing to the current node's event queue?
Yes
Why are you calling it then?
Its a convenient way to track which doc heads are expected to be merged on remote nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a convenient way to track which doc heads are expected to be merged on remote nodes.
Can you please document that really well? Our integration tests are already super magic and far more complex than most, and that complexity is not really tested. The P2P space is particularly complicated and prone to annoying, late-to-discover, flaky bugs and it is well worth over-explaining things to save future hassle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added more documentation about how the document head are tracked. It should also be much easier to follow since its all moved to an events.go
file. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look really good, thanks Keenan. Has got me thinking about test Cid substitution too, in the future we might want to tweak state.nodeP2P.DocHeads
to allow this.
@@ -1355,6 +1359,10 @@ func deleteDoc( | |||
} | |||
|
|||
assertExpectedErrorRaised(s.t, s.testCase.Description, action.ExpectedError, expectedErrorRaised) | |||
|
|||
if action.ExpectedError == "" { | |||
waitForUpdateEvents(s, action.NodeID, action.CollectionID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why is this only done using create-save, and not create-gql and the other create? And why not on delete?
thought: This looks quite easy to mess up and forget, and it missing it might not create an obvious and immediate failure (just flaky tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 similar thought, for here and other instances of this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding this correctly the createDoc
function calls the create-save
and create-gql
functions so they both should be covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😁 My bad, the github presentation of the diff helped me miss-read where this line was... Ignore this thread :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not finished reviewing, but want to block the merge until my concerns around the apparent loss of concurrency have been resolved. The code looks simpler, but I think we may have lost the ability to test some important things.
todo: Please make sure you run the P2P tests a lot (IIRC Fred/I had to run them 100 or so times to reproduce the last solved issues) before merging this, we had flakiness problems with this before and they cost a lot of time. Maybe involve a few colleagues with different machines, as in the past more constrained machines had different flaky bugs compared to faster ones. |
tests/integration/state.go
Outdated
// nodeReplicatorSources contains all active replicators. | ||
// | ||
// The index of the slice is the source node id. The map key is the target node id. | ||
nodeReplicatorSources []map[int]struct{} | ||
|
||
// nodeReplicatorTargets contains all active replicators. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: documentation suggests that nodeReplicatorSources
and nodeReplicatorTargets
are the same, but they are not. Please make documentation reflect it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/integration/utils2.go
Outdated
@@ -781,6 +766,10 @@ func refreshCollections( | |||
) { | |||
s.collections = make([][]client.Collection, len(s.nodes)) | |||
|
|||
for i := len(s.nodePeerCollections); i < len(s.collectionNames); i++ { | |||
s.nodePeerCollections = append(s.nodePeerCollections, make(map[int]struct{})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: This looks like it will break when collections are removed/deactivated in a node (using patchCollection)? Users currently can do this, although we likely have no P2P tests for it atm.
There may be other test things that currently break when doing that, but we should avoid adding new ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be easy to handle. Just need to find which collections to deactivate from the p2p state.
- handle patchCollection
switch action := tc.(type) { | ||
case ConnectPeers: | ||
// Give the nodes a chance to connect to each other and learn about each other's subscribed topics. | ||
time.Sleep(100 * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: I forgot we were currently relying on a sleep - thank you very much for removing this!
tests/integration/p2p.go
Outdated
@@ -479,7 +337,61 @@ func getAllP2PCollections( | |||
assert.Equal(s.t, expectedCollections, cols) | |||
} | |||
|
|||
// waitForSync waits for all given wait channels to receive an item signaling completion. | |||
// waitForUpdateEvents waits for all selected nodes to publish an update event. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Building on the other related documentation request, it might be worth highlighting that you mean publish to the local event queue, not to the pub/sub net or some other P2P related network thing.
tests/integration/events.go
Outdated
reps, err := s.nodes[nodeID].GetAllReplicators(s.ctx) | ||
require.NoError(s.t, err) | ||
|
||
p2pTopicEvents := 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: It looks like p2pTopicEvents
is only ever 0 or 1, why is it not a boolean? Is this a bug/wip?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to bool
tests/integration/utils2.go
Outdated
@@ -1205,6 +1205,10 @@ func createDoc( | |||
s.documents = append(s.documents, make([][]*client.Document, action.CollectionID-len(s.documents)+1)...) | |||
} | |||
s.documents[action.CollectionID] = append(s.documents[action.CollectionID], doc) | |||
|
|||
if action.ExpectedError == "" { | |||
waitForUpdateEvents(s, action.NodeID, action.CollectionID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look really good, thanks Keenan. Has got me thinking about test Cid substitution too, in the future we might want to tweak state.nodeP2P.DocHeads
to allow this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks a really nice improvement, thanks Keenan :)
@@ -1355,6 +1359,10 @@ func deleteDoc( | |||
} | |||
|
|||
assertExpectedErrorRaised(s.t, s.testCase.Description, action.ExpectedError, expectedErrorRaised) | |||
|
|||
if action.ExpectedError == "" { | |||
waitForUpdateEvents(s, action.NodeID, action.CollectionID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😁 My bad, the github presentation of the diff helped me miss-read where this line was... Ignore this thread :)
tests/integration/state.go
Outdated
@@ -24,6 +25,86 @@ import ( | |||
"github.com/sourcenetwork/defradb/tests/clients" | |||
) | |||
|
|||
// p2pState contains all p2p related testing test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: related testing state.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Core looks now much more intuitive. I just a have a question to clarify.
tests/integration/utils2.go
Outdated
@@ -342,7 +341,7 @@ func performAction( | |||
assertClientIntrospectionResults(s, action) | |||
|
|||
case WaitForSync: | |||
waitForSync(s, action) | |||
waitForMergeEvents(s, action) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: it's great that now waiting is done on the same thread. Earlier it used to call a goroutine for every ConnectPeers
action and had subtle shortcomings.
@@ -1457,6 +1424,10 @@ func updateDoc( | |||
} | |||
|
|||
assertExpectedErrorRaised(s.t, s.testCase.Description, action.ExpectedError, expectedErrorRaised) | |||
|
|||
if action.ExpectedError == "" { | |||
waitForUpdateEvents(s, action.NodeID, action.CollectionID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
questions: this looks like unsolicited sync. Why is it unconditional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should always be an update event published to the local event bus if a document is updated or created.
tests/integration/test_case.go
Outdated
DontSync bool | ||
// Skip waiting for an update event. This should only be used for | ||
// tests that do not correctly publish an update event. | ||
SkipUpdateEvent bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: why is it synced by default now? That resembles sequentially consistent memory order instead of relaxed one that we used before.
I think, if we want to sync, we should request it explicitly. Unsynced actions are closer to realistic behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: I think this is a misleading name and comment, IIRC waitForUpdateEvents
does not wait for the P2P sync, but only the update event to be published to the event bus. If that is correct, @nasdf can you please rework the property name and documentation to make that clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this then if not waiting: <-s.nodeEvents[i].update.Message():
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is just the event bus, it is written to synchronously by the database when mutating documents - the P2P code also subscribes to this bus - so 'waiting' to the bus should never really block, and the P2P stuff hasn't even been sent over the network at that point.
If I understand it correctly. I made the same comment as you did earlier for waitForUpdateEvents
- I think Keenan improved the documentation and code location since I commented, but given that it is still tripping up reviewers, that too should probably be improved further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Stuff 👍
Relevant issue(s)
Resolves #2747
Description
This PR improves the network sync test logic to allow for further improvements to the network implementation.
Tasks
How has this been tested?
make test
Specify the platform(s) on which this was tested: