Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deflake faster joins device list tests by waiting for leave event #627

Merged
merged 4 commits into from
Mar 16, 2023

Conversation

squahtx
Copy link
Contributor

@squahtx squahtx commented Mar 14, 2023

Many of the faster joins test flakes are due to the homeserver under
test failing to contact Complement homeservers after they have been
torn down. When this happens, subsequent tests can fail if they use a
Complement homeserver that happens to have the same hostname:port as one
which the homeserver under test has previously marked as offline.

Wait for the homeserver under test to finish broadcasting its leave at
the end of the device list tests.

Signed-off-by: Sean Quah seanq@matrix.org


Builds on top of #626.
Reviewable commit by commit.

NB: you can tease out the flakes by forcing Complement to reuse ports whenever possible:

diff --git a/internal/federation/server.go b/internal/federation/server.go
index ed7974d..4c4ccbe 100644
--- a/internal/federation/server.go
+++ b/internal/federation/server.go
@@ -455,7 +455,10 @@ func (s *Server) Listen() (cancel func()) {
        var wg sync.WaitGroup
        wg.Add(1)

-       ln, err := net.Listen("tcp", ":0") //nolint
+       ln, err := net.Listen("tcp", ":63333") //nolint
+       if err != nil {
+               ln, err = net.Listen("tcp", ":63334") //nolint
+       }
        if err != nil {
                s.t.Fatalf("ListenFederationServer: net.Listen failed: %s", err)
        }

@squahtx squahtx requested review from a team and kegsay as code owners March 14, 2023 17:20
@squahtx
Copy link
Contributor Author

squahtx commented Mar 14, 2023

CI flaked due to the issue fixed by #628.

Sean Quah added 4 commits March 15, 2023 10:44
Some of the faster joins tests have two Complement homeservers in the
same room. We need both Complement homeservers to believe they are in
the room, so that we can wait to observe a leave for the test user as
part of test cleanup. The easiest way to do this is to have the second
Complement homeserver perform a real join.

Signed-off-by: Sean Quah <seanq@matrix.org>
This is useful for setting up tests with two Complement homeservers in
the same room by having one of them join off the other.

Signed-off-by: Sean Quah <seanq@matrix.org>
...so that we can wait for the homeserver under test to send us leave
events during test cleanup.

Signed-off-by: Sean Quah <seanq@matrix.org>
This depends on the second Complement homeserver believing it is joined
to the test room.

Signed-off-by: Sean Quah <seanq@matrix.org>
@squahtx squahtx force-pushed the squah/faster_room_joins_fix_test_flakes_2 branch from 632ac7b to 926a40f Compare March 15, 2023 10:44
Copy link
Contributor

@DMRobertson DMRobertson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this was painful to diagnose; thank you for persisting.

@@ -339,7 +339,7 @@ func (s *Server) MustCreateEvent(t *testing.T, room *ServerRoom, ev b.Event) *go

// MustJoinRoom will make the server send a make_join and a send_join to join a room
// It returns the resultant room.
func (s *Server) MustJoinRoom(t *testing.T, deployment *docker.Deployment, remoteServer gomatrixserverlib.ServerName, roomID string, userID string) *ServerRoom {
func (s *Server) MustJoinRoom(t *testing.T, deployment *docker.Deployment, remoteServer gomatrixserverlib.ServerName, roomID string, userID string, partialState ...bool) *ServerRoom {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is go's idiom for an optional argument? Feels a bit odd to allow an arbitrary length array/slice, but I guess it's easier than updating every call site?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure actually. Functional varargs (https://stackoverflow.com/a/26326418) is the pattern MustDoFunc uses. Here I didn't bother with them since there's only one optional argument.

This was linked to issues Mar 16, 2023
@DMRobertson
Copy link
Contributor

(I've optimistically marked this as closing a bunch of faster join device list updates tests---hope I correctly matched up tests to this issue)

@squahtx squahtx merged commit 646d35f into main Mar 16, 2023
@squahtx squahtx deleted the squah/faster_room_joins_fix_test_flakes_2 branch March 16, 2023 13:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants