-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make it possible to test gateway opening/closing in Connect #14135
Conversation
Open() and Close() used to not return any error and Open() used to start the gateway in a goroutine, making it rather hard to write tests for it. This commit makes it so that Open() and Close() return errors and Open() blocks. Adjustments have been made to other places in lib/teleterm to account for that missing goroutine and returned errors.
While writing tests for the gateways, I was relying heavily on tests for the local proxy. I noticed that it starts the server but doesn't close it so I added an appropriate call to the cleanup function.
lib/teleterm/gateway/gateway_test.go
Outdated
wait := make(chan error) | ||
|
||
go func() { | ||
err := gateway.Open() | ||
wait <- err | ||
}() | ||
|
||
defer func() { | ||
// Make sure Open() is called. | ||
time.Sleep(time.Millisecond * 200) | ||
|
||
err := gateway.Close() | ||
require.NoError(t, err) | ||
require.NoError(t, <-wait) | ||
}() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already talked with Zac about this pattern under #14074 (comment). time.Sleep
is bad but I don't know what would be a better approach with this API. Maybe I should just reword this comment so that it doesn't suggest that sleeping actually makes sure that Open()
is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to block until Open
is called then perhaps Gateway
should have a chan struct{}
field that gets closed before calling localProxy.Start
. From what I'm seeing, there's no real threshold in the middle of (*Gateway).Open
or (*LocalProxy).Start
in which the outer behavior changes, anyway - inbound connections get buffered as soon as the listener is opened in gateway.New
, even before Open
, and will just be processed by the accept loop in LocalProxy
whenever it gets the chance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sidenote: (*Gateway).Open
and (*LocalProxy).Start
should probably both be called Serve
or something along those lines, IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second sidenote: why not just net.Dial
the address to confirm that the other side has actually accepted the connection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@espadolini Oh that's a good idea, would something like what I pushed in 1efd5ef work? The APIs in the net package are one level lower than what I'm used to.
sidenote:
(*Gateway).Open
and(*LocalProxy).Start
should probably both be calledServe
or something along those lines, IMO
I don't really want to rewrite ALPN proxy code right now but I can certainly take care of Gateway.Open
.
lib/teleterm/gateway/gateway_test.go
Outdated
wait := make(chan error) | ||
|
||
go func() { | ||
err := gateway.Open() | ||
wait <- err | ||
}() | ||
|
||
defer func() { | ||
// Make sure Open() is called. | ||
time.Sleep(time.Millisecond * 200) | ||
|
||
err := gateway.Close() | ||
require.NoError(t, err) | ||
require.NoError(t, <-wait) | ||
}() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to block until Open
is called then perhaps Gateway
should have a chan struct{}
field that gets closed before calling localProxy.Start
. From what I'm seeing, there's no real threshold in the middle of (*Gateway).Open
or (*LocalProxy).Start
in which the outer behavior changes, anyway - inbound connections get buffered as soon as the listener is opened in gateway.New
, even before Open
, and will just be processed by the accept loop in LocalProxy
whenever it gets the chance.
lib/teleterm/gateway/gateway_test.go
Outdated
wait := make(chan error) | ||
|
||
go func() { | ||
err := gateway.Open() | ||
wait <- err | ||
}() | ||
|
||
defer func() { | ||
// Make sure Open() is called. | ||
time.Sleep(time.Millisecond * 200) | ||
|
||
err := gateway.Close() | ||
require.NoError(t, err) | ||
require.NoError(t, <-wait) | ||
}() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sidenote: (*Gateway).Open
and (*LocalProxy).Start
should probably both be called Serve
or something along those lines, IMO
lib/teleterm/gateway/gateway_test.go
Outdated
wait := make(chan error) | ||
|
||
go func() { | ||
err := gateway.Open() | ||
wait <- err | ||
}() | ||
|
||
defer func() { | ||
// Make sure Open() is called. | ||
time.Sleep(time.Millisecond * 200) | ||
|
||
err := gateway.Close() | ||
require.NoError(t, err) | ||
require.NoError(t, <-wait) | ||
}() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second sidenote: why not just net.Dial
the address to confirm that the other side has actually accepted the connection?
lib/teleterm/gateway/gateway_test.go
Outdated
_, err = net.DialTimeout("tcp", gatewayAddress, time.Second*1) | ||
require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_, err = net.DialTimeout("tcp", gatewayAddress, time.Second*1) | |
require.NoError(t, err) | |
c, err = net.DialTimeout("tcp", gatewayAddress, time.Second*1) | |
require.NoError(t, err) | |
t.Cleanup(func() { c.Close() }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM granted that I'm not familiar with this part of the codebase.
I notice that with these changes, Serve()
(previously Open()
) is always called in a go func() {}
block, so making it block rather than launch a goroutine itself is immaterial. There's precedent for both designs, your call which you think is best.
The change to make teleport/lib/teleterm/gateway/gateway.go Lines 112 to 122 in 3c1a6b6
And it looks like most stuff that Connect uses behaves this way (ALPN proxy, gRPC server) so it made sense for me to make Gateway behave like this as well. I didn't know there was a precedent for |
lib/teleterm/gateway/gateway_test.go
Outdated
conn, err := net.DialTimeout("tcp", gatewayAddress, time.Second*1) | ||
require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@espadolini One thing that worries me is that this check fails only if I don't start the listener on that address. It doesn't fail if I never actually start the local proxy or if I start it after 1.5 second.
It turns out that localProxy.Start()
, unlike gRPC's Server.Serve()
, doesn't error if it's called after Close()
.
So the test not failing if Serve()
is called after Close()
is okay I guess. But why doesn't the test fail if never start the local proxy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to adapt this to teleterm_test.go
to replace time.Sleep
with Dial
but it seems that I misunderstood how Dial
works. It doesn't actually wait for the other side to accept the connection since, as you said, the listener buffers those connections before they're accepted by the server.
So I cannot really use Dial
there to make sure the server is accepting connections.
Diff of teleterm_test.go
diff --git a/lib/teleterm/teleterm.go b/lib/teleterm/teleterm.go
index b5356fdc9..123a5be95 100644
--- a/lib/teleterm/teleterm.go
+++ b/lib/teleterm/teleterm.go
@@ -16,8 +16,10 @@ package teleterm
import (
"context"
+ "fmt"
"os"
"os/signal"
+ "time"
"github.com/gravitational/teleport/lib/teleterm/apiserver"
"github.com/gravitational/teleport/lib/teleterm/clusters"
@@ -60,7 +62,10 @@ func Start(ctx context.Context, cfg Config) error {
serverAPIWait := make(chan error)
go func() {
+ time.Sleep(time.Second * 2)
+ fmt.Println("apiServer.Serve()")
err := apiServer.Serve()
+ fmt.Println("apiServer.Serve() returned")
serverAPIWait <- err
}()
diff --git a/lib/teleterm/teleterm_test.go b/lib/teleterm/teleterm_test.go
index 51be1b619..649bac66f 100644
--- a/lib/teleterm/teleterm_test.go
+++ b/lib/teleterm/teleterm_test.go
@@ -16,7 +16,11 @@ package teleterm_test
import (
"context"
+ "errors"
"fmt"
+ "net"
+ "os"
+ "path/filepath"
"testing"
"time"
@@ -27,27 +31,38 @@ import (
func TestStart(t *testing.T) {
homeDir := t.TempDir()
+ sockPath := filepath.Join(homeDir, "teleterm.sock")
+ addr := fmt.Sprintf("unix://%v", sockPath)
cfg := teleterm.Config{
- Addr: fmt.Sprintf("unix://%v/teleterm.sock", homeDir),
+ Addr: addr,
HomeDir: fmt.Sprintf("%v/", homeDir),
}
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
- wait := make(chan error)
+ serveErr := make(chan error)
go func() {
err := teleterm.Start(ctx, cfg)
- wait <- err
+ serveErr <- err
}()
- defer func() {
- // Make sure Start() is called.
- time.Sleep(time.Millisecond * 500)
+ require.Eventually(t, func() bool {
+ _, err := os.Stat(sockPath)
- // Stop the server.
- cancel()
- require.NoError(t, <-wait)
- }()
+ return !errors.Is(err, os.ErrNotExist)
+ }, time.Millisecond*500, time.Millisecond*50)
+
+ // Dial to make sure Start was called.
+ fmt.Println("net.DialTimeout()")
+ conn, err := net.DialTimeout("unix", sockPath, time.Second*1)
+ fmt.Println("net.DialTimeout() returned")
+ require.NoError(t, err)
+ // Stop the server.
+ fmt.Println("cancel()")
+ cancel()
+ conn.Close()
+ fmt.Println("waiting for serveErr")
+ require.NoError(t, <-serveErr)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would conn.Read()
block until the server accepts the connection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think conn.Read()
does the trick. I just pushed 3e34e02 which fails if the connection isn't accepted within a second.
I was also able to improve teleterm tests in a similar way, I'll push those changes in a separate PR.
@ravicious See the table below for backport results.
|
Yeah you're right, I glossed over some of the other implementations, but now that I look more closely its standard practice to block. |
Edoardo asked about it some time ago. I forgot to add the explanation to the code. #14135 (comment)
* daemon: Put gateway-related methods next to each other * Remove unused fields from daemon.Config * Make Config a private field instead of embedding it * Add tests for gateway CRUD * Remove unnecessary ctx and error from daemon.Service gateway methods * Refactor daemon.Service.gateways to a hash map * Add comment explaining error handling in removeGateway Edoardo asked about it some time ago. I forgot to add the explanation to the code. #14135 (comment) * Do not return pointers from ListGateways() * Remove FindGateway, fix lock issues in RestartGateway() and RemoveGateway()
Edoardo asked about it some time ago. I forgot to add the explanation to the code. #14135 (comment)
* daemon: Put gateway-related methods next to each other * Remove unused fields from daemon.Config * Make Config a private field instead of embedding it * Add tests for gateway CRUD * Remove unnecessary ctx and error from daemon.Service gateway methods * Refactor daemon.Service.gateways to a hash map * Add comment explaining error handling in removeGateway Edoardo asked about it some time ago. I forgot to add the explanation to the code. #14135 (comment) * Fix license * Rename ClusterResolver to Resolver * Do not return pointers from ListGateways() * Remove FindGateway, fix lock issues in RestartGateway() and RemoveGateway()
Open()
andClose()
used to not return any error andOpen()
used to start the gateway in a goroutine, making it rather hard to write tests for it.This commit makes it so that
Open()
andClose()
return errors andOpen()
blocks.Adjustments have been made to other places in lib/teleterm to account for that missing goroutine and returned errors.