-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make testserver multi-node more robust. #148
Conversation
RichardJCai
commented
Sep 21, 2022
- Fix race condition in pollListeningURLFile.
- Add Opts for init node timeout and poll listening url.
- InitTimeoutOpt, PollListenURLTimeoutOpt
- Parallelize restart test
a98fbfc
to
fc0e44c
Compare
- Fix race condition in pollListeningURLFile. - Add Opts for init node timeout and poll listening url. - InitTimeoutOpt, PollListenURLTimeoutOpt - Parallelize restart test
fc0e44c
to
dd2bfb9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way we can test the added arguments? Maybe make a testserver configuration we know will timeout on init and time it to make sure the option is propogated properly?
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss, @RichardJCai, and @ZhouXing19)
testserver/testserver.go
line 814 at r1 (raw file):
ts.mu.RLock() nodeDir := fmt.Sprintf("%s%d", ts.baseDir, i)
Why do we need to remove the individual directories if we're going to removeAll after anyway? Is there some case in which we'd fail to remove a single node's directory but not the others?
testserver/testserver_test.go
line 463 at r1 (raw file):
for j := 0; j < 3; j++ { for { port, err := getFreePort()
This is a very edge case, but this could race if another process is attempting to get a port from:0
at the same time. If that process does a TCP listen on :0
between when these ports are allocated and the testserver is started, it could get assigned one of the ports you've allocated here.
I think if you just feed the testserver :0
as the port to listen on, it'll start up guaranteeing port uniqueness, and you won't have to do any of this allocation.
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @pawalt, @rafiss, and @ZhouXing19)
testserver/testserver.go
line 814 at r1 (raw file):
Previously, pawalt (Peyton Walters) wrote…
Why do we need to remove the individual directories if we're going to removeAll after anyway? Is there some case in which we'd fail to remove a single node's directory but not the others?
They're separate directories, not nested. Ie the base one looks like cockroach-testserver123
and the node directories are cockroach-testserver1230
, cockroach-testserver1231
...
testserver/testserver_test.go
line 463 at r1 (raw file):
This is a very edge case, but this could race if another process is attempting to get a port from:0 at the same time. If that process does a TCP listen on :0 between when these ports are allocated and the testserver is started, it could get assigned one of the ports you've allocated here.
Yeah I thought about using :0
but we need to know what address to feed into the --join
flag which includes the port. Pretty annoying problem here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Your call on testing the new arguments. Functionality looks pretty obvious, but never hurts to have some tests.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss, @RichardJCai, and @ZhouXing19)
testserver/testserver_test.go
line 463 at r1 (raw file):
Previously, RichardJCai (Richard Cai) wrote…
This is a very edge case, but this could race if another process is attempting to get a port from:0 at the same time. If that process does a TCP listen on :0 between when these ports are allocated and the testserver is started, it could get assigned one of the ports you've allocated here.
Yeah I thought about using
:0
but we need to know what address to feed into the--join
flag which includes the port. Pretty annoying problem here.
Hmmm that is annoying. Thanks for the clarification.
I'm allergic to adding tests for timeouts. Jk I'll add a simple one. |
Actually I'm gonna merge this as is, adding a testhook for timeout is somewhat annoying and I'm gonna try to get the testserver into CRDB before I leave. |