-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a retry mechanism on server startup failure and use TestServerProcess for port generation #1169
Conversation
I was wondering should we remove PS: Can somebody restart Travis CI? It failed with a strange error... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work here, thanks. I have some comments and suggestions inline.
f6aef32
to
392d4cd
Compare
Updated the commits with all of your suggestions @joshuagl except on one of your comments. |
I'm not 100% sure -- maybe successful binding is enough to guarantee that there is a response when the port is tried? It seems reasonable but I would not bet retirement money on that being true on every platform. Maybe don't remove it now, but add a calendar notification a few weeks after merging. Then we can check CI logs to see whether the combination helped and can remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise looks good, left comments about slow retrieval mostly...
Commenting to write down results from chat: I think there is probably a |
Agreed. Particularly as slow retrieval is currently not part of the spec and could be removed from the reference implementation (see #1156). |
4f8a15b
to
d30bb04
Compare
I addressed all of @jku feedbacks. I rebased and changed the following thinks:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice set of changes, thanks Martin!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not look at the new commit yet -- it sounds good but also looks like a lot of unrelated changes that are not the easiest to review... If you want to keep it in this PR we can do that but it could be a separate one as well
Yea, maybe you are right. |
This mechanism is useful for the tests which are using server subprocesses and if any of the ports they are trying to connect on is already taken. Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Remove the test with mode 2 ('mode_2': During the download process, the server blocks the download by sending just several characters every few seconds.) from test_slow_retrieval. This test is marked as "expected failure" with the purpose of rewriting it one day, but slow retrievals have been removed from the specification and soon it will be removed from the tuf reference implementation as a whole. That means that the chances of making this test useful are close to 0 if not none. The other test (with mode 1) in test_slow_retrieval is not removed. For reference: - theupdateframework/specification#111 - theupdateframework#1156 Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Until now, slow_retrieval.py couldn't use the TestServerProcess class port generation and retry mechanism from utils.py because we were using httpd.handle_request() which handles only ONE request. Then, what happened is that when we use wait_for_server() to make a test connection and verify that the server is up, the slow_retrieval server handles that connection (which it accepts as a request) and exits. We avoided that use-case by passing timeout = 0 and avoiding calling wait_for_server() on this special value. Now, when we use httpd.serve_forever() this problem is resolved and no longer we need to make those checks. Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
The port argument for __init__ function in TestServerProcess is no longer needed because none of the tests is using it. Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
We want to make sure we generate the server ports only in TestServerProcess class from utils.py because it has a retry mechanism on server startup failure which makes it more reliable. Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
We added a retry mechanism on server startup failure which generates a new port and tries to start the server again until the server is started or the timeout has expired. We want to make sure that this mechanism is well tested as for usual cases same for corner cases e.g. when the server hangs on forever or just exits unexpectedly. Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Now, after we can use wait_for_server and the retry mechanism of TestServerProcess in utils.py we no longer need to use sleep in this test file. Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
d30bb04
to
2647399
Compare
I didn't want to rebase this time, but because I had to change the commit messages I had to. I found a lot of different small or bigger problems while addressing your comments @joshuagl and @jku. There are a lot of changes (for which I am sorry that I didn't submit them in the begging) which I will try to summarize:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few specific comments, but I have some general ones as well:
- I'm not 100% sold on the tests: the idea is good (TestServerProcess is complex enough that it should ideally be tested) but testing this properly is not trivial and the implementation should not mean actual code (e.g. simple_server.py) gets more complex because of it
- The server startup functionality still smells complex: I think this is the code that will make or break the CI reliability, and it should be both correct and simple enough for everyone to see that it's correct. Not going to details here: let's talk it through later
class ImproperNumberOfArguments(Exception): | ||
"""Raised when the number of argumnets is wrong.""" | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not add exceptions if we don't intend to ever handle them (also, server scripts importing the utils.py that starts the server scripts... this does not look right)
# collecting the logs per test. | ||
self.__server_process = subprocess.Popen(command, | ||
stdout=self.__temp_log_file, stderr=subprocess.STDOUT, cwd=popen_cwd) | ||
timeout = timeout - (time.time() - start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not do what it should :)
|
||
# Used for tests in tests/test_utils.py | ||
if len(sys.argv) > 2: | ||
if sys.argv[2] == "stop": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first, this script already uses argv[2] for something else. Second: I'm not 100% convinced these new features should be here: If you need this kind of "breaking server", add a new script instead
def is_process_running(self): | ||
"""Returns a boolean value if the server process is currently running.""" | ||
|
||
return True if self.__server_process.returncode is None else False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after testing and checking docs I found out that returncode
does not actually work if poll() or wait() has not been called: so we should just use poll() here instead.
I may have been the one who said "use returncode", sorry about that
While addressing the comments made by @jku I read somewhere that we can use a simpler way to generate unused ports as showed here: https://docs.python.org/3/library/socketserver.html#asynchronous-mixins
The same method applies for This means that instead of us generating a port, verifying that it's not busy, and using a retry mechanism if the port is busy we can simply start the server like that
and arbitrary unused port will be chosen for us. I have implemented that new mechnaism in a different branch. I will close this pr because there is no longer sense for us to generate a new port and validate it as I did and we discussed in this pr and I will create a new pr where I will push this new branch. |
Fixes #1124, Fixes #1111
Description of the changes being introduced by the pull request:
Adds a retry mechanism on server startup failure. This mechanism is useful for the tests
which are using server subprocesses and if any of the ports they are trying to connect on is already taken.
The other changes in this pr make sure we are using only TestServerProcess for port generation and finally
I removed the
port
argument as a whole from the TestServerProcess constructor.Please verify and check that the pull request fulfills the following
requirements: