Add a retry mechanism on server startup failure and use TestServerProcess for port generation #1169

MVrachev · 2020-10-08T11:34:30Z

Description of the changes being introduced by the pull request:

Adds a retry mechanism on server startup failure. This mechanism is useful for tests that use server subprocesses,
in case any of the ports they try to use is already taken.
The other changes in this PR make sure we use only TestServerProcess for port generation. Finally,
I removed the port argument entirely from the TestServerProcess constructor.
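The retry idea can be sketched roughly as follows. This is a minimal illustration, not the TestServerProcess code from tests/utils.py: `start_with_retries`, `MAX_PORT_ATTEMPTS`, the port range, and the `start_server(port)` callable (assumed to return True once a server is up on that port and False if startup failed, e.g. because the port is taken) are all hypothetical.

```python
import random
import time

# Hypothetical cap; the PR later hard-codes a limit on port generation
# attempts, but this particular value is an assumption.
MAX_PORT_ATTEMPTS = 10


def start_with_retries(start_server, timeout=10.0, max_attempts=MAX_PORT_ATTEMPTS):
    """Call start_server(port) with a fresh random port until it succeeds.

    start_server is a hypothetical callable that returns True if a server is
    now running on `port` and False if startup failed (e.g. port in use).
    Returns the port that worked, or raises TimeoutError when either the
    attempt limit or the overall timeout is exhausted.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while attempts < max_attempts and time.monotonic() < deadline:
        attempts += 1
        # Pick a random high port; it may still collide, hence the retries.
        port = random.randint(30000, 45000)
        if start_server(port):
            return port
    raise TimeoutError(
        "server failed to start after {} attempt(s)".format(attempts))
```

A caller would pass a function that launches the server subprocess on the given port and reports whether it came up; the attempt cap and the overall timeout keep a persistently failing setup from looping forever.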

Please verify and check that the pull request fulfills the following requirements:

The code follows the Code Style Guidelines
Tests have been added for the bug fix or new feature
Docs have been added for the bug fix or new feature

MVrachev · 2020-10-08T11:41:32Z

I was wondering: should we remove the wait_for_server() function and its calls if we merge this PR?
We now have a mechanism to check whether a server has started successfully, so what do we gain by calling wait_for_server() after those checks?
What do you think @jku @joshuagl?

PS: Can somebody restart Travis CI? It failed with a strange error...

joshuagl

Very nice work here, thanks. I have some comments and suggestions inline.

MVrachev · 2020-10-14T13:12:27Z

I updated the commits with all of your suggestions, @joshuagl, except for one of your comments.
On that one, I hope Jussi can give his opinion, as you suggested.

jku · 2020-10-15T11:14:50Z

I was wondering: should we remove the wait_for_server() function and its calls if we merge this PR?
We now have a mechanism to check whether a server has started successfully, so what do we gain by calling wait_for_server() after those checks?

I'm not 100% sure -- maybe successful binding is enough to guarantee that there is a response when the port is tried? It seems reasonable but I would not bet retirement money on that being true on every platform.

Maybe don't remove it now, but set a calendar reminder for a few weeks after merging. Then we can check the CI logs to see whether the combination helped, and remove wait_for_server() as an experiment to see whether your work here is enough on its own. That would be ideal.
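For reference, a connect-based readiness check of the kind wait_for_server() performs can be sketched with the standard library alone. It probes exactly what the test client needs (an accepted TCP connection) instead of relying on the server having bound its port. `wait_for_port` and its parameters are hypothetical names, not the tests/utils.py API.

```python
import socket
import time


def wait_for_port(host, port, timeout=5.0, interval=0.05):
    """Poll until a TCP connection to (host, port) succeeds or timeout expires.

    Returns True as soon as a connection is accepted, False if the deadline
    passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError
            # or socket.timeout) while nothing is accepting on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```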

jku

Otherwise looks good; I left comments mostly about slow retrieval...

jku · 2020-10-15T15:05:41Z

Commenting to write down results from chat: I think there is probably a slow_retrieval_server.py refactoring that would mean the class separation and the new function are not required at all in test_slow_retrieval_attack.py -- but I'm definitely not saying it has to be done: spending too much time on these tests is probably not a good investment.

joshuagl · 2020-10-15T20:37:40Z

Commenting to write down results from chat: I think there is probably a slow_retrieval_server.py refactoring that would mean the class separation and the new function are not required at all in test_slow_retrieval_attack.py -- but I'm definitely not saying it has to be done: spending too much time on these tests is probably not a good investment.

Agreed. Particularly as slow retrieval is currently not part of the spec and could be removed from the reference implementation (see #1156).

MVrachev · 2020-10-19T15:04:41Z

I addressed all of @jku's feedback.

I rebased and changed the following things:

In tests/utils.py I set a hard-coded limit on the number of port generation attempts.
Moved the "if the server process has exited" check in has_server_started() in tests/utils.py.
Removed the expected-failure test from tests/test_slow_retrieval_attack.py. As I said in my modified commit message:

Slow retrievals have been removed from the specification, and support for them
will soon be removed from the tuf reference implementation as a whole.
This means that the chances of ever making this test useful are slim to none.

Prompted by a comment from @jku on this PR (#1169), I did some research and found that we are creating one extra temporary folder per test class in many of our test files.
Fixed that in the last commit.

joshuagl

Very nice set of changes, thanks Martin!

jku

I did not look at the new commit yet -- it sounds good but also looks like a lot of unrelated changes that are not the easiest to review... If you want to keep it in this PR we can do that but it could be a separate one as well

MVrachev · 2020-10-20T11:44:07Z

I did not look at the new commit yet -- it sounds good but also looks like a lot of unrelated changes that are not the easiest to review... If you want to keep it in this PR we can do that but it could be a separate one as well

Yeah, maybe you are right.
This is a big cleanup that deserves its own PR, and it may be hard to review here.
Additionally, it might need a discussion of its own.

This mechanism is useful for tests that use server subprocesses, in case any of the ports they try to use is already taken. Signed-off-by: Martin Vrachev <mvrachev@vmware.com>

Remove the test with mode 2 ('mode_2': during the download process, the server blocks the download by sending just several characters every few seconds.) from test_slow_retrieval. This test is marked as "expected failure" with the intention of rewriting it one day, but slow retrievals have been removed from the specification, and support for them will soon be removed from the tuf reference implementation as a whole. That means the chances of ever making this test useful are slim to none. The other test (with mode 1) in test_slow_retrieval is not removed. For reference: - theupdateframework/specification#111 - theupdateframework#1156 Signed-off-by: Martin Vrachev <mvrachev@vmware.com>

Until now, slow_retrieval.py couldn't use the port generation and retry mechanism of the TestServerProcess class from utils.py, because we were using httpd.handle_request(), which handles only ONE request. What happened was that when wait_for_server() made a test connection to verify that the server was up, the slow_retrieval server handled that connection (treating it as a request) and exited. We worked around this by passing timeout = 0 and skipping the wait_for_server() call for this special value. Now that we use httpd.serve_forever(), the problem is resolved and those checks are no longer needed. Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
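A minimal sketch of the difference, assuming Python 3's http.server and running the server in a thread rather than a subprocess to keep it self-contained: with serve_forever() a readiness probe and the real test requests can all be served, whereas handle_request() would return after the first one. `QuietHandler`, `start_background_server`, and `fetch` are hypothetical helpers, and binding to port 0 (letting the OS pick a free port) is a simplification here.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class QuietHandler(BaseHTTPRequestHandler):
    """Answers every GET with a short body and suppresses request logging."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep test output clean; the default writes every request to stderr.
        pass


def start_background_server():
    """Start an HTTPServer on an OS-chosen port that serves until shutdown().

    handle_request() would return after the first request, so a readiness
    probe would use up the only request. serve_forever() keeps looping.
    """
    httpd = HTTPServer(("127.0.0.1", 0), QuietHandler)
    thread = threading.Thread(target=httpd.serve_forever, daemon=True)
    thread.start()
    return httpd, thread


def fetch(port, path="/"):
    """Issue a single GET against the local server and return the body."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()
```

Repeated fetch() calls against the same server all succeed; the caller stops the loop with httpd.shutdown() from another thread.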

The port argument of the __init__ function in TestServerProcess is no longer needed because none of the tests uses it. Signed-off-by: Martin Vrachev <mvrachev@vmware.com>

We want to make sure we generate server ports only in the TestServerProcess class from utils.py, because it has a retry mechanism on server startup failure, which makes it more reliable. Signed-off-by: Martin Vrachev <mvrachev@vmware.com>

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>

We added a retry mechanism on server startup failure, which generates a new port and tries to start the server again until the server has started or the timeout has expired. We want to make sure this mechanism is well tested, both for the usual cases and for corner cases, e.g. when the server hangs forever or exits unexpectedly. Signed-off-by: Martin Vrachev <mvrachev@vmware.com>

Now that we can use wait_for_server() and the retry mechanism of TestServerProcess in utils.py, we no longer need to use sleep in this test file. Signed-off-by: Martin Vrachev <mvrachev@vmware.com>

MVrachev · 2020-10-22T14:03:47Z

I didn't want to rebase this time, but I had to because I needed to change the commit messages.

I found a number of problems, small and big, while addressing your comments, @joshuagl and @jku.
Thanks for the nice reviews!

There are a lot of changes (sorry for not submitting them at the beginning), which I will try to summarize:

Removed the last commit, which removed the extra temporary folder. I will submit a new PR for it.
Made utils.py use a single variable as the condition for both loops.
Moved the server startup in utils.py into a separate function.
Added tests for utils.py to make sure that we cover the usual use cases for our abstraction, but also the corner cases (e.g. when the server hangs forever or exits unexpectedly).
Added a helper function in utils.py called is_process_running, which I then used throughout utils.py and its tests.
Removed the need to use timeout = 0 and made slow_retrieval.py use the standard workflow in utils.py.
Made sure that the four server scripts (e.g. simple_server.py, slow_retrieval.py ...) use the port generation workflow from utils.py when we run them.
Added a custom exception for when the port is not provided.
Changed the commit messages of "Remove test_slow_retrieval expected failure test" and "Remove port arg from __init__ in TestServerProcess".
Removed sleep from test_slow_retrieval_attack.py. We no longer get a TimeoutError without it.
Updated some of the comments.

jku

Left a few specific comments, but I have some general ones as well:

I'm not 100% sold on the tests: the idea is good (TestServerProcess is complex enough that it should ideally be tested), but testing it properly is not trivial, and the implementation should not make actual code (e.g. simple_server.py) more complex.
The server startup functionality still smells complex: I think this is the code that will make or break CI reliability, and it should be both correct and simple enough for everyone to see that it's correct. Not going into details here: let's talk it through later.

jku · 2020-10-22T15:54:25Z

tests/utils.py

+class ImproperNumberOfArguments(Exception):
+ """Raised when the number of arguments is wrong."""
+ pass


Let's not add exceptions if we don't intend to ever handle them (also, server scripts importing the utils.py that starts the server scripts... this does not look right)

jku · 2020-10-22T17:02:35Z

tests/utils.py

- # collecting the logs per test.
- self.__server_process = subprocess.Popen(command,
- stdout=self.__temp_log_file, stderr=subprocess.STDOUT, cwd=popen_cwd)
+ timeout = timeout - (time.time() - start)


this does not do what it should :)
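One plausible reading of the problem, assuming `start` is captured once before the retry loop: each pass subtracts the total elapsed time from an already-reduced timeout, so elapsed time is counted repeatedly. With timeout = 10, after 2 s the timeout becomes 10 - 2 = 8, but after 4 s it becomes 8 - 4 = 4 instead of the correct 6. A common fix is to compute a deadline once and derive the remaining time from it; time.monotonic() also avoids the wall-clock jumps that time.time() is subject to. The helper names below are hypothetical.

```python
import time


def remaining_time(deadline):
    """Seconds left until deadline (a time.monotonic() value), never negative."""
    return max(0.0, deadline - time.monotonic())


def run_with_deadline(attempt, timeout):
    """Call attempt(remaining) until it returns True or the budget runs out.

    The deadline is computed once, so each iteration's remaining budget is
    measured against the original timeout and elapsed time is never counted
    twice. Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = remaining_time(deadline)
        if remaining == 0.0:
            return False
        if attempt(remaining):
            return True
```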

jku · 2020-10-22T17:34:16Z

tests/simple_server.py


+# Used for tests in tests/test_utils.py
+if len(sys.argv) > 2:
+ if sys.argv[2] == "stop":


First, this script already uses argv[2] for something else. Second, I'm not 100% convinced these new features belong here: if you need this kind of "breaking server", add a new script instead.

jku · 2020-10-22T18:00:14Z

tests/utils.py

+ def is_process_running(self):
+ """Returns a boolean value if the server process is currently running."""
+
+ return True if self.__server_process.returncode is None else False


After testing and checking the docs, I found out that returncode is not actually updated unless poll() or wait() has been called, so we should just use poll() here instead.

I may have been the one who said "use returncode", sorry about that
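A minimal sketch of the poll()-based check, written as a free function over a subprocess.Popen object rather than as the TestServerProcess method (the function form is an assumption for illustration):

```python
import subprocess
import sys


def is_process_running(process):
    """Return True while the given subprocess.Popen child has not exited.

    Popen.returncode is only set by poll(), wait() or communicate(), so it
    can still be None after the child has already exited. poll() checks the
    child's status and sets returncode if it has terminated.
    """
    return process.poll() is None


if __name__ == "__main__":
    # A child that exits immediately: once it has been reaped, it is not running.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    print(is_process_running(child))  # prints False
```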

MVrachev · 2020-10-28T12:45:21Z

While addressing the comments made by @jku, I read that there is a simpler way to generate unused ports, as shown here: https://docs.python.org/3/library/socketserver.html#asynchronous-mixins

    # Port 0 means to select an arbitrary unused port
    HOST, PORT = "localhost", 0
    server = ThreadedTCPServer((HOST, PORT), ThreadedTCPRequestHandler)

The same method applies for BaseHTTPServer.HTTPServer, TCPServer and ThreadingHTTPServer which we are using in our server scripts used for testing.

This means that instead of generating a port ourselves, verifying that it's not busy, and using a retry mechanism if it is, we can simply start the server like this:

    six.moves.socketserver.TCPServer(('localhost', 0), handler)

and an arbitrary unused port will be chosen for us.
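A Python 3 sketch of the port-0 approach with the standard socketserver module (the PR's scripts go through six.moves for Python 2 compatibility). `EchoHandler` and `make_server_on_free_port` are hypothetical; the essential part is reading the OS-assigned port back from `server.server_address` after binding. In the real test setup, the server subprocess would still need to report that port to the parent process; judging by its title, the follow-up PR #1198 did this with a Queue.

```python
import socketserver


class EchoHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: echoes back whatever the client sends once."""

    def handle(self):
        self.request.sendall(self.request.recv(1024))


def make_server_on_free_port(handler=EchoHandler, host="localhost"):
    """Bind a TCPServer to port 0 so the OS assigns an unused port.

    Returns the server and the port it actually bound to, read back from
    server.server_address.
    """
    server = socketserver.TCPServer((host, 0), handler)
    return server, server.server_address[1]
```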

I have implemented this new mechanism in a different branch.

I will close this PR, because it no longer makes sense for us to generate and validate ports as I did here and as we discussed in this PR. I will open a new PR with the new branch.

joshuagl requested changes Oct 13, 2020

MVrachev force-pushed the generate-free-ports branch 2 times, most recently from f6aef32 to 392d4cd on October 14, 2020 13:10

jku reviewed Oct 15, 2020

jku requested changes Oct 15, 2020

jku reviewed Oct 15, 2020

MVrachev force-pushed the generate-free-ports branch 2 times, most recently from 4f8a15b to d30bb04 on October 19, 2020 15:00

joshuagl approved these changes Oct 20, 2020

jku reviewed Oct 20, 2020

MVrachev added 8 commits October 22, 2020 15:22

Add a retry mechanism on server startup failure

cf601cb

Remove port arg from __init__ in TestServerProcess

f61d55c

Use only TestServerProcess for port generation

5f9f07a

Remove unused random module imports

6049ef2

Remove sleep from test_slow_retrieval_attack.py

2647399

MVrachev force-pushed the generate-free-ports branch from d30bb04 to 2647399 on October 22, 2020 13:57

jku requested changes Oct 23, 2020

MVrachev closed this Oct 28, 2020

This was referenced Oct 28, 2020

Delegate port generation for the tests to the OS #1192

Closed

Tests: Use Queue for process communication which replaces tmp files and use OS for port creation #1198

Merged

MVrachev deleted the generate-free-ports branch November 5, 2020 14:02
