
Delegate port generation for the tests to the OS #1192

Closed


@MVrachev (Collaborator) commented Oct 28, 2020

Fixes #1124, Fixes #1111

Description of the changes being introduced by the pull request:

This PR aims to make the creation of new server subprocesses more stable.
It fixes issues caused by trying to use ports that are already taken, such as
[Errno 98] Address already in use or ConnectionRefusedError: [Errno 111].

By passing 0 as the port argument when we start the servers, we ask the OS to give us
an arbitrary unused port.
This is much better than generating a port ourselves, because
instead of generating a port, verifying that it is free, and retrying if it is busy,
we can simply rely on the OS to do that job for us.

This means that the server scripts receive their ports when they are created and then send them back to the
parent process that created them.
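The mechanism is the standard one for ephemeral ports; a minimal sketch using Python's stdlib http.server (not the PR's actual server scripts):

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Binding to port 0 asks the OS to pick any free ephemeral port;
# the port actually chosen can then be read back from server_address.
httpd = HTTPServer(('localhost', 0), SimpleHTTPRequestHandler)
port = httpd.server_address[1]
print('serving on port', port)  # the server script reports this back to its parent
httpd.server_close()
```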

Other changes/cleanups in this PR include:

  • remove the expected-failure test from test_slow_retrieval
  • make sure server ports are never generated outside the server scripts
  • remove the port argument from TestServerProcess in tests/utils.py
  • remove unused imports
  • remove try/except blocks in the server test files

This PR is related to the closed PR #1169, but because
I decided to use a completely different approach, I thought it made more sense to close that one and create this one.

PS: Many thanks to @jku, with whom I had regular discussions about this PR!

Please verify and check that the pull request fulfills the following requirements:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

Remove the test with mode 2 ('mode_2': during the download process,
the server blocks the download by sending just a few characters
every few seconds) from test_slow_retrieval.

This test is marked as an expected failure with the intention of
rewriting it one day, but slow retrieval attacks have been removed from
the specification and will soon be removed from the tuf
reference implementation as a whole.
That means the chances of this test ever becoming useful are close to zero.

The other test (with mode 1) in test_slow_retrieval is not removed.

For reference:
- theupdateframework/specification#111
- theupdateframework#1156

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
By passing 0 as the port argument we ask the OS to give us
an arbitrary unused port.
This is much better than generating a port ourselves, because
even a randomly generated port may already be in use.

In this commit, I also make sure that there is no place in the tests
where we manually generate and pass ports.

Also, because none of the tests pass a port to the TestServerProcess
class, I have removed the "port" argument from its __init__ function.

Finally, until now slow_retrieval.py could not use the port generation
of the TestServerProcess class in utils.py, because it used
httpd.handle_request(), which handles only ONE request.
When wait_for_server() made a test connection to verify that the
server was up, the slow_retrieval server handled that connection
(which it accepted as a request) and exited.

We worked around that by passing timeout = 0 and skipping the
wait_for_server() call for this special value.
Now that we use httpd.serve_forever(), this problem is resolved
and we no longer need those checks.
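The difference can be sketched with stdlib pieces (a hypothetical stand-in, not the actual slow_retrieval.py): with serve_forever() a probe connection no longer consumes the server's only request.

```python
import socket
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

httpd = HTTPServer(('localhost', 0), SimpleHTTPRequestHandler)
port = httpd.server_address[1]

# With serve_forever() the server keeps accepting connections, so a
# wait_for_server()-style probe connection does not make the server exit
# the way a single httpd.handle_request() call would.
thread = threading.Thread(target=httpd.serve_forever, daemon=True)
thread.start()

# Two successive probe connections both succeed:
probes = 0
for _ in range(2):
    with socket.create_connection(('localhost', port), timeout=5):
        probes += 1

httpd.shutdown()
httpd.server_close()
```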

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Now that we can use wait_for_server and the retry mechanism
of TestServerProcess in utils.py, we no longer need to use
sleep in this test file.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
We want to make sure that the servers are successfully started in
the common use cases and that the new port generation works.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>

MVrachev commented Oct 28, 2020

I had some discussions with @jku about whether we should add tests for the broken cases when we create a server process:
for example, when we don't receive the expected message in the temp log file from the server, or when the server process has died, etc.
The problem is that such tests would have to wait for the whole timeout to pass and for the Timeout exception to be raised.
We don't want to use the default timeout of 10 seconds for each of those tests, given that on a successful run on my VM the server process starts in about 0.10–0.20 seconds.
On the other hand, we don't want to make the timeout too low, because Travis CI or AppVeyor could be too slow to start the process at all, in which case a Timeout exception would mean we haven't tested anything.

One solution I can think of is using the default timeout and moving these tests to a different directory that is executed only by
Travis CI and AppVeyor. Those tests are not exactly unit tests but more like integration tests, so to me that sounds logical.


@jku left a comment


I like this! The only hack here is parsing server logs to get the port but I believe that's a decent price to pay (and arguably less of a hack than retrying on bind() failure). Tests should be more reliable and faster because of this -- and not just because one very slow test was removed.

Only real question is why _start_server() still has two loops: I thought the point was that the outer loop was replaced by letting bind() pick a port...

@MVrachev replied:

I like this! The only hack here is parsing server logs to get the port but I believe that's a decent price to pay (and arguably less of a hack than retrying on bind() failure). Tests should be more reliable and faster because of this -- and not just because one very slow test was removed.

Only real question is why _start_server() still has two loops: I thought the point was that the outer loop was replaced by letting bind() pick a port...

Yes, I was thinking that if we have a problem starting the server we can try again if we haven't finished the timeout?
Or probably it's best just to fail because it won't be a problem related to already taken port...


jku commented Oct 28, 2020

Or probably it's best just to fail because it won't be a problem related to already taken port...

Yes: we have no idea why it might have failed so no reason to believe it would be fixed by retrying.


jku commented Oct 28, 2020

Oh one more thing about the new tests: Apart from the clean-test I think everything else is already tested by the current tests. The new tests are obviously more "unit-testy" since they only test a single thing ... but still: is there value in adding tests that are basically subsets of existing tests?


MVrachev commented Oct 29, 2020

Oh one more thing about the new tests: Apart from the clean-test I think everything else is already tested by the current tests. The new tests are obviously more "unit-testy" since they only test a single thing ... but still: is there value in adding tests that are basically subsets of existing tests?

They give us a way to clearly see when we have a bug while starting any of the subprocesses running a server script, which is not always easy to spot given that most of those server subprocesses are initialized in setUpClass, where we can't capture stdout or stderr.

On my virtual machine the tests run in about 0.5–1.0 seconds, and with Python 2 (where none of the tests are skipped) the timing is the same.

@trishankatdatadog

Just wanted to say: keep up the great work, y'all! cc @sechkova @joshuagl


@jku left a comment


Getting close... LGTM except for the loop removal that seems incorrect (indent change needed for the elapsed time calculation)

@MVrachev MVrachev force-pushed the change-port-generation branch 2 times, most recently from d260a70 to de8b896 Compare October 29, 2020 14:14
Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
@MVrachev

I realized that now that we have removed the outer loop, I can add tests that check the case when a server exits before the timeout expires.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
As discussed with Jussi, using try/except blocks when instantiating
the servers doesn't bring much value: the "bind failed" message
could be useful, but there are other messages explaining almost
the same thing that will be logged from the parent process as well.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>

@jku jku left a comment


Man from Del Monte says yes 👍

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
@@ -37,6 +37,7 @@
try:
# is defined in Python 3
TimeoutError
ChildProcessError

@jku Oct 30, 2020


This will not work:

  • if TimeoutError fails, you define class ChildProcessError?
  • the code that uses this is in another file and uses utils.ChildProcessError -- that won't be defined in python 3


@jku Oct 30, 2020


I think it makes sense to just always define an error of your own since ChildProcessError is not in python2 (it's not 100% correct even in python3): TestServerError or something.

@MVrachev replied:


Both TimeoutError and ChildProcessError are available in Python 3.x and not available in Python 2.x, so if TimeoutError is not found, it makes sense to define both of them.

the code that uses this is in another file and uses utils.ChildProcessError -- that won't be defined in python 3

You are right about that; I was mistaken in my assumption.

I think that ChildProcessError is a suitable name for that error.
In test_utils.py I will fix the error with:

    child_process_error = None
    if sys.version_info.major == 2:
      child_process_error = utils.ChildProcessError
    else:
      child_process_error = ChildProcessError

@jku replied:


Both TimeoutError and ChildProcessError are available on python 3.X and not available on python 2.X, so if TimeoutError is not found then it makes sense to define both of them.

No: you can either decide things based on the Python version or based on feature testing. Using a feature test as if it were a version test is wrong even if it happens to work right now.
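The pattern described here, testing each name on its own instead of inferring one from the other, would look something like this (a sketch of the principle, not the code merged in the PR):

```python
# Feature-test each builtin separately rather than assuming that the
# absence of TimeoutError implies the absence of ChildProcessError.
try:
    TimeoutError
except NameError:  # Python 2: define a stand-in
    class TimeoutError(Exception):
        pass

try:
    ChildProcessError
except NameError:  # Python 2: define a stand-in
    class ChildProcessError(Exception):
        pass
```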


jku commented Oct 30, 2020

Since the failures are now hidden in the UI: earlier test runs (e.g. https://ci.appveyor.com/project/theupdateframework/tuf/builds/36033643/job/0pm05aeiihcn9nad for commit 9e53c4a) had errors like this:

======================================================================
ERROR: test_get_valid_targetinfo (test_updater.TestMultiRepoUpdater)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\projects\tuf\tests\test_updater.py", line 1875, in setUp
    server=self.SIMPLE_SERVER_PATH, popen_cwd=self.repository_directory2)
  File "C:\projects\tuf\tests\utils.py", line 165, in __init__
    raise e
TimeoutError: u'Failure during C:\\projects\\tuf\\tests\\simple_server.py startup!'

That's worrying and needs to be investigated.


MVrachev commented Nov 4, 2020

After the findings made by @jku and described in #1196,
I made changes to the "Delegate port generation for the tests to the OS" commit, following his idea of using a thread-safe Queue for process communication, and also had to squash a few other commits.

That's why it makes sense to close this PR and open another one that uses the new method.
Of course, I could reuse this PR, but given that I have rebased all of the commits and would have to change the PR message together with some of the commit messages, I think that would be more confusing.
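A rough sketch of that queue-based handoff (hypothetical stand-in code, not the follow-up PR's implementation): a reader thread forwards the port line printed by the child process onto a queue.Queue, and the parent waits on it with a timeout instead of polling a log file.

```python
import queue
import subprocess
import sys
import threading

# A stand-in server script that binds to port 0 and announces its port.
SERVER = """\
from http.server import HTTPServer, SimpleHTTPRequestHandler
httpd = HTTPServer(('localhost', 0), SimpleHTTPRequestHandler)
print(httpd.server_address[1], flush=True)
httpd.serve_forever()
"""

proc = subprocess.Popen([sys.executable, '-c', SERVER],
                        stdout=subprocess.PIPE, text=True)
port_queue = queue.Queue()
threading.Thread(target=lambda: port_queue.put(proc.stdout.readline()),
                 daemon=True).start()
try:
    # Blocks until the child reports its port; raises queue.Empty on
    # timeout instead of hanging forever on a dead or silent child.
    port = int(port_queue.get(timeout=10))
finally:
    proc.kill()
```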
