Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address flaky test test_model_template_app_rest_metrics_endpoint #2293

Closed
axsaucedo opened this issue Aug 18, 2020 · 2 comments · Fixed by #2336
Closed

Address flaky test test_model_template_app_rest_metrics_endpoint #2293

axsaucedo opened this issue Aug 18, 2020 · 2 comments · Fixed by #2336
Assignees
Milestone

Comments

@axsaucedo
Copy link
Contributor

It seems that for a few PRs now there is flakiness on the test_model_template_app_rest_metrics_endpoint test. The error that appears is outlined below, and seems to be an issue similar to #1745 whereby the reported issue is due to the server not being able to bind to the port.

=================================== FAILURES ===================================
________________ test_model_template_app_rest_metrics_endpoint _________________
Traceback (most recent call last):
  File "/workspace/source/python/tests/test_microservice.py", line 82, in test_model_template_app_rest_metrics_endpoint
    assert response.status_code == 200
AssertionError: assert 500 == 200
 +  where 500 = <Response [500]>.status_code
---------------------------- Captured stderr setup -----------------------------
INFO:root:starting: seldon-core-microservice MyModel REST --service-type MODEL --persistence 0
[2020-08-18 05:45:08 +0000] [26661] [INFO] Worker exiting (pid: 26661)
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/multiprocessing/util.py", line 322, in _exit_function
    p.join()
  File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 138, in join
    assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
[2020-08-18 05:45:08 +0000] [26627] [INFO] Shutting down: Master
[2020-08-18 05:45:09 +0000] [26660] [ERROR] Connection in use: ('0.0.0.0', 6005)
[2020-08-18 05:45:09 +0000] [26660] [ERROR] Retrying in 1 second.
2020-08-18 05:45:09,667 - seldon_core.microservice:main:207 - INFO:  Starting microservice.py:main
2020-08-18 05:45:09,667 - seldon_core.microservice:main:208 - INFO:  Seldon Core version: 1.2.3-dev
2020-08-18 05:45:09,670 - seldon_core.microservice:main:310 - INFO:  Parse JAEGER_EXTRA_TAGS []
2020-08-18 05:45:09,670 - seldon_core.microservice:main:313 - INFO:  Annotations: {}
2020-08-18 05:45:09,670 - seldon_core.microservice:main:317 - INFO:  Importing MyModel
2020-08-18 05:45:09,671 - MyModel:__init__:15 - INFO:  Initializing model
2020-08-18 05:45:09,778 - seldon_core.microservice:main:388 - INFO:  REST gunicorn microservice running on port 5000
2020-08-18 05:45:09,780 - seldon_core.microservice:main:446 - INFO:  REST metrics microservice running on port 6005
2020-08-18 05:45:09,780 - seldon_core.microservice:main:456 - INFO:  Starting servers
DEBUG:root:=== trying again
[2020-08-18 05:45:09 +0000] [26671] [INFO] Starting gunicorn 20.0.4
[2020-08-18 05:45:09 +0000] [26671] [INFO] Listening at: http://0.0.0.0:5000 (26671)
[2020-08-18 05:45:09 +0000] [26671] [INFO] Using worker: threads
[2020-08-18 05:45:09 +0000] [26705] [INFO] Booting worker with pid: 26705
[2020-08-18 05:45:09 +0000] [26704] [INFO] Starting gunicorn 20.0.4
[2020-08-18 05:45:09 +0000] [26704] [ERROR] Connection in use: ('0.0.0.0', 6005)
[2020-08-18 05:45:09 +0000] [26704] [ERROR] Retrying in 1 second.
[2020-08-18 05:45:10 +0000] [26660] [ERROR] Can't connect to ('0.0.0.0', 6005)
[2020-08-18 05:45:10 +0000] [26704] [ERROR] Connection in use: ('0.0.0.0', 6005)
[2020-08-18 05:45:10 +0000] [26704] [ERROR] Retrying in 1 second.
[2020-08-18 05:45:11 +0000] [26704] [ERROR] Connection in use: ('0.0.0.0', 6005)
[2020-08-18 05:45:11 +0000] [26704] [ERROR] Retrying in 1 second.
[2020-08-18 05:45:12 +0000] [26704] [ERROR] Connection in use: ('0.0.0.0', 6005)
[2020-08-18 05:45:12 +0000] [26704] [ERROR] Retrying in 1 second.
DEBUG:root:=== trying again
INFO:root:microservice ready
------------------------------ Captured log setup ------------------------------
INFO     root:helpers.py:65 starting: seldon-core-microservice MyModel REST --service-type MODEL --persistence 0
DEBUG    root:helpers.py:80 === trying again
DEBUG    root:helpers.py:80 === trying again
INFO     root:helpers.py:88 microservice ready
----------------------------- Captured stdout call -----------------------------
127.0.0.1 - - [18/Aug/2020:05:45:13 +0000] "GET /metrics-endpoint HTTP/1.1" 500 290 "-" "python-requests/2.24.0"
----------------------------- Captured stderr call -----------------------------
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 127.0.0.1:6005
2020-08-18 05:45:13,805 - seldon_core.wrapper:log_exception:1892 - ERROR:  Exception on /metrics-endpoint [GET]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/multiprocessing/managers.py", line 788, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/source/python/.tox/azure/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/workspace/source/python/.tox/azure/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/workspace/source/python/.tox/azure/lib/python3.7/site-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/workspace/source/python/.tox/azure/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/workspace/source/python/.tox/azure/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/workspace/source/python/.tox/azure/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/workspace/source/python/.tox/azure/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/workspace/source/python/.tox/azure/lib/python3.7/site-packages/seldon_core/wrapper.py", line 163, in Metrics
    metrics, mimetype = seldon_metrics.generate_metrics()
  File "/workspace/source/python/.tox/azure/lib/python3.7/site-packages/seldon_core/metrics.py", line 185, in generate_metrics
    exposition.generate_latest(myregistry).decode("utf-8"),
  File "/workspace/source/python/.tox/azure/lib/python3.7/site-packages/prometheus_client/exposition.py", line 106, in generate_latest
    for metric in registry.collect():
  File "/workspace/source/python/.tox/azure/lib/python3.7/site-packages/prometheus_client/registry.py", line 82, in collect
    for metric in collector.collect():
  File "/workspace/source/python/.tox/azure/lib/python3.7/site-packages/seldon_core/metrics.py", line 159, in collect
    with self._lock:
  File "/opt/conda/lib/python3.7/multiprocessing/managers.py", line 1030, in __enter__
    return self._callmethod('acquire')
  File "/opt/conda/lib/python3.7/multiprocessing/managers.py", line 792, in _callmethod
    self._connect()
  File "/opt/conda/lib/python3.7/multiprocessing/managers.py", line 779, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/opt/conda/lib/python3.7/multiprocessing/connection.py", line 492, in Client
    c = SocketClient(address)
  File "/opt/conda/lib/python3.7/multiprocessing/connection.py", line 619, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
DEBUG:urllib3.connectionpool:http://127.0.0.1:6005 "GET /metrics-endpoint HTTP/1.1" 500 290
------------------------------ Captured log call -------------------------------
DEBUG    urllib3.connectionpool:connectionpool.py:230 Starting new HTTP connection (1): 127.0.0.1:6005
DEBUG    urllib3.connectionpool:connectionpool.py:442 http://127.0.0.1:6005 "GET /metrics-endpoint HTTP/1.1" 500 290

@axsaucedo axsaucedo added bug triage Needs to be triaged and prioritised accordingly labels Aug 18, 2020
@axsaucedo axsaucedo added this to the 1.3 milestone Aug 18, 2020
@RafalSkolasinski
Copy link
Contributor

Yes, this seems to be exactly failing on the same fixture as #1745

@ukclivecox ukclivecox added test-infra and removed triage Needs to be triaged and prioritised accordingly labels Aug 20, 2020
@axsaucedo
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants