Skip to content
This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

No experiment log showed in integration test when nnictl create failed #2290

Closed
chicm-ms opened this issue Apr 9, 2020 · 0 comments
Closed
Assignees
Labels
bug Something isn't working nnidev

Comments

@chicm-ms
Copy link
Contributor

chicm-ms commented Apr 9, 2020

When nnictl create or rest api query failed, experiment log is not displayed, this is not friendly for trouble shooting.

2020-04-09T04:08:53.0470472Z ##[section]Starting: Integration test
2020-04-09T04:08:53.0478551Z ==============================================================================
2020-04-09T04:08:53.0479151Z Task         : Command line
2020-04-09T04:08:53.0480387Z Description  : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
2020-04-09T04:08:53.0480928Z Version      : 2.164.0
2020-04-09T04:08:53.0481282Z Author       : Microsoft Corporation
2020-04-09T04:08:53.0481755Z Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
2020-04-09T04:08:53.0482270Z ==============================================================================
2020-04-09T04:08:53.1775643Z Generating script.
2020-04-09T04:08:53.1792419Z ========================== Starting Command Output ===========================
2020-04-09T04:08:53.1818577Z [command]/bin/bash --noprofile --norc /mnt/home/nni/myagent/_work/_temp/780fb57c-de78-4774-8382-5d39489cc41d.sh
2020-04-09T04:08:53.4061942Z {
2020-04-09T04:08:53.4063064Z     "stopCommand": "nnictl stop",
2020-04-09T04:08:53.4063591Z     "platform": "linux darwin win32",
2020-04-09T04:08:53.4063980Z     "config": null,
2020-04-09T04:08:53.4065079Z     "configFile": "test/config/examples/sklearn-classification.yml",
2020-04-09T04:08:53.4065621Z     "validator": null,
2020-04-09T04:08:53.4066231Z     "name": "sklearn-classification",
2020-04-09T04:08:53.4066708Z     "setExperimentIdtoVar": "$expId",
2020-04-09T04:08:53.4067370Z     "launchCommand": "nnictl create --config $configFile",
2020-04-09T04:08:53.4067877Z     "experimentStatusCheck": true
2020-04-09T04:08:53.4068171Z }
2020-04-09T04:08:53.4068692Z �[32mTesting: sklearn-classification�[0m
2020-04-09T04:08:53.4069035Z assessor:
2020-04-09T04:08:53.4069318Z   builtinAssessorName: Medianstop
2020-04-09T04:08:53.4069552Z   classArgs:
2020-04-09T04:08:53.4069802Z     optimize_mode: maximize
2020-04-09T04:08:53.4070094Z authorName: nni
2020-04-09T04:08:53.4070377Z experimentName: default_test
2020-04-09T04:08:53.4070682Z logCollection: http
2020-04-09T04:08:53.4070982Z maxExecDuration: 5m
2020-04-09T04:08:53.4071254Z maxTrialNum: 4
2020-04-09T04:08:53.4071536Z multiPhase: false
2020-04-09T04:08:53.4071779Z multiThread: false
2020-04-09T04:08:53.4072160Z searchSpacePath: ../../../examples/trials/sklearn/classification/search_space.json
2020-04-09T04:08:53.4072504Z trainingServicePlatform: local
2020-04-09T04:08:53.4072733Z trial:
2020-04-09T04:08:53.4072983Z   codeDir: ../../../examples/trials/sklearn/classification
2020-04-09T04:08:53.4073283Z   command: python3 main.py
2020-04-09T04:08:53.4073499Z   gpuNum: 0
2020-04-09T04:08:53.4073978Z trialConcurrency: 2
2020-04-09T04:08:53.4074307Z tuner:
2020-04-09T04:08:53.4074613Z   builtinTunerName: Random
2020-04-09T04:08:53.4074969Z useAnnotation: false
2020-04-09T04:08:53.4075238Z 
2020-04-09T04:08:53.4075936Z launch command:  nnictl create --config ../test/config/examples/sklearn-classification.yml.tmp
2020-04-09T04:08:54.0667548Z INFO: expand searchSpacePath: ../../../examples/trials/sklearn/classification/search_space.json to /mnt/home/nni/myagent/_work/1/s/test/config/examples/../../../examples/trials/sklearn/classification/search_space.json 
2020-04-09T04:08:54.0669996Z INFO: expand codeDir: ../../../examples/trials/sklearn/classification to /mnt/home/nni/myagent/_work/1/s/test/config/examples/../../../examples/trials/sklearn/classification 
2020-04-09T04:08:54.8552753Z INFO: Starting restful server...
2020-04-09T04:08:57.8839873Z INFO: Successfully started Restful server!
2020-04-09T04:08:57.8840658Z INFO: Setting local config...
2020-04-09T04:08:57.9003903Z INFO: Successfully set local config!
2020-04-09T04:08:57.9004481Z INFO: Starting experiment...
2020-04-09T04:08:57.9471790Z INFO: Successfully started experiment!
2020-04-09T04:08:57.9473357Z ------------------------------------------------------------------------------------
2020-04-09T04:08:57.9473887Z The experiment id is lwl3dQrb
2020-04-09T04:08:57.9474408Z The Web UI urls are: http://127.0.0.1:8080   http://192.168.10.8:8080   http://172.17.0.1:8080
2020-04-09T04:08:57.9476086Z ------------------------------------------------------------------------------------
2020-04-09T04:08:57.9476507Z 
2020-04-09T04:08:57.9476809Z You can use these commands to get more information about the experiment
2020-04-09T04:08:57.9477441Z ------------------------------------------------------------------------------------
2020-04-09T04:08:57.9477927Z          commands                       description
2020-04-09T04:08:57.9478340Z 1. nnictl experiment show        show the information of experiments
2020-04-09T04:08:57.9478765Z 2. nnictl trial ls               list all of trial jobs
2020-04-09T04:08:57.9479180Z 3. nnictl top                    monitor the status of running experiments
2020-04-09T04:08:57.9479884Z 4. nnictl log stderr             show stderr log content
2020-04-09T04:08:57.9480323Z 5. nnictl log stdout             show stdout log content
2020-04-09T04:08:57.9480728Z 6. nnictl stop                   stop an experiment
2020-04-09T04:08:57.9481335Z 7. nnictl trial kill             kill a trial job by id
2020-04-09T04:08:57.9482410Z 8. nnictl --help                 get help information about nnictl
2020-04-09T04:08:57.9483986Z ------------------------------------------------------------------------------------
2020-04-09T04:08:57.9484562Z Command reference document https://nni.readthedocs.io/en/latest/Tutorial/Nnictl.html
2020-04-09T04:08:57.9485288Z ------------------------------------------------------------------------------------
2020-04-09T04:08:57.9485686Z 
2020-04-09T04:08:58.0746369Z variables: {'$expId': 'lwl3dQrb', '$configFile': '../test/config/examples/sklearn-classification.yml.tmp'}
2020-04-09T04:08:58.0748402Z max_duration: 300  max_trial_num: 4
2020-04-09T04:08:58.0749506Z 2020-04-09 04:08:58.074120  waiting ...
2020-04-09T04:09:10.1223636Z Stop command: nnictl stop
2020-04-09T04:09:10.7905747Z ERROR: There is no experiment running...
2020-04-09T04:09:10.9207344Z Traceback (most recent call last):
2020-04-09T04:09:10.9209189Z   File "/home/nni/.local/lib/python3.5/site-packages/urllib3/connection.py", line 157, in _new_conn
2020-04-09T04:09:10.9209731Z     (self._dns_host, self.port), self.timeout, **extra_kw
2020-04-09T04:09:10.9210433Z   File "/home/nni/.local/lib/python3.5/site-packages/urllib3/util/connection.py", line 84, in create_connection
2020-04-09T04:09:10.9210826Z     raise err
2020-04-09T04:09:10.9211419Z   File "/home/nni/.local/lib/python3.5/site-packages/urllib3/util/connection.py", line 74, in create_connection
2020-04-09T04:09:10.9211824Z     sock.connect(sa)
2020-04-09T04:09:10.9212125Z ConnectionRefusedError: [Errno 111] Connection refused
2020-04-09T04:09:10.9212355Z 
2020-04-09T04:09:10.9212624Z During handling of the above exception, another exception occurred:
2020-04-09T04:09:10.9212878Z 
2020-04-09T04:09:10.9213091Z Traceback (most recent call last):
2020-04-09T04:09:10.9213676Z   File "/home/nni/.local/lib/python3.5/site-packages/urllib3/connectionpool.py", line 672, in urlopen
2020-04-09T04:09:10.9214084Z     chunked=chunked,
2020-04-09T04:09:10.9214654Z   File "/home/nni/.local/lib/python3.5/site-packages/urllib3/connectionpool.py", line 387, in _make_request
2020-04-09T04:09:10.9215099Z     conn.request(method, url, **httplib_request_kw)
2020-04-09T04:09:10.9215437Z   File "/usr/lib/python3.5/http/client.py", line 1122, in request
2020-04-09T04:09:10.9215789Z     self._send_request(method, url, body, headers)
2020-04-09T04:09:10.9216132Z   File "/usr/lib/python3.5/http/client.py", line 1167, in _send_request
2020-04-09T04:09:10.9216449Z     self.endheaders(body)
2020-04-09T04:09:10.9216746Z   File "/usr/lib/python3.5/http/client.py", line 1118, in endheaders
2020-04-09T04:09:10.9220101Z     self._send_output(message_body)
2020-04-09T04:09:10.9221065Z   File "/usr/lib/python3.5/http/client.py", line 944, in _send_output
2020-04-09T04:09:10.9221541Z     self.send(msg)
2020-04-09T04:09:10.9221911Z   File "/usr/lib/python3.5/http/client.py", line 887, in send
2020-04-09T04:09:10.9222531Z     self.connect()
2020-04-09T04:09:10.9223789Z   File "/home/nni/.local/lib/python3.5/site-packages/urllib3/connection.py", line 184, in connect
2020-04-09T04:09:10.9224293Z     conn = self._new_conn()
2020-04-09T04:09:10.9225160Z   File "/home/nni/.local/lib/python3.5/site-packages/urllib3/connection.py", line 169, in _new_conn
2020-04-09T04:09:10.9225715Z     self, "Failed to establish a new connection: %s" % e
2020-04-09T04:09:10.9226385Z urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fb8dcd230f0>: Failed to establish a new connection: [Errno 111] Connection refused
2020-04-09T04:09:10.9226965Z 
2020-04-09T04:09:10.9227292Z During handling of the above exception, another exception occurred:
2020-04-09T04:09:10.9227593Z 
2020-04-09T04:09:10.9227867Z Traceback (most recent call last):
2020-04-09T04:09:10.9228525Z   File "/home/nni/.local/lib/python3.5/site-packages/requests/adapters.py", line 449, in send
2020-04-09T04:09:10.9228983Z     timeout=timeout
2020-04-09T04:09:10.9229620Z   File "/home/nni/.local/lib/python3.5/site-packages/urllib3/connectionpool.py", line 720, in urlopen
2020-04-09T04:09:10.9230158Z     method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
2020-04-09T04:09:10.9230859Z   File "/home/nni/.local/lib/python3.5/site-packages/urllib3/util/retry.py", line 436, in increment
2020-04-09T04:09:10.9231386Z     raise MaxRetryError(_pool, url, error or ResponseError(cause))
2020-04-09T04:09:10.9232692Z urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /api/v1/nni/check-status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb8dcd230f0>: Failed to establish a new connection: [Errno 111] Connection refused',))
2020-04-09T04:09:10.9233526Z 
2020-04-09T04:09:10.9233867Z During handling of the above exception, another exception occurred:
2020-04-09T04:09:10.9234162Z 
2020-04-09T04:09:10.9234433Z Traceback (most recent call last):
2020-04-09T04:09:10.9234798Z   File "nni_test/nnitest/run_tests.py", line 230, in <module>
2020-04-09T04:09:10.9235149Z     run(args)
2020-04-09T04:09:10.9235469Z   File "nni_test/nnitest/run_tests.py", line 217, in run
2020-04-09T04:09:10.9235864Z     run_test_case(test_case_config, it_config, args)
2020-04-09T04:09:10.9236255Z   File "nni_test/nnitest/run_tests.py", line 72, in run_test_case
2020-04-09T04:09:10.9236671Z     launch_test(new_config_file, args.ts, test_case_config)
2020-04-09T04:09:10.9237070Z   File "nni_test/nnitest/run_tests.py", line 143, in launch_test
2020-04-09T04:09:10.9237469Z     status = get_experiment_status(STATUS_URL)
2020-04-09T04:09:10.9237914Z   File "/mnt/home/nni/myagent/_work/1/s/test/nni_test/nnitest/utils.py", line 91, in get_experiment_status
2020-04-09T04:09:10.9238384Z     nni_status = requests.get(status_url).json()
2020-04-09T04:09:10.9239055Z   File "/home/nni/.local/lib/python3.5/site-packages/requests/api.py", line 76, in get
2020-04-09T04:09:10.9239859Z     return request('get', url, params=params, **kwargs)
2020-04-09T04:09:10.9240602Z   File "/home/nni/.local/lib/python3.5/site-packages/requests/api.py", line 61, in request
2020-04-09T04:09:10.9241088Z     return session.request(method=method, url=url, **kwargs)
2020-04-09T04:09:10.9241785Z   File "/home/nni/.local/lib/python3.5/site-packages/requests/sessions.py", line 530, in request
2020-04-09T04:09:10.9242245Z     resp = self.send(prep, **send_kwargs)
2020-04-09T04:09:10.9242906Z   File "/home/nni/.local/lib/python3.5/site-packages/requests/sessions.py", line 643, in send
2020-04-09T04:09:10.9243368Z     r = adapter.send(request, **kwargs)
2020-04-09T04:09:10.9244029Z   File "/home/nni/.local/lib/python3.5/site-packages/requests/adapters.py", line 516, in send
2020-04-09T04:09:10.9244486Z     raise ConnectionError(e, request=request)
2020-04-09T04:09:10.9245770Z requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /api/v1/nni/check-status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb8dcd230f0>: Failed to establish a new connection: [Errno 111] Connection refused',))
2020-04-09T04:09:10.9515480Z 
2020-04-09T04:09:10.9577378Z ##[error]Bash exited with code '1'.
2020-04-09T04:09:10.9593225Z ##[section]Finishing: Integration test
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working nnidev
Projects
None yet
Development

No branches or pull requests

3 participants