Dev exp stop more (#221)

* Exp stop refactor (#161) * Update RemoteMachineMode.md (#63) * Remove unused classes for SQuAD QA example. * Remove more unused functions for SQuAD QA example. * Fix default dataset config. * Add Makefile README (#64) * update document (#92) * Edit readme.md * updated a word * Update GetStarted.md * Update GetStarted.md * refact readme, getstarted and write your trial md. * Update README.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Fix nnictl bugs and add new feature (#75) * fix nnictl bug * fix nnictl create bug * add experiment status logic * add more information for nnictl * fix Evolution Tuner bug * refactor code * fix code in updater.py * fix nnictl --help * fix classArgs bug * update check response.status_code logic * remove Buffer warning (#100) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md * Update README.md * Add support for debugging mode * fix setup.py (#115) * Add DAG model configuration format for SQuAD example. * Explain config format for SQuAD QA model. * Add more detailed introduction about the evolution algorithm. 
* Fix install.sh add add trial log path (#109) * fix nnictl bug * fix nnictl create bug * add experiment status logic * add more information for nnictl * fix Evolution Tuner bug * refactor code * fix code in updater.py * fix nnictl --help * fix classArgs bug * update check response.status_code logic * show trial log path * update document * fix install.sh * set default vallue for maxTrialNum and maxExecDuration * fix nnictl * Dev smac (#116) * support package install (#91) * fix nnictl bug * support package install * update * update package install logic * Fix package install issue (#95) * fix nnictl bug * fix pakcage install * support SMAC as a tuner on nni (#81) * update doc * update doc * update doc * update hyperopt installation * update doc * update doc * update description in setup.py * update setup.py * modify encoding * encoding * add encoding * remove pymc3 * update doc * update builtin tuner spec * support smac in sdk, fix logging issue * support smac tuner * add optimize_mode * update config in nnictl * add __init__.py * update smac * update import path * update setup.py: remove entry_point * update rest server validation * fix bug in nnictl launcher * support classArgs: optimize_mode * quick fix bug * test travis * add dependency * add dependency * add dependency * add dependency * create smac python package * fix trivial points * optimize import of tuners, modify nnictl accordingly * fix bug: incorrect algorithm_name * trivial refactor * for debug * support virtual * update doc of SMAC * update smac requirements * update requirements * change debug mode * update doc * update doc * refactor based on comments * fix comments * modify example config path to relative path and increase maxTrialNum (#94) * modify example config path to relative path and increase maxTrialNum * add document * support conda (#90) (#110) * support install from venv and travis CI * support install from venv and travis CI * support install from venv and travis CI * support conda * 
support conda * modify example config path to relative path and increase maxTrialNum * undo messy commit * undo messy commit * Support pip install as root (#77) * Typo on #58 (#122) * PAI Training Service implementation (#128) * PAI Training service implementation **1. Implement PAITrainingService **2. Add trial-keeper python module, and modify setup.py to install the module **3. Add PAItrainingService rest server to collect metrics from PAI container. * fix datastore for multiple final result (#129) * Update NNI v0.2 release notes (#132) Update NNI v0.2 release notes * Update setup.py Makefile and documents (#130) * update makefile and setup.py * update makefile and setup.py * update document * update document * Update Makefile no travis * update doc * update doc * fix convert from ss to pcs (#133) * Fix bugs about webui (#131) * Fix webui bugs * Fix tslint * webui logpath and document (#135) * Add webui document and logpath as a href * fix tslint * fix comments by Chengmin * Pai training service bug fix and enhancement (#136) * Add NNI installation scripts * Update pai script, update NNI_out_dir * Update NNI dir in nni sdk local.py * Create .nni folder in nni sdk local.py * Add check before creating .nni folder * Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT * Improve annotation (#138) * Improve annotation * Minor bugfix * Selectively install through pip (#139) Selectively install through pip * update setup.py * fix paiTrainingService bugs (#137) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * Add documentation for NNI PAI mode experiment (#141) * Add documentation for NNI PAI mode * Fix typo based on PR comments * Exit with subprocess return code of trial keeper * Remove additional exit code * Fix typo 
based on PR comments * update doc for smac tuner (#140) * Revert "Selectively install through pip (#139)" due to potential pip install issue (#142) * Revert "Selectively install through pip (#139)" This reverts commit 1d17483. * Add exit code of subprocess for trial_keeper * Update README, add link to PAImode doc * Merge branch V0.2 to Master (#143) * webui logpath and document (#135) * Add webui document and logpath as a href * fix tslint * fix comments by Chengmin * Pai training service bug fix and enhancement (#136) * Add NNI installation scripts * Update pai script, update NNI_out_dir * Update NNI dir in nni sdk local.py * Create .nni folder in nni sdk local.py * Add check before creating .nni folder * Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT * Improve annotation (#138) * Improve annotation * Minor bugfix * Selectively install through pip (#139) Selectively install through pip * update setup.py * fix paiTrainingService bugs (#137) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * Add documentation for NNI PAI mode experiment (#141) * Add documentation for NNI PAI mode * Fix typo based on PR comments * Exit with subprocess return code of trial keeper * Remove additional exit code * Fix typo based on PR comments * update doc for smac tuner (#140) * Revert "Selectively install through pip (#139)" due to potential pip install issue (#142) * Revert "Selectively install through pip (#139)" This reverts commit 1d17483. 
* Add exit code of subprocess for trial_keeper * Update README, add link to PAImode doc * fix bug (#147) * Refactor nnictl and add config_pai.yml (#144) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * add config_pai.yml * refactor nnictl create logic and add colorful print * fix nnictl stop logic * add annotation for config_pai.yml * add document for start experiment * fix config.yml * fix document * Fix trial keeper wrongly exit issue (#152) * Fix trial keeper bug, use actual exitcode to exit rather than 1 * Fix bug of table sort (#145) * Update doc for PAIMode and v0.2 release notes (#153) * Update v0.2 documentation regards to release note and PAI training service * Update document to describe NNI docker image * fix antd (#159) * refactor experiment stopping logic * support change concurrency * remove trialJobs.ts * trivial changes * fix bugs * fix bug * support updating maxTrialNum * Modify IT scripts for supporting multiple experiments * Update ci (#175) * Update RemoteMachineMode.md (#63) * Remove unused classes for SQuAD QA example. * Remove more unused functions for SQuAD QA example. * Fix default dataset config. * Add Makefile README (#64) * update document (#92) * Edit readme.md * updated a word * Update GetStarted.md * Update GetStarted.md * refact readme, getstarted and write your trial md. 
* Update README.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Fix nnictl bugs and add new feature (#75) * fix nnictl bug * fix nnictl create bug * add experiment status logic * add more information for nnictl * fix Evolution Tuner bug * refactor code * fix code in updater.py * fix nnictl --help * fix classArgs bug * update check response.status_code logic * remove Buffer warning (#100) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md * Update README.md * Add support for debugging mode * modify CI cuz of refracting exp stop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * file saving * fix issues from code merge * remove $(INSTALL_PREFIX)/nni/nni_manager before install * fix indent * fix merge issue * socket close * update port * fix merge error * modify ci logic in nnimanager * fix ci * fix bug * change suspended to done * update ci (#229) * update ci * update ci * update ci (#232) * update ci * update ci * update azure-pipelines * update azure-pipelines * update ci (#233) * update ci * update ci * update azure-pipelines * update azure-pipelines * update azure-pipelines * run.py (#238) * Nnupdate ci (#239) * run.py * test ci * Nnupdate ci (#240) * run.py * test ci * test ci * Udci (#241) * run.py * test ci * test ci * test ci * update ci (#242) * run.py * test ci * test ci * test ci * update ci * revert install.sh (#244) * run.py * test ci * test ci * test ci * update ci * revert install.sh * add comments * remove assert * trivial change * trivial change
microsoft · Oct 18, 2018 · ee6b149
1 parent b183c3d
commit ee6b149
Showing 17 changed files with 302 additions and 315 deletions.
diff --git a/Makefile b/Makefile
@@ -186,6 +186,7 @@ install-python-modules:
 install-node-modules:
 	mkdir -p $(INSTALL_PREFIX)/nni
 	rm -rf src/nni_manager/dist/node_modules
+	rm -rf $(INSTALL_PREFIX)/nni/nni_manager
 
 	#$(_INFO) Installing NNI Manager $(_END)
 	cp -rT src/nni_manager/dist $(INSTALL_PREFIX)/nni/nni_manager

diff --git a/azure-pipelines.yml b/azure-pipelines.yml
@@ -9,10 +9,10 @@ steps:
   - script: python3 -m pip install --upgrade pip setuptools
     displayName: 'Install python tools'
   - script: |
-      make easy-install
-      export PATH=$HOME/.nni/bin:$PATH
+      source install.sh
     displayName: 'Install dependencies'
   - script: |
       cd test/naive
-      PATH=$HOME/.local/nni/node/bin:$PATH python3 run.py
+      export PATH=$HOME/.local/bin:$PATH
+      python3 run.py
     displayName: 'Run tests'
diff --git a/docs/HowToContribute.md b/docs/HowToContribute.md
@@ -51,4 +51,4 @@ After you change some code, just use **step 4** to rebuild your code, then the c
 
 ---
 At last, wish you have a wonderful day.
-For more contribution guidelines on making PR's or issues to NNI source code, you can refer to our [CONTRIBUTING](./docs/CONTRIBUTING.md) document. 
+For more contribution guidelines on making PR's or issues to NNI source code, you can refer to our [CONTRIBUTING](./docs/CONTRIBUTING.md) document. 
diff --git a/docs/WriteYourTrial.md b/docs/WriteYourTrial.md
@@ -123,4 +123,4 @@ useAnnotation: true
 ```
 
 ## More Trial Example
-* [Automatic Model Architecture Search for Reading Comprehension.](../examples/trials/ga_squad/README.md)
+* [Automatic Model Architecture Search for Reading Comprehension.](../examples/trials/ga_squad/README.md)
diff --git a/src/nni_manager/common/manager.ts b/src/nni_manager/common/manager.ts
@@ -22,7 +22,7 @@
 import { MetricDataRecord, MetricType, TrialJobInfo } from './datastore';
 import { TrialJobStatus } from './trainingService';
 
-type ProfileUpdateType = 'TRIAL_CONCURRENCY' | 'MAX_EXEC_DURATION' | 'SEARCH_SPACE';
+type ProfileUpdateType = 'TRIAL_CONCURRENCY' | 'MAX_EXEC_DURATION' | 'SEARCH_SPACE' | 'MAX_TRIAL_NUM';
 
 interface ExperimentParams {
     authorName: string;
@@ -73,7 +73,7 @@ interface TrialJobStatistics {
 }
 
 interface NNIManagerStatus {
-    status: 'INITIALIZED' | 'EXPERIMENT_RUNNING' | 'ERROR' | 'STOPPING' | 'STOPPED';
+    status: 'INITIALIZED' | 'EXPERIMENT_RUNNING' | 'ERROR' | 'STOPPING' | 'STOPPED' | 'DONE';
     errors: string[];
 }
 

diff --git a/src/nni_manager/core/nnimanager.ts b/src/nni_manager/core/nnimanager.ts
diff --git a/src/nni_manager/core/trialJobs.ts b/src/nni_manager/core/trialJobs.ts
diff --git a/src/nni_manager/rest_server/restValidationSchemas.ts b/src/nni_manager/rest_server/restValidationSchemas.ts
@@ -86,7 +86,7 @@ export namespace ValidationSchemas {
     };
     export const UPDATEEXPERIMENT = {
         query: {
-            update_type: joi.string().required().valid('TRIAL_CONCURRENCY', 'MAX_EXEC_DURATION', 'SEARCH_SPACE')
+            update_type: joi.string().required().valid('TRIAL_CONCURRENCY', 'MAX_EXEC_DURATION', 'SEARCH_SPACE', 'MAX_TRIAL_NUM')
         },
         body: {
             id: joi.string().required(),

diff --git a/test/naive/.gitignore b/test/naive/.gitignore
@@ -0,0 +1,5 @@
+__pycache__
+
+tuner_search_space.json
+tuner_result.txt
+assessor_result.txt
diff --git a/test/naive/README.md b/test/naive/README.md
@@ -0,0 +1,20 @@
+## Usage
+
+* To test before installing:
+`./run.py --preinstall`
+* To test the integrity of installation:
+`./run.py`
+* It will print `PASS` in green eventually if everything works well.
+
+## Details
+* This test case tests the communication between trials and tuner/assessor.
+* Each naive trial receives an integer `x` as its parameter and reports `x`, `x²`, `x³`, ... , `x¹⁰` as metrics.
+* The naive tuner simply generates the sequence of natural numbers and prints the metrics it receives to `tuner_result.txt`.
+* The naive assessor kills a trial when `sum(metrics) % 11 == 1` and prints the killed trials to `assessor_result.txt`.
+* If the tuner or assessor exits with an exception, it appends `ERROR` to its result file.
+* When the experiment finishes (in this test, that means it finished successfully), `Experiment done` appears in the nni_manager.log file.
+
+## Issues
+* Private APIs are used to detect whether tuner and assessor have terminated successfully. 
+* The output of REST server is not tested.
+* Remote machine training service is not tested.
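The rules listed under "Details" in the README above can be checked with a small standalone simulation. This is only a sketch and does not use the NNI SDK. It assumes the tuner starts at `x = 1`, that ten trials run, and that the assessor re-evaluates the kill rule after every reported metric. The names `trial_metrics` and `simulate` are hypothetical.

```python
def trial_metrics(x, steps=10):
    """Metrics a naive trial reports: x, x**2, ..., x**steps."""
    return [x ** k for k in range(1, steps + 1)]


def simulate(num_trials=10):
    """Replay the naive tuner/assessor rules without NNI (assumed semantics)."""
    killed = []    # (x, number of metrics seen when the trial was killed)
    finished = []  # (x, final metric) for trials that ran to completion
    for x in range(1, num_trials + 1):  # naive tuner: natural numbers in order
        seen = []
        for metric in trial_metrics(x):
            seen.append(metric)
            if sum(seen) % 11 == 1:     # naive assessor's kill rule
                killed.append((x, len(seen)))
                break
        else:
            # The loop ended without a kill, so the trial reported all metrics.
            finished.append((x, seen[-1]))
    return killed, finished


killed, finished = simulate()
print(killed)
print(finished)
```

Under these assumptions, `killed` includes `(5, 3)`, `(7, 2)` and `(8, 3)`, and `finished` includes `(6, 60466176)`, `(9, 3486784401)` and `(10, 10000000000)`. These agree with the rows visible in the `expected_assessor_result.txt` and `expected_tuner_result.txt` hunks, if each row is read as "parameter, count" and "parameter, final metric" respectively.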
diff --git a/test/naive/expected_assessor_result.txt b/test/naive/expected_assessor_result.txt
@@ -4,4 +4,3 @@
 5 3
 7 2
 8 3
-DONE
diff --git a/test/naive/expected_tuner_result.txt b/test/naive/expected_tuner_result.txt
@@ -2,4 +2,3 @@
 6 60466176
 9 3486784401
 10 10000000000
-DONE
diff --git a/test/naive/naive_assessor.py b/test/naive/naive_assessor.py
@@ -1,10 +1,13 @@
 import logging
+import os
 
 from nni.assessor import Assessor, AssessResult
 
 _logger = logging.getLogger('NaiveAssessor')
 _logger.info('start')
-_result = open('/tmp/nni_assessor_result.txt', 'w')
+
+_pwd = os.path.dirname(__file__)
+_result = open(os.path.join(_pwd, 'assessor_result.txt'), 'w')
 
 class NaiveAssessor(Assessor):
     def __init__(self, optimize_mode):
@@ -30,7 +33,6 @@ def assess_trial(self, trial_job_id, trial_history):
         return AssessResult.Good
 
     def _on_exit(self):
-        _result.write('DONE\n')
         _result.close()
 
     def _on_error(self):

diff --git a/test/naive/naive_tuner.py b/test/naive/naive_tuner.py
@@ -1,11 +1,14 @@
 import json
 import logging
+import os
 
 from nni.tuner import Tuner
 
 _logger = logging.getLogger('NaiveTuner')
 _logger.info('start')
-_result = open('/tmp/nni_tuner_result.txt', 'w')
+
+_pwd = os.path.dirname(__file__)
+_result = open(os.path.join(_pwd, 'tuner_result.txt'), 'w')
 
 class NaiveTuner(Tuner):
     def __init__(self, optimize_mode):
@@ -24,11 +27,10 @@ def receive_trial_result(self, parameter_id, parameters, reward):
 
     def update_search_space(self, search_space):
         _logger.info('update_search_space: %s' % search_space)
-        with open('/tmp/nni_tuner_search_space.json', 'w') as file_:
+        with open(os.path.join(_pwd, 'tuner_search_space.json'), 'w') as file_:
             json.dump(search_space, file_)
 
     def _on_exit(self):
-        _result.write('DONE\n')
         _result.close()
 
     def _on_error(self):
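
The two Python diffs above stop writing results to fixed `/tmp/...` paths and instead write them next to the module, using `os.path.dirname(__file__)`. Below is a minimal sketch of the same pattern. The helper name `result_path` and the example module location are hypothetical. The `os.path.abspath` call is an optional variation that is not part of the commit; it guards against `__file__` being a relative path.

```python
import os


def result_path(name, module_file):
    """Return the path of `name` inside the directory that contains `module_file`."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, name)


# Hypothetical module location, used only for illustration; inside a real
# module one would pass __file__ instead.
example = result_path('tuner_result.txt',
                      os.path.join('test', 'naive', 'naive_tuner.py'))
print(example)
```

Keeping the output beside the test scripts means the new `test/naive/.gitignore` entries cover it, and each checkout has its own result files rather than sharing a single path under `/tmp`.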