This repository has been archived by the owner on Sep 18, 2024. It is now read-only.
Refactor nnictl and add config_pai.yml #144
Merged
Changes from 33 commits
dc780cd
Merge pull request #1 from Microsoft/master
SparkSnail 86243e7
Merge pull request #2 from Microsoft/master
SparkSnail 3d1e4e9
fix nnictl bug
6d09780
Merge pull request #4 from Microsoft/master
SparkSnail 0d24158
Merge branch 'master' of https://github.com/SparkSnail/nni
6d669c6
Merge pull request #6 from Microsoft/master
SparkSnail af2615d
Merge pull request #8 from Microsoft/master
SparkSnail f6b7c0a
Merge pull request #9 from Microsoft/master
SparkSnail a74febc
Merge pull request #10 from Microsoft/master
SparkSnail 334b0a4
Merge pull request #12 from Microsoft/master
SparkSnail efe93df
Merge pull request #13 from Microsoft/master
SparkSnail 0d9b074
Merge branch 'master' of https://github.com/SparkSnail/nni
863d137
add hdfs host validation
7c4bf9b
fix bugs
61e7f86
Merge pull request #14 from Microsoft/v0.2
SparkSnail 3dfce3a
fix dockerfile
30f8feb
fix install.sh
0926045
update install.sh
c3160e4
fix dockerfile
d3f68be
Set timeout for HDFSUtility exists function
4959a93
remove unused TODO
46fad2a
Merge branch 'v0.2' of https://github.com/SparkSnail/nni into v0.2
b1ae562
fix sdk
7c571b2
add optional for outputDir and dataDir
9744692
refactor dockerfile.base
2fe73a6
Remove unused import in hdfsclientUtility
26e8864
Merge pull request #15 from Microsoft/v0.2
SparkSnail f5fdeab
add config_pai.yml
c5373f0
refactor nnictl create logic and add colorful print
349c0a7
fix nnictl stop logic
64e8d7d
add annotation for config_pai.yml
f8c8ddb
add document for start experiment
60eae23
fix config.yml
8bb958c
fix document
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
How to start an experiment | ||
=== | ||
## 1. Introduction | ||
Starting a new NNI experiment takes a few steps; the process is shown below. | ||
<img src="./img/experiment_process.jpg" width="50%" height="50%" /> | ||
## 2.Details | ||
### 2.1 Check environment | ||
The first step in starting an experiment is to check whether the environment is ready: nnictl checks whether an old experiment is still running and whether the port of the restful server is occupied. | ||
nnictl also validates the content of the config yaml file to ensure that the experiment config is in the correct format. | ||
|
||
### 2.2 Start restful server | ||
After checking the environment, nnictl starts a restful server process to manage the NNI experiment; the default port is 51188. | ||
|
||
### 2.3 Check restful server | ||
Before moving on, nnictl checks whether the restful server started successfully; if it did not, the startup process stops and shows an error message. | ||
|
||
### 2.4 Set experiment config | ||
nnictl must set the experiment config before starting an experiment; the experiment config includes the values from the config yaml file. | ||
|
||
### 2.5 Check experiment config | ||
nnictl ensures that the request to set the config was executed successfully. | ||
|
||
### 2.6 Start Web UI | ||
nnictl starts a Web UI process; the default port of the Web UI is 8080. | ||
|
||
### 2.7 Check Web UI | ||
If the Web UI fails to start, nnictl prints a warning and continues starting the experiment. | ||
|
||
### 2.8 Start Experiment | ||
This is the most important step in starting an NNI experiment: nnictl calls the restful server process to set up the experiment. | ||
|
||
### 2.9 Check experiment | ||
After starting the experiment, nnictl checks whether the experiment was created correctly and shows more information about the experiment to the user. |
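The startup sequence described in this document can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual nnictl code: `port_is_free`, `Step`, and `run_startup` are hypothetical names. The sketch shows the two behaviours the document calls out: checking whether a port is occupied before starting a server, and treating most steps as fatal while a failed Web UI start only produces a warning.

```python
import socket
from typing import Callable, List, NamedTuple


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts connections on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


class Step(NamedTuple):
    """One startup step (hypothetical structure, for illustration only)."""
    name: str
    action: Callable[[], bool]  # returns True on success
    fatal: bool = True          # False: warn and keep going (e.g. Web UI)


def run_startup(steps: List[Step]) -> List[str]:
    """Run steps in order; abort on a fatal failure, collect warnings otherwise."""
    warnings = []
    for step in steps:
        if step.action():
            continue
        if step.fatal:
            raise RuntimeError(f"{step.name} failed; startup aborted")
        warnings.append(f"{step.name} failed; continuing")
    return warnings
```

A caller could, for example, pass `Step("check restful server port", lambda: port_is_free(51188))` as a fatal step and a Web UI step with `fatal=False`, mirroring sections 2.1 and 2.7.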
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
authorName: default | ||
experimentName: example_auto-gbdt | ||
trialConcurrency: 1 | ||
maxExecDuration: 10h | ||
maxTrialNum: 10 | ||
#choice: local, remote, pai | ||
trainingServicePlatform: pai | ||
searchSpacePath: search_space.json | ||
#choice: true, false | ||
useAnnotation: false | ||
tuner: | ||
  #choice: TPE, Random, Anneal, Evolution, | ||
  #SMAC (SMAC should be installed through nnictl) | ||
  builtinTunerName: TPE | ||
  classArgs: | ||
    #choice: maximize, minimize | ||
    optimize_mode: minimize | ||
trial: | ||
  command: python3 main.py | ||
  codeDir: . | ||
  gpuNum: 0 | ||
  cpuNum: 1 | ||
  memoryMB: 8196 | ||
  #The docker image to run nni job on pai | ||
  image: openpai/pai.example.tensorflow | ||
  #The hdfs directory to store data on pai, format 'hdfs://host:port/directory' | ||
  hdfsDataDir: hdfs://10.10.10.10:9000/username/nni | ||
  #The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory' | ||
  hdfsOutputDir: hdfs://10.10.10.10:9000/username/nni | ||
paiConfig: | ||
  #The username to login pai | ||
  userName: username | ||
  #The password to login pai | ||
  passWord: password | ||
  #The host of restful server of pai | ||
  host: 10.10.10.10 |
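The config comments give the expected HDFS path format as `hdfs://host:port/directory`. A minimal sketch of how such a value could be checked is shown below; `parse_hdfs_uri` is a hypothetical helper written for illustration, not the validation that nnictl itself performs.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_hdfs_uri(uri: str) -> Tuple[str, int, str]:
    """Split 'hdfs://host:port/directory' into (host, port, directory).

    Raises ValueError when the scheme, host, port, or directory is missing.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {uri!r}")
    # parsed.port itself raises ValueError for a non-numeric port.
    if not parsed.hostname or parsed.port is None:
        raise ValueError(f"host and port are required in {uri!r}")
    if parsed.path in ("", "/"):
        raise ValueError(f"a directory is required in {uri!r}")
    return parsed.hostname, parsed.port, parsed.path
```

With the placeholder value from the config above, `parse_hdfs_uri("hdfs://10.10.10.10:9000/username/nni")` yields the host, the port as an integer, and the directory path.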
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
authorName: default | ||
experimentName: example_ga_squad | ||
trialConcurrency: 1 | ||
maxExecDuration: 1h | ||
maxTrialNum: 10 | ||
#choice: local, remote, pai | ||
trainingServicePlatform: pai | ||
#choice: true, false | ||
useAnnotation: false | ||
tuner: | ||
  codeDir: ../tuners/ga_customer_tuner | ||
  classFileName: customer_tuner.py | ||
  className: CustomerTuner | ||
  classArgs: | ||
    optimize_mode: maximize | ||
trial: | ||
  command: python3 trial.py | ||
  codeDir: . | ||
  gpuNum: 0 | ||
  cpuNum: 1 | ||
  memoryMB: 8196 | ||
  #The docker image to run nni job on pai | ||
  image: openpai/pai.example.tensorflow | ||
  #The hdfs directory to store data on pai, format 'hdfs://host:port/directory' | ||
  hdfsDataDir: hdfs://10.10.10.10:9000/username/nni | ||
  #The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory' | ||
  hdfsOutputDir: hdfs://10.10.10.10:9000/username/nni | ||
paiConfig: | ||
  #The username to login pai | ||
  userName: username | ||
  #The password to login pai | ||
  passWord: password | ||
  #The host of restful server of pai | ||
  host: 10.10.10.10 |
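The `maxExecDuration` values in these configs (`10h`, `1h`) combine a number with a unit suffix. The sketch below converts such a string to seconds. It assumes the accepted suffixes are `s`, `m`, `h`, and `d`; that unit set and the helper name `duration_to_seconds` are assumptions made for illustration, not something this pull request states.

```python
import re

# Assumed unit suffixes; the config files here only ever use 'h'.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_seconds(text: str) -> int:
    """Convert a value such as '10h' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]
```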
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
authorName: default | ||
experimentName: example_mnist | ||
trialConcurrency: 1 | ||
maxExecDuration: 1h | ||
maxTrialNum: 10 | ||
#choice: local, remote, pai | ||
trainingServicePlatform: pai | ||
#choice: true, false | ||
useAnnotation: true | ||
tuner: | ||
  #choice: TPE, Random, Anneal, Evolution, | ||
  #SMAC (SMAC should be installed through nnictl) | ||
  builtinTunerName: TPE | ||
  classArgs: | ||
    #choice: maximize, minimize | ||
    optimize_mode: maximize | ||
trial: | ||
  command: python3 mnist.py | ||
  codeDir: . | ||
  gpuNum: 0 | ||
  cpuNum: 1 | ||
  memoryMB: 8196 | ||
  #The docker image to run nni job on pai | ||
  image: openpai/pai.example.tensorflow | ||
  #The hdfs directory to store data on pai, format 'hdfs://host:port/directory' | ||
  hdfsDataDir: hdfs://10.10.10.10:9000/username/nni | ||
  #The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory' | ||
  hdfsOutputDir: hdfs://10.10.10.10:9000/username/nni | ||
paiConfig: | ||
  #The username to login pai | ||
  userName: username | ||
  #The password to login pai | ||
  passWord: password | ||
  #The host of restful server of pai | ||
  host: 10.10.10.10 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
authorName: default | ||
experimentName: example_mnist-keras | ||
trialConcurrency: 1 | ||
maxExecDuration: 1h | ||
maxTrialNum: 10 | ||
#choice: local, remote, pai | ||
trainingServicePlatform: pai | ||
searchSpacePath: search_space.json | ||
#choice: true, false | ||
useAnnotation: false | ||
tuner: | ||
  #choice: TPE, Random, Anneal, Evolution, BatchTuner | ||
  #SMAC (SMAC should be installed through nnictl) | ||
  builtinTunerName: BatchTuner | ||
  classArgs: | ||
    #choice: maximize, minimize | ||
    optimize_mode: maximize | ||
trial: | ||
  command: python3 mnist-keras.py | ||
  codeDir: . | ||
  gpuNum: 0 | ||
  cpuNum: 1 | ||
  memoryMB: 8196 | ||
  #The docker image to run nni job on pai | ||
  image: openpai/pai.example.tensorflow | ||
  #The hdfs directory to store data on pai, format 'hdfs://host:port/directory' | ||
  hdfsDataDir: hdfs://10.10.10.10:9000/username/nni | ||
  #The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory' | ||
  hdfsOutputDir: hdfs://10.10.10.10:9000/username/nni | ||
paiConfig: | ||
  #The username to login pai | ||
  userName: username | ||
  #The password to login pai | ||
  passWord: password | ||
  #The host of restful server of pai | ||
  host: 10.10.10.10 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
authorName: default | ||
experimentName: example_mnist-keras | ||
trialConcurrency: 1 | ||
maxExecDuration: 1h | ||
maxTrialNum: 10 | ||
#choice: local, remote, pai | ||
trainingServicePlatform: pai | ||
searchSpacePath: search_space.json | ||
#choice: true, false | ||
useAnnotation: false | ||
tuner: | ||
  #choice: TPE, Random, Anneal, Evolution, | ||
  #SMAC (SMAC should be installed through nnictl) | ||
  builtinTunerName: TPE | ||
  classArgs: | ||
    #choice: maximize, minimize | ||
    optimize_mode: maximize | ||
trial: | ||
  command: python3 mnist-keras.py | ||
  codeDir: . | ||
  gpuNum: 0 | ||
  cpuNum: 1 | ||
  memoryMB: 8196 | ||
  #The docker image to run nni job on pai | ||
  image: openpai/pai.example.tensorflow | ||
  #The hdfs directory to store data on pai, format 'hdfs://host:port/directory' | ||
  hdfsDataDir: hdfs://10.10.10.10:9000/username/nni | ||
  #The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory' | ||
  hdfsOutputDir: hdfs://10.10.10.10:9000/username/nni | ||
paiConfig: | ||
  #The username to login pai | ||
  userName: username | ||
  #The password to login pai | ||
  passWord: password | ||
  #The host of restful server of pai | ||
  host: 10.10.10.10 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
authorName: default | ||
experimentName: example_mnist-smartparam | ||
trialConcurrency: 1 | ||
maxExecDuration: 1h | ||
maxTrialNum: 10 | ||
#choice: local, remote, pai | ||
trainingServicePlatform: pai | ||
#choice: true, false | ||
useAnnotation: true | ||
tuner: | ||
  #choice: TPE, Random, Anneal, Evolution, | ||
  #SMAC (SMAC should be installed through nnictl) | ||
  builtinTunerName: TPE | ||
  classArgs: | ||
    #choice: maximize, minimize | ||
    optimize_mode: maximize | ||
trial: | ||
  command: python3 mnist.py | ||
  codeDir: . | ||
  gpuNum: 0 | ||
  cpuNum: 1 | ||
  memoryMB: 8196 | ||
  #The docker image to run nni job on pai | ||
  image: openpai/pai.example.tensorflow | ||
  #The hdfs directory to store data on pai, format 'hdfs://host:port/directory' | ||
  hdfsDataDir: hdfs://10.10.10.10:9000/username/nni | ||
  #The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory' | ||
  hdfsOutputDir: hdfs://10.10.10.10:9000/username/nni | ||
paiConfig: | ||
  #The username to login pai | ||
  userName: username | ||
  #The password to login pai | ||
  passWord: password | ||
  #The host of restful server of pai | ||
  host: 10.10.10.10 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
authorName: default | ||
experimentName: example_mnist | ||
trialConcurrency: 1 | ||
maxExecDuration: 1h | ||
maxTrialNum: 10 | ||
#choice: local, remote, pai | ||
trainingServicePlatform: pai | ||
searchSpacePath: search_space.json | ||
#choice: true, false | ||
useAnnotation: false | ||
tuner: | ||
  #choice: TPE, Random, Anneal, Evolution, | ||
  #SMAC (SMAC should be installed through nnictl) | ||
  builtinTunerName: TPE | ||
  classArgs: | ||
    #choice: maximize, minimize | ||
    optimize_mode: maximize | ||
trial: | ||
  command: python3 mnist.py | ||
  codeDir: . | ||
  gpuNum: 0 | ||
  cpuNum: 1 | ||
  memoryMB: 8196 | ||
  #The docker image to run nni job on pai | ||
  image: openpai/pai.example.tensorflow | ||
  #The hdfs directory to store data on pai, format 'hdfs://host:port/directory' | ||
  hdfsDataDir: hdfs://10.10.10.10:9000/username/nni | ||
  #The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory' | ||
  hdfsOutputDir: hdfs://10.10.10.10:9000/username/nni | ||
paiConfig: | ||
  #The username to login pai | ||
  userName: username | ||
  #The password to login pai | ||
  passWord: password | ||
  #The host of restful server of pai | ||
  host: 10.10.10.10 |