Skip to content
This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Refactor nnictl and add config_pai.yml #144

Merged
merged 34 commits into from
Sep 30, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
dc780cd
Merge pull request #1 from Microsoft/master
SparkSnail Sep 14, 2018
86243e7
Merge pull request #2 from Microsoft/master
SparkSnail Sep 14, 2018
3d1e4e9
fix nnictl bug
Sep 14, 2018
6d09780
Merge pull request #4 from Microsoft/master
SparkSnail Sep 17, 2018
0d24158
Merge branch 'master' of https://github.com/SparkSnail/nni
Sep 18, 2018
6d669c6
Merge pull request #6 from Microsoft/master
SparkSnail Sep 19, 2018
af2615d
Merge pull request #8 from Microsoft/master
SparkSnail Sep 20, 2018
f6b7c0a
Merge pull request #9 from Microsoft/master
SparkSnail Sep 24, 2018
a74febc
Merge pull request #10 from Microsoft/master
SparkSnail Sep 25, 2018
334b0a4
Merge pull request #12 from Microsoft/master
SparkSnail Sep 27, 2018
efe93df
Merge pull request #13 from Microsoft/master
SparkSnail Sep 27, 2018
0d9b074
Merge branch 'master' of https://github.com/SparkSnail/nni
Sep 28, 2018
863d137
add hdfs host validation
Sep 28, 2018
7c4bf9b
fix bugs
Sep 28, 2018
61e7f86
Merge pull request #14 from Microsoft/v0.2
SparkSnail Sep 28, 2018
3dfce3a
fix dockerfile
Sep 28, 2018
30f8feb
fix install.sh
Sep 29, 2018
0926045
update install.sh
Sep 29, 2018
c3160e4
fix dockerfile
Sep 29, 2018
d3f68be
Set timeout for HDFSUtility exists function
Sep 29, 2018
4959a93
remove unused TODO
Sep 29, 2018
46fad2a
Merge branch 'v0.2' of https://github.com/SparkSnail/nni into v0.2
Sep 29, 2018
b1ae562
fix sdk
Sep 29, 2018
7c571b2
add optional for outputDir and dataDir
Sep 29, 2018
9744692
refactor dockerfile.base
Sep 29, 2018
2fe73a6
Remove unused import in hdfsclientUtility
Sep 29, 2018
26e8864
Merge pull request #15 from Microsoft/v0.2
SparkSnail Sep 30, 2018
f5fdeab
add config_pai.yml
Sep 30, 2018
c5373f0
refactor nnictl create logic and add colorful print
Sep 30, 2018
349c0a7
fix nnictl stop logic
Sep 30, 2018
64e8d7d
add annotation for config_pai.yml
Sep 30, 2018
f8c8ddb
add document for start experiment
Sep 30, 2018
60eae23
fix config.yml
Sep 30, 2018
8bb958c
fix document
Sep 30, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 33 additions & 0 deletions docs/StartExperiment.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
How to start an experiment
===
## 1.Introduce
There are few steps to start an new experiment of nni, here are the process.
<img src="./img/experiment_process.jpg" width="50%" height="50%" />
## 2.Details
### 2.1 Check environment
The first step to start an experiment is to check whether the environment is ready, nnictl will check if there is an old experiment running or the port of restfurl server is occupied.
NNICTL will also validate the content of config yaml file, to ensure the experiment config is in correct format.

### 2.2 Start restful server
After check environment, nnictl will start an restful server process to manage nni experiment, the devault port is 51188.

### 2.3 Check restful server
Before next steps, nnictl will check whether restful server is successfully started, or the starting process will stop and show error message.

### 2.4 Set experiment config
NNICTL need to set experiment config before start an experiment, experiment config includes the config values in config yaml file.

### 2.5 Check experiment cofig
NNICTL will ensure the request to set config is successfully executed.

### 2.6 Start Web UI
NNICTL will start a Web UI process to show Web UI information,the default port of Web UI is 8080.

### 2.7 Check Web UI
If Web UI is not successfully started, nnictl will give a warning information, and will continue to start experiment.

### 2.8 Start Experiment
This is the most import step of starting an nni experiment, nnictl will call restful server process to setup an experiment.

### 2.9 Check experiment
After start experiment, nnictl will check whether the experiment is correctly created, and show more information of this experiment to users.
Binary file added docs/img/experiment_process.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
2 changes: 1 addition & 1 deletion examples/trials/auto-gbdt/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ experimentName: example_auto-gbdt
trialConcurrency: 1
maxExecDuration: 10h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
Expand Down
36 changes: 36 additions & 0 deletions examples/trials/auto-gbdt/config_pai.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
authorName: default
experimentName: example_auto-gbdt
trialConcurrency: 1
maxExecDuration: 10h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: pai
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
#choice: TPE, Random, Anneal, Evolution,
#SMAC (SMAC should be installed through nnictl)
builtinTunerName: TPE
classArgs:
#choice: maximize, minimize
optimize_mode: minimize
trial:
command: python3 main.py
codeDir: .
gpuNum: 0
cpuNum: 1
memoryMB: 8196
#The docker image to run nni job on pai
image: openpai/pai.example.tensorflow
SparkSnail marked this conversation as resolved.
Show resolved Hide resolved
#The hdfs directory to store data on pai, format 'hdfs://host:port/directory'
hdfsDataDir: hdfs://10.10.10.10:9000/username/nni
#The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory'
hdfsOutputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
#The username to login pai
userName: username
#The password to login pai
passWord: password
#The host of restful server of pai
host: 10.10.10.10
2 changes: 1 addition & 1 deletion examples/trials/ga_squad/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ experimentName: example_ga_squad
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: local
#choice: true, false
useAnnotation: false
Expand Down
34 changes: 34 additions & 0 deletions examples/trials/ga_squad/config_pai.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
authorName: default
experimentName: example_ga_squad
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: pai
#choice: true, false
useAnnotation: false
tuner:
codeDir: ../tuners/ga_customer_tuner
classFileName: customer_tuner.py
className: CustomerTuner
classArgs:
optimize_mode: maximize
trial:
command: python3 trial.py
codeDir: .
gpuNum: 0
cpuNum: 1
memoryMB: 8196
#The docker image to run nni job on pai
image: openpai/pai.example.tensorflow
#The hdfs directory to store data on pai, format 'hdfs://host:port/directory'
hdfsDataDir: hdfs://10.10.10.10:9000/username/nni
#The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory'
hdfsOutputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
#The username to login pai
userName: username
#The password to login pai
passWord: password
#The host of restful server of pai
host: 10.10.10.10
2 changes: 1 addition & 1 deletion examples/trials/mnist-annotation/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
Expand Down
35 changes: 35 additions & 0 deletions examples/trials/mnist-annotation/config_pai.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: pai
#choice: true, false
useAnnotation: true
tuner:
#choice: TPE, Random, Anneal, Evolution,
#SMAC (SMAC should be installed through nnictl)
builtinTunerName: TPE
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
trial:
command: python3 mnist.py
codeDir: .
gpuNum: 0
cpuNum: 1
memoryMB: 8196
#The docker image to run nni job on pai
image: openpai/pai.example.tensorflow
#The hdfs directory to store data on pai, format 'hdfs://host:port/directory'
hdfsDataDir: hdfs://10.10.10.10:9000/username/nni
#The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory'
hdfsOutputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
#The username to login pai
userName: username
#The password to login pai
passWord: password
#The host of restful server of pai
host: 10.10.10.10
2 changes: 1 addition & 1 deletion examples/trials/mnist-batch-tune-keras/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ experimentName: example_mnist-keras
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
Expand Down
36 changes: 36 additions & 0 deletions examples/trials/mnist-batch-tune-keras/config_pai.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
authorName: default
experimentName: example_mnist-keras
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: pai
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
#choice: TPE, Random, Anneal, Evolution, BatchTuner
#SMAC (SMAC should be installed through nnictl)
builtinTunerName: BatchTuner
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
trial:
command: python3 mnist-keras.py
codeDir: .
gpuNum: 0
cpuNum: 1
memoryMB: 8196
#The docker image to run nni job on pai
image: openpai/pai.example.tensorflow
#The hdfs directory to store data on pai, format 'hdfs://host:port/directory'
hdfsDataDir: hdfs://10.10.10.10:9000/username/nni
#The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory'
hdfsOutputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
#The username to login pai
userName: username
#The password to login pai
passWord: password
#The host of restful server of pai
host: 10.10.10.10
2 changes: 1 addition & 1 deletion examples/trials/mnist-keras/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ experimentName: example_mnist-keras
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
Expand Down
36 changes: 36 additions & 0 deletions examples/trials/mnist-keras/config_pai.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
authorName: default
experimentName: example_mnist-keras
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: pai
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
#choice: TPE, Random, Anneal, Evolution,
#SMAC (SMAC should be installed through nnictl)
builtinTunerName: TPE
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
trial:
command: python3 mnist-keras.py
codeDir: .
gpuNum: 0
cpuNum: 1
memoryMB: 8196
#The docker image to run nni job on pai
image: openpai/pai.example.tensorflow
#The hdfs directory to store data on pai, format 'hdfs://host:port/directory'
hdfsDataDir: hdfs://10.10.10.10:9000/username/nni
#The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory'
hdfsOutputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
#The username to login pai
userName: username
#The password to login pai
passWord: password
#The host of restful server of pai
host: 10.10.10.10
2 changes: 1 addition & 1 deletion examples/trials/mnist-smartparam/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ experimentName: example_mnist-smartparam
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
Expand Down
35 changes: 35 additions & 0 deletions examples/trials/mnist-smartparam/config_pai.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
authorName: default
experimentName: example_mnist-smartparam
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: pai
#choice: true, false
useAnnotation: true
tuner:
#choice: TPE, Random, Anneal, Evolution,
#SMAC (SMAC should be installed through nnictl)
builtinTunerName: TPE
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
trial:
command: python3 mnist.py
codeDir: .
gpuNum: 0
cpuNum: 1
memoryMB: 8196
#The docker image to run nni job on pai
image: openpai/pai.example.tensorflow
#The hdfs directory to store data on pai, format 'hdfs://host:port/directory'
hdfsDataDir: hdfs://10.10.10.10:9000/username/nni
#The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory'
hdfsOutputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
#The username to login pai
userName: username
#The password to login pai
passWord: password
#The host of restful server of pai
host: 10.10.10.10
2 changes: 1 addition & 1 deletion examples/trials/mnist/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
Expand Down
36 changes: 36 additions & 0 deletions examples/trials/mnist/config_pai.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: pai
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
#choice: TPE, Random, Anneal, Evolution,
#SMAC (SMAC should be installed through nnictl)
builtinTunerName: TPE
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
trial:
command: python3 mnist.py
codeDir: .
gpuNum: 0
cpuNum: 1
memoryMB: 8196
#The docker image to run nni job on pai
image: openpai/pai.example.tensorflow
#The hdfs directory to store data on pai, format 'hdfs://host:port/directory'
hdfsDataDir: hdfs://10.10.10.10:9000/username/nni
#The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory'
hdfsOutputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
#The username to login pai
userName: username
#The password to login pai
passWord: password
#The host of restful server of pai
host: 10.10.10.10
2 changes: 1 addition & 1 deletion examples/trials/pytorch_cifar10/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ experimentName: example_pytorch_cifar10
trialConcurrency: 1
maxExecDuration: 100h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
Expand Down
Loading