This repository has been archived by the owner on Sep 18, 2024. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
update dev-weight-sharing to latest master (#391)
* Quick fix bug: nnictl port value error (#245) * fix port bug * Dev exp stop more (#221) * Exp stop refactor (#161) * Update RemoteMachineMode.md (#63) * Remove unused classes for SQuAD QA example. * Remove more unused functions for SQuAD QA example. * Fix default dataset config. * Add Makefile README (#64) * update document (#92) * Edit readme.md * updated a word * Update GetStarted.md * Update GetStarted.md * refact readme, getstarted and write your trial md. * Update README.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Fix nnictl bugs and add new feature (#75) * fix nnictl bug * fix nnictl create bug * add experiment status logic * add more information for nnictl * fix Evolution Tuner bug * refactor code * fix code in updater.py * fix nnictl --help * fix classArgs bug * update check response.status_code logic * remove Buffer warning (#100) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md * Update README.md * Add support for debugging mode * fix setup.py (#115) * Add DAG model configuration format for SQuAD example. * Explain config format for SQuAD QA model. * Add more detailed introduction about the evolution algorithm. * Fix install.sh add add trial log path (#109) * fix nnictl bug * fix nnictl create bug * add experiment status logic * add more information for nnictl * fix Evolution Tuner bug * refactor code * fix code in updater.py * fix nnictl --help * fix classArgs bug * update check response.status_code logic * show trial log path * update document * fix install.sh * set default vallue for maxTrialNum and maxExecDuration * fix nnictl * Dev smac (#116) * support package install (#91) * fix nnictl bug * support package install * update * update package install logic * Fix package install issue (#95) * fix nnictl bug * fix pakcage install * support SMAC as a tuner on nni (#81) * update doc * update doc * update doc * update hyperopt installation * update doc * update doc * update description in setup.py * update setup.py * modify encoding * encoding * add encoding * remove pymc3 * update doc * update builtin tuner spec * support smac in sdk, fix logging issue * support smac tuner * add optimize_mode * update config in nnictl * add __init__.py * update smac * update import path * update setup.py: remove entry_point * update rest server validation * fix bug in nnictl launcher * support classArgs: optimize_mode * quick fix bug * test travis * add dependency * add dependency * add dependency * add dependency * create smac python package * fix trivial points * optimize import of tuners, modify nnictl accordingly * fix bug: incorrect algorithm_name * trivial refactor * for debug * support virtual * update doc of SMAC * update smac requirements * update requirements * change debug mode * update doc * update doc * refactor based on comments * fix comments * modify example config path to relative path and increase maxTrialNum (#94) * modify example config path to relative path and increase maxTrialNum * add document * support conda (#90) (#110) * support install from venv and travis CI * support install from venv and travis CI * support install from venv and travis CI * support conda * support conda * modify example config path to relative path and increase maxTrialNum * undo messy commit * undo messy commit * Support pip install as root (#77) * Typo on #58 (#122) * PAI Training Service implementation (#128) * PAI Training service implementation **1. Implement PAITrainingService **2. Add trial-keeper python module, and modify setup.py to install the module **3. Add PAItrainingService rest server to collect metrics from PAI container. * fix datastore for multiple final result (#129) * Update NNI v0.2 release notes (#132) Update NNI v0.2 release notes * Update setup.py Makefile and documents (#130) * update makefile and setup.py * update makefile and setup.py * update document * update document * Update Makefile no travis * update doc * update doc * fix convert from ss to pcs (#133) * Fix bugs about webui (#131) * Fix webui bugs * Fix tslint * webui logpath and document (#135) * Add webui document and logpath as a href * fix tslint * fix comments by Chengmin * Pai training service bug fix and enhancement (#136) * Add NNI installation scripts * Update pai script, update NNI_out_dir * Update NNI dir in nni sdk local.py * Create .nni folder in nni sdk local.py * Add check before creating .nni folder * Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT * Improve annotation (#138) * Improve annotation * Minor bugfix * Selectively install through pip (#139) Selectively install through pip * update setup.py * fix paiTrainingService bugs (#137) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * Add documentation for NNI PAI mode experiment (#141) * Add documentation for NNI PAI mode * Fix typo based on PR comments * Exit with subprocess return code of trial keeper * Remove additional exit code * Fix typo based on PR comments * update doc for smac tuner (#140) * Revert "Selectively install through pip (#139)" due to potential pip install issue (#142) * Revert "Selectively install through pip (#139)" This reverts commit 1d17483. * Add exit code of subprocess for trial_keeper * Update README, add link to PAImode doc * Merge branch V0.2 to Master (#143) * webui logpath and document (#135) * Add webui document and logpath as a href * fix tslint * fix comments by Chengmin * Pai training service bug fix and enhancement (#136) * Add NNI installation scripts * Update pai script, update NNI_out_dir * Update NNI dir in nni sdk local.py * Create .nni folder in nni sdk local.py * Add check before creating .nni folder * Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT * Improve annotation (#138) * Improve annotation * Minor bugfix * Selectively install through pip (#139) Selectively install through pip * update setup.py * fix paiTrainingService bugs (#137) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * Add documentation for NNI PAI mode experiment (#141) * Add documentation for NNI PAI mode * Fix typo based on PR comments * Exit with subprocess return code of trial keeper * Remove additional exit code * Fix typo based on PR comments * update doc for smac tuner (#140) * Revert "Selectively install through pip (#139)" due to potential pip install issue (#142) * Revert "Selectively install through pip (#139)" This reverts commit 1d17483. * Add exit code of subprocess for trial_keeper * Update README, add link to PAImode doc * fix bug (#147) * Refactor nnictl and add config_pai.yml (#144) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * add config_pai.yml * refactor nnictl create logic and add colorful print * fix nnictl stop logic * add annotation for config_pai.yml * add document for start experiment * fix config.yml * fix document * Fix trial keeper wrongly exit issue (#152) * Fix trial keeper bug, use actual exitcode to exit rather than 1 * Fix bug of table sort (#145) * Update doc for PAIMode and v0.2 release notes (#153) * Update v0.2 documentation regards to release note and PAI training service * Update document to describe NNI docker image * fix antd (#159) * refactor experiment stopping logic * support change concurrency * remove trialJobs.ts * trivial changes * fix bugs * fix bug * support updating maxTrialNum * Modify IT scripts for supporting multiple experiments * Update ci (#175) * Update RemoteMachineMode.md (#63) * Remove unused classes for SQuAD QA example. * Remove more unused functions for SQuAD QA example. * Fix default dataset config. * Add Makefile README (#64) * update document (#92) * Edit readme.md * updated a word * Update GetStarted.md * Update GetStarted.md * refact readme, getstarted and write your trial md. * Update README.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Fix nnictl bugs and add new feature (#75) * fix nnictl bug * fix nnictl create bug * add experiment status logic * add more information for nnictl * fix Evolution Tuner bug * refactor code * fix code in updater.py * fix nnictl --help * fix classArgs bug * update check response.status_code logic * remove Buffer warning (#100) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md * Update README.md * Add support for debugging mode * modify CI cuz of refracting exp stop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * file saving * fix issues from code merge * remove $(INSTALL_PREFIX)/nni/nni_manager before install * fix indent * fix merge issue * socket close * update port * fix merge error * modify ci logic in nnimanager * fix ci * fix bug * change suspended to done * update ci (#229) * update ci * update ci * update ci (#232) * update ci * update ci * update azure-pipelines * update azure-pipelines * update ci (#233) * update ci * update ci * update azure-pipelines * update azure-pipelines * update azure-pipelines * run.py (#238) * Nnupdate ci (#239) * run.py * test ci * Nnupdate ci (#240) * run.py * test ci * test ci * Udci (#241) * run.py * test ci * test ci * test ci * update ci (#242) * run.py * test ci * test ci * test ci * update ci * revert install.sh (#244) * run.py * test ci * test ci * test ci * update ci * revert install.sh * add comments * remove assert * trivial change * trivial change * update Makefile (#246) * update Makefile * update Makefile * quick fix for ci (#248) * add update trialNum and fix bugs (#261) * Add builtin tuner to CI (#247) * update Makefile * update Makefile * add builtin-tuner test * add builtin-tuner test * refractor ci * update azure.yml * add built-in tuner test * fix bugs * Doc refactor (#258) * doc refactor * image name refactor * Refactor nnictl to support listing stopped experiments. (#256) Refactor nnictl to support listing stopped experiments. * Show experiment parameters more beautifully (#262) * fix error on example of RemoteMachineMode (#269) * add pycharm project files to .gitignore list * update pylintrc to conform vscode settings * fix RemoteMachineMode for wrong trainingServicePlatform * Update docker file to use latest nni release (#263) * fix bug about execDuration and endTime (#270) * fix bug about execDuration and endTime * modify time interval to 30 seconds * refactor based on Gems's suggestion * for triggering ci * Refactor dockerfile (#264) * refactor Dockerfile * Support nnictl tensorboard (#268) support tensorboard * Sdk update (#272) * Rename get_parameters to get_next_parameter * annotations add get_next_parameter * updates * updates * updates * updates * updates * add experiment log path to experiment profile (#276) * Add sequenceId to TrialJobInfo (#283) * Show error information and fix paramiko installation (#282) * fix paramiko install * Refactor pip installation logic for supporting uninstall * Update documents due to new pip installation approach * Refactor Makefile for consistent with pip installation approach * Add README for building and uploading NNI package * Fix issues for pip installation * Minor fix on #41 (#280) * Typo on #12 (#281) * plus minor proposals * Quick fix resume logic (#285) * Tgs salt example (#286) * TGS salt example * updates * updates * Hide install via pip prompt, since 0.3 has not been published (#287) * move reward extraction logic to tuner (#274) * add pycharm project files to .gitignore list * update pylintrc to conform vscode settings * fix RemoteMachineMode for wrong trainingServicePlatform * add python cache files to gitignore list * move extract scalar reward logic from dispatcher to tuner * update tuner code corresponding to last commit * update doc for receive_trial_result api change * add numpy to package whitelist of pylint * distinguish param value from return reward for tuner.extract_scalar_reward * update pylintrc * add comments to dispatcher.handle_report_metric_data * refactor extract reward from dict by tuner * Update README.md (#288) added License badge * fix doc mistakes and broken links. (#271) * refactor doc * update with Mao's suggestions * Set theme jekyll-theme-dinky * updated the "Contribute" part (merged Gems' wiki in, updated ReadMe) * fix link * Update README.md * Fix misspelling in examples/trials/ga_squad/README.md * Update WebUI (#295) * Update README.md (#296) * Fix nnictl in master (#300) 1.Fix old version of config file 2.fix sklearn requirements 3.Fix resume log logic * revert master's installation doc to install v0.2 before v0.3 official release. (#298) * refactor doc * update with Mao's suggestions * Set theme jekyll-theme-dinky * update doc * fix links * fix links * fix links * merge * fix links and doc errors * merge * merge * merge * merge * Quick fix nnictl config logic (#289) * fix nnictl bug * fix install.sh * add desc for Dockerfile.build.base * update document for Dockerfile * update * refactor port detect * update * refactor NNICTLDOC.md * add document for pai and nnictl * add default value for port * add exception handling in trial_keeper.py * fix port bug * fix resume * fix nnictl resume and fix nnictl stop * fix document * update * refactor nnictl * update * update doc * update * update nnictl * fix comment * revert dockerfile * update * update * update * fix nnictl error hit * fix comments * fix bash-completion * fix paramiko install * quick fix resume logic * update * quick fix nnictl * merge * updated the "Contribute" part (merged Gems' wiki in, updated ReadMe) * fix link * revise the installation cmd to v0.2 * revise to install v0.2 * Update nnictl_utils.py * Update nnictl_utils.py * Update nnictl_utils.py * Rename the pypi package for nni * Merge 0.3 into master (#313) * Quick fix nnictl config logic (#289) * fix nnictl bug * fix install.sh * add desc for Dockerfile.build.base * update document for Dockerfile * update * refactor port detect * update * refactor NNICTLDOC.md * add document for pai and nnictl * add default value for port * add exception handling in trial_keeper.py * fix port bug * fix resume * fix nnictl resume and fix nnictl stop * fix document * update * refactor nnictl * update * update doc * update * update nnictl * fix comment * revert dockerfile * update * update * update * fix nnictl error hit * fix comments * fix bash-completion * fix paramiko install * quick fix resume logic * update * quick fix nnictl * PR merge to 0.3 (#297) * refactor doc * update with Mao's suggestions * Set theme jekyll-theme-dinky * update doc * fix links * fix links * fix links * merge * fix links and doc errors * merge * merge * merge * merge * Update README.md (#288) added License badge * merge * updated the "Contribute" part (merged Gems' wiki in, updated ReadMe) * fix link * fix doc mistakes and broken links. (#271) * refactor doc * update with Mao's suggestions * Set theme jekyll-theme-dinky * updated the "Contribute" part (merged Gems' wiki in, updated ReadMe) * fix link * Update README.md * Fix misspelling in examples/trials/ga_squad/README.md * revise the installation cmd to v0.2 * revise to install v0.2 * remove enas readme (#292) * Fix datastore performance issue (#301) * Fix nnictl in v0.3 (#299) Fix old version of config file fix sklearn requirements Fix resume log logic * remove paramiko in V0.3 (#306) remove paramiko in V0.3 * Release note 0.3 (#303) * v0.3 release notes * updates * updates * updates * updates * updates * updates * Inform users to set experiment id when id is empty (#310) * fix nnictl bug * fix install.sh * add desc for Dockerfile.build.base * update document for Dockerfile * update * refactor port detect * update * refactor NNICTLDOC.md * add document for pai and nnictl * add default value for port * add exception handling in trial_keeper.py * fix port bug * fix resume * fix nnictl resume and fix nnictl stop * fix document * update * refactor nnictl * update * update doc * update * update nnictl * fix comment * revert dockerfile * update * update * update * fix nnictl error hit * fix comments * fix bash-completion * fix paramiko install * quick fix resume logic * update * quick fix nnictl * fix nnictl crash bug * add requirement.txt for sklearn example * fix nnictl configuration bug * update * update * update * update * remove paramiko * refactor nnictl lfor log stdout * update * updaate * fix endtime when resume (#307) * fix endtime when resume * update * update * update * updates * Fix sequence id issue on resuming experiment (#316) * Fix bugs for v0.3 (#315) * Fix bugs * update * Refactor document of nnictl for v0.3 (#314) * fix nnictl document * Fix bug of default metric for v0.3 (#304) * Fix bug of Default Metric and modifiy trials detail style * update * Document updates for v0.3 (#318) * refactor doc * update with Mao's suggestions * Set theme jekyll-theme-dinky * update doc * fix links * fix links * fix links * merge * fix links and doc errors * merge * merge * merge * merge * Quick fix nnictl config logic (#289) * fix nnictl bug * fix install.sh * add desc for Dockerfile.build.base * update document for Dockerfile * update * refactor port detect * update * refactor NNICTLDOC.md * add document for pai and nnictl * add default value for port * add exception handling in trial_keeper.py * fix port bug * fix resume * fix nnictl resume and fix nnictl stop * fix document * update * refactor nnictl * update * update doc * update * update nnictl * fix comment * revert dockerfile * update * update * update * fix nnictl error hit * fix comments * fix bash-completion * fix paramiko install * quick fix resume logic * update * quick fix nnictl * merge * updated the "Contribute" part (merged Gems' wiki in, updated ReadMe) * fix link * revise the installation cmd to v0.2 * revise to install v0.2 * Update nnictl_utils.py * Update nnictl_utils.py * Update nnictl_utils.py * Update documentation for v0.3 * update (#320) * update * fix ga_squad config * Refactor close experiment implementation * Uniform the names of python modules * Update WebUI docs (#325) * add webui screenshot in README.md (#323) * update doc * update * update * update * add logo * logo * logo * update * update installation doc * Update v0.3.0 release note (#324) update v0.3.0 release note * update doc for v0.3.3 installation tag (#329) as the title * Merge v0.3 into master (#337) * Fix pypi package missing python module * Fix pypi package missing python module * fix bug in smartparam example (#322) * Fix nnictl update trialnum and document (#326) 1.Fix restful server of update 2.Update nnictl document of update 3.Add tensorboard in docement * Update the version numbers from 0.3.2 to 0.3.3 * Fix contributing doc problems (#335) broken link wrong step refine wording * Add download button (#332) * Merge v0.3 to master (#339) * Fix pypi package missing python module * Fix pypi package missing python module * fix bug in smartparam example (#322) * Fix nnictl update trialnum and document (#326) 1.Fix restful server of update 2.Update nnictl document of update 3.Add tensorboard in docement * Update the version numbers from 0.3.2 to 0.3.3 * Update examples (#331) * update mnist-annotation * fix mnist-annotation typo * update mnist example * update mnist-smartparam * update mnist-annotation * update mnist-smartparam * change learning rate * update mnist assessor maxTrialNum * update examples * update examples * update maxTrialNum * fix breaking path in config_assessor.yml * Add error message when experiment's status is error (#338) * Add error message when experiment error * delete unuseful code * Fix bug of bgcolor when experiment status is running * Change base image from devel to runtime, to reduce docker image size (#343) * Update version number since v0.3.4 has been released (#342) * Fixed the issue that pip install --user doesn't work in docker as root user * Fix localTrainingService cancel logic and nnictl logic (#334) Fix nnictl stop logic Fix localTrainingService cancelJob logic Show port information in "nnictl experiment list" cmd. Show more information when config file validate failed. Add nnictl detect adjacent port logic if the platform is pai * update doc about nni docker image (#345) * Merge v0.3 2 (#352) * Fix pypi package missing python module * Fix pypi package missing python module * fix bug in smartparam example (#322) * Fix nnictl update trialnum and document (#326) 1.Fix restful server of update 2.Update nnictl document of update 3.Add tensorboard in docement * Update the version numbers from 0.3.2 to 0.3.3 * Update examples (#331) * update mnist-annotation * fix mnist-annotation typo * update mnist example * update mnist-smartparam * update mnist-annotation * update mnist-smartparam * change learning rate * update mnist assessor maxTrialNum * update examples * update examples * update maxTrialNum * fix breaking path in config_assessor.yml * fix bug in nnimanager (#341) * Update nnictl.py (#347) * Update nnictl.py * modify help message for nnictl stop * update doc for docker image (#353) * update doc for docker image * update * [PAI training service] Support running multiple PAI experiment (#348) * Change base image from devel to runtime, to reduce docker image size * Support running multiple experiment for PAI * Fix a bug regarding to recuisively reference between paiRestServer and paiTrainingService * update makefile (#350) * update makefile * update launcher.py to fix the problem of finding main.js * remove duplicated lib * update local demo doc and configuration (#344) * update local demo doc and configuration * change folder name * Update tutorial_1_CR_exp_local_api.md no need to have a new training file * Delete mnist_gpu.py no need to have a new training file * Update config_gpu.yml no need to have a new training file * add PyTorch to Dockerfile (#362) * update local demo doc and configuration * change folder name * Update tutorial_1_CR_exp_local_api.md no need to have a new training file * Delete mnist_gpu.py no need to have a new training file * Update config_gpu.yml no need to have a new training file * add PyTorch to Dockerfile * Add Pytorch and set sklearn version in Dockerfile (#346) 1.Set scikit-learn==0.20.0 in Dockerfile 2.Update readme.md of dockerile 3.Add PyTorch 0.4.1 4.Add description for 'nnictl stop all' * Quick fix Docker (#363) Remove "RUN python3 -m pip --no-cache-dir install torch torchvision" * Updated document for "write a trial" related fixes. (#351) - Updated document for "write a trial" related fixes per Quanlu's feedback; - Fix wrong links in Get started per Meng's feedback. * Fix the issue#211: WebUI does not support search for a specific Trial (#355) * Fix the issue#211: WebUI does not support search for a specific Trial * delete unuseful code * Update * default 20 * add more details for remote mode docs (#366) * add more details for remote mode docs (#365) * update tutorial for remote machine as well (#367) * Support hyper-band (#358) * add gridsearch tuner (#364) * add gridsearch tuner * add gridsearchtuner * add gridsearchtuner * add gridsearchtuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch tuner * update gridsearch and pylint * Fix nni stop (#368) Fix "nnictl stop" * Add more tooltips in default metric graph (#370) * Add more tooltip in default metric graph and fix bug * update * Update README.md (#371) * [Kubeflow Training Service] V1, merge from kubeflow branch to master branch (#382) * Kubeflow TrainingService support, v1 (#373) 1. Create new Training Service: kubeflow trainning service, use 'kubectl' and kubeflow tfjobs CRD to submit and manage jobs 2. Update nni python SDK to support new kubeflow platform 3. Update nni python SDK's get_sequende_id() implementation, read NNI_TRIAL_SEQ_ID env variable, instead of reading .nni/sequence_id file 4. This version only supports Tensorflow operator. Will add more operators' support in future versions * Add Gitter badge (#376) * Update ci with new built-in tuner and assessor (#359) * fix sdk's unittest and add medianstop, batchtuner to ci * fix sdk's unittest and add medianstop, batchtuner to ci * remove debug info * update azure-pipelines * remove useless code * add some checks * fix pylint * update ci test * update ci * Show intermediate result (#384) * Asynchronous dispatcher (#372) * Asynchronous dispatcher * updates * updates * updates * updates * [Kubeflow training service] Update kubeflow exp job config schema to support distributed training (#387) * Support distributed training on tf-operator, for worker and ps * Update validation rule for kubeflow config * small code refactor adjustment for private methods * Use different output folder for ps and worker * add gpuNum check for local TS (#378) * add gpuNum check for local TS * set CUDA_VISIBLE_DEVICES to empty string when gpuNum is 0 * remove redundency code * [Kubeflow Training Service] Explicitly set cuda_visible_devices env var (#388) * Use different output folder for ps and worker * Add cuda_visible_devices env var if gpuNum is 0
- Loading branch information