From ae4b37298e6de177451ef0b695c08f1859a3900d Mon Sep 17 00:00:00 2001 From: J-shang Date: Mon, 23 Aug 2021 09:22:01 +0800 Subject: [PATCH 1/4] temp sync --- docs/en_US/reference.rst | 1 + .../reference/experiment_config_copy.rst | 325 ++++++++++++++++++ 2 files changed, 326 insertions(+) create mode 100644 docs/en_US/reference/experiment_config_copy.rst diff --git a/docs/en_US/reference.rst b/docs/en_US/reference.rst index 70d410ed2c..c9ab728970 100644 --- a/docs/en_US/reference.rst +++ b/docs/en_US/reference.rst @@ -6,6 +6,7 @@ References nnictl Commands Experiment Configuration + Experiment Configuration (test) Experiment Configuration (legacy) Search Space NNI Annotation diff --git a/docs/en_US/reference/experiment_config_copy.rst b/docs/en_US/reference/experiment_config_copy.rst new file mode 100644 index 0000000000..12019ee265 --- /dev/null +++ b/docs/en_US/reference/experiment_config_copy.rst @@ -0,0 +1,325 @@ +=========================== +Experiment Config Reference +=========================== + +A config file is needed when creating an experiment. This document describes the rules to write a config file and provides some examples. + +.. Note:: + + 1. This document lists field names with ``camelCase``. If users use these fields in the pythonic way with NNI Python APIs (e.g., ``nni.experiment``), the field names should be converted to ``snake_case``. + + 2. In this document, the type of fields are formatted as `Python type hint `_. Therefore JSON objects are called `dict` and arrays are called `list`. + + .. _path: + + 3. Some fields take a path to a file or directory. Unless otherwise noted, both absolute path and relative path are supported, and ``~`` will be expanded to the home directory. + + - When written in the YAML file, relative paths are relative to the directory containing that file. + - When assigned in Python code, relative paths are relative to the current working directory. 
+ - All relative paths are converted to absolute when loading YAML file into Python class, and when saving Python class to YAML file. + + 4. Setting a field to ``None`` or ``null`` is equivalent to not setting the field. + +.. contents:: Contents + :local: + :depth: 3 + + +Examples +======== + +Local Mode +^^^^^^^^^^ + +.. code-block:: yaml + + experimentName: MNIST + searchSpaceFile: search_space.json + trialCommand: python mnist.py + trialCodeDirectory: . + trialGpuNumber: 1 + trialConcurrency: 2 + maxExperimentDuration: 24h + maxTrialNumber: 100 + tuner: + name: TPE + classArgs: + optimize_mode: maximize + trainingService: + platform: local + useActiveGpu: True + +Local Mode (Inline Search Space) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: yaml + + searchSpace: + batch_size: + _type: choice + _value: [16, 32, 64] + learning_rate: + _type: loguniform + _value: [0.0001, 0.1] + trialCommand: python mnist.py + trialGpuNumber: 1 + trialConcurrency: 2 + tuner: + name: TPE + classArgs: + optimize_mode: maximize + trainingService: + platform: local + useActiveGpu: True + +Remote Mode +^^^^^^^^^^^ + +.. code-block:: yaml + + experimentName: MNIST + searchSpaceFile: search_space.json + trialCommand: python mnist.py + trialCodeDirectory: . + trialGpuNumber: 1 + trialConcurrency: 2 + maxExperimentDuration: 24h + maxTrialNumber: 100 + tuner: + name: TPE + classArgs: + optimize_mode: maximize + trainingService: + platform: remote + machineList: + - host: 11.22.33.44 + user: alice + password: xxxxx + - host: my.domain.com + user: bob + sshKeyFile: ~/.ssh/id_rsa + +Reference +========= + +ExperimentConfig +^^^^^^^^^^^^^^^^ + +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - experimentName + - ``Optional[str]`` + - Mnemonic name of the experiment, which will be shown in WebUI and nnictl. + + * - searchSpaceFile + - ``Optional[str]`` + - Path_ to the JSON file containing the search space. 
+ Search space format is determined by tuner. The common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__. + Mutually exclusive to `searchSpace`_. + + * - searchSpace + - ``Optional[JSON]`` + - Search space object. + The format is determined by tuner. Common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__. + Note that ``None`` means "no such field" so empty search space should be written as ``{}``. + Mutually exclusive to `searchSpaceFile`_. + + * - trialCommand + - ``str`` + - Command to launch trial. + The command will be executed in bash on Linux and macOS, and in PowerShell on Windows. + Note that using ``python3`` on Linux and macOS, and using ``python`` on Windows. + + * - trialCodeDirectory + - ``str`` + - `Path`_ to the directory containing trial source files. + default: ``"."``. + All files in this directory will be sent to the training machine, unless in the ``.nniignore`` file. + (See :ref:`nniignore ` for details.) + + * - trialConcurrency + - ``int`` + - Specify how many trials should be run concurrently. + The real concurrency also depends on hardware resources and may be less than this value. + + * - trialGpuNumber + - ``Optional[int]`` + - This field might have slightly different meanings for various training services, + especially when set to ``0`` or ``None``. + See `training service's document <../training_services.rst>`__ for details. + + In local mode, setting the field to ``0`` will prevent trials from accessing GPU (by empty ``CUDA_VISIBLE_DEVICES``). + And when set to ``None``, trials will be created and scheduled as if they did not use GPU, + but they can still use all GPU resources if they want. + + * - maxExperimentDuration + - ``Optional[str]`` + - Limit the duration of this experiment if specified. + format: ``number + s|m|h|d`` + examples: ``"10m"``, ``"0.5h"`` + When time runs out, the experiment will stop creating trials but continue to serve WebUI. 
+ + * - maxTrialNumber + - ``Optional[int]`` + - Limit the number of trials to create if specified. + When the budget runs out, the experiment will stop creating trials but continue to serve WebUI. + + * - maxTrialDuration + - ``Optional[str]`` + - Limit the duration of trial job if specified. + format: ``number + s|m|h|d`` + examples: ``"10m"``, ``"0.5h"`` + When time runs out, the current trial job will stop. + + * - nniManagerIp + - ``Optional[str]`` + - IP of the current machine, used by training machines to access NNI manager. Not used in local mode. + If not specified, IPv4 address of ``eth0`` will be used. + Except for the local mode, it is highly recommended to set this field manually. + + * - useAnnotation + - ``bool`` + - Enable `annotation <../Tutorial/AnnotationSpec.rst>`__. + default: ``False``. + When using annotation, `searchSpace`_ and `searchSpaceFile`_ should not be specified manually. + + * - debug + - ``bool`` + - Enable debug mode. + default: ``False`` + When enabled, logging will be more verbose and some internal validation will be loosened. + + * - logLevel + - ``Optional[str]`` + - Set log level of the whole system. + values: ``"trace"``, ``"debug"``, ``"info"``, ``"warning"``, ``"error"``, ``"fatal"`` + Defaults to "info" or "debug", depending on `debug`_ option. When debug mode is enabled, Loglevel is set to "debug", otherwise, Loglevel is set to "info". + Most modules of NNI will be affected by this value, including NNI manager, tuner, training service, etc. + The exception is trial, whose logging level is directly managed by trial code. + For Python modules, "trace" acts as logging level 0 and "fatal" acts as ``logging.CRITICAL``. + + * - experimentWorkingDirectory + - ``Optional[str]`` + - Specify the :ref:`directory ` to place log, checkpoint, metadata, and other run-time stuff. + By default uses ``~/nni-experiments``. 
+ NNI will create a subdirectory named by experiment ID, so it is safe to use the same directory for multiple experiments. + + * - tunerGpuIndices + - ``Optional[list[int] | str | int]`` + - Limit the GPUs visible to tuner, assessor, and advisor. + This will be the ``CUDA_VISIBLE_DEVICES`` environment variable of tuner process. + Because tuner, assessor, and advisor run in the same process, this option will affect them all. + + * - tuner + - ``Optional[AlgorithmConfig]`` + - Specify the tuner. + The built-in tuners can be found `here <../builtin_tuner.rst>`__ and you can follow `this tutorial <../Tuner/CustomizeTuner.rst>`__ to customize a new tuner. + + * - assessor + - ``Optional[AlgorithmConfig]`` + - Specify the assessor. + The built-in assessors can be found `here <../builtin_assessor.rst>`__ and you can follow `this tutorial <../Assessor/CustomizeAssessor.rst>`__ to customize a new assessor. + + * - advisor + - ``Optional[AlgorithmConfig]`` + - Specify the advisor. + NNI provides two built-in advisors: `BOHB <../Tuner/BohbAdvisor.rst>`__ and `Hyperband <../Tuner/HyperbandAdvisor.rst>`__, and you can follow `this tutorial <../Tuner/CustomizeAdvisor.rst>`__ to customize a new advisor. + + * - trainingService + - ``TrainingServiceConfig`` + - Specify the `training service <../TrainingService/Overview.rst>`__. + + * - sharedStorage + - ``Optional[SharedStorageConfig]`` + - Configure the shared storage, detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__. + +AlgorithmConfig +^^^^^^^^^^^^^^^ + +``AlgorithmConfig`` describes a tuner / assessor / advisor algorithm. + +For customized algorithms, there are two ways to describe them: + + 1. `Register the algorithm <../Tutorial/InstallCustomizedAlgos.rst>`__ to use it like built-in. (preferred) + + 2. Specify code directory and class name directly. + +.. 
list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - name + - ``Optional[str]`` + - Name of the built-in or registered algorithm. + ``str`` for the built-in and registered algorithm, ``None`` for other customized algorithms. + + * - className + - ``Optional[str]`` + - Qualified class name of not registered customized algorithm. + ``None`` for the built-in and registered algorithm, ``str`` for other customized algorithms. + example: ``"my_tuner.MyTuner"`` + + * - codeDirectory + - ``Optional[str]`` + - `Path`_ to the directory containing the customized algorithm class. + ``None`` for the built-in and registered algorithm, ``str`` for other customized algorithms. + + * - classArgs + - ``Optional[dict[str, Any]]`` + - Keyword arguments passed to algorithm class' constructor. + See algorithm's document for supported value. + +TrainingServiceConfig +^^^^^^^^^^^^^^^^^^^^^ + +One of the following: + +- `LocalConfig`_ +- `RemoteConfig`_ +- :ref:`OpenpaiConfig ` +- `AmlConfig`_ +- `HybridConfig`_ + +For `Kubeflow <../TrainingService/KubeflowMode.rst>`_, `FrameworkController <../TrainingService/FrameworkControllerMode.rst>`_, and `AdaptDL <../TrainingService/AdaptDLMode.rst>`_ training platforms, it is suggested to use `v1 config schema <../Tutorial/ExperimentConfig.rst>`_ for now. + +LocalConfig +----------- + +Detailed usage can be found `here <../TrainingService/LocalMode.rst>`__. + +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - platform + - Constant string ``"local"`` + - + + * - useActiveGpu + - ``Optional[bool]`` + - Specify whether NNI should submit trials to GPUs occupied by other tasks. + Must be set when `trialGpuNumber`_ greater than zero. 
+ Following processes can make GPU "active": + + - non-NNI CUDA programs + - graphical desktop + - trials submitted by other NNI instances, if you have more than one NNI experiments running at same time + - other users' CUDA programs, if you are using a shared server + + If you are using a graphical OS like Windows 10 or Ubuntu desktop, set this field to ``True``, otherwise, the GUI will prevent NNI from launching any trial. + When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously. + From 343741f1bf3c0effd63e34c1def7eb2d3aace377 Mon Sep 17 00:00:00 2001 From: J-shang Date: Tue, 31 Aug 2021 18:03:27 +0800 Subject: [PATCH 2/4] update doc --- .../reference/experiment_config_copy.rst | 395 +++++++++++++++++- 1 file changed, 388 insertions(+), 7 deletions(-) diff --git a/docs/en_US/reference/experiment_config_copy.rst b/docs/en_US/reference/experiment_config_copy.rst index 12019ee265..33d32c0995 100644 --- a/docs/en_US/reference/experiment_config_copy.rst +++ b/docs/en_US/reference/experiment_config_copy.rst @@ -121,14 +121,14 @@ ExperimentConfig - ``Optional[str]`` - Path_ to the JSON file containing the search space. Search space format is determined by tuner. The common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__. - Mutually exclusive to `searchSpace`_. + Mutually exclusive to ``searchSpace``. * - searchSpace - ``Optional[JSON]`` - Search space object. The format is determined by tuner. Common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__. Note that ``None`` means "no such field" so empty search space should be written as ``{}``. - Mutually exclusive to `searchSpaceFile`_. + Mutually exclusive to ``searchSpaceFile``. * - trialCommand - ``str`` @@ -187,7 +187,7 @@ ExperimentConfig - ``bool`` - Enable `annotation <../Tutorial/AnnotationSpec.rst>`__. default: ``False``. 
- When using annotation, `searchSpace`_ and `searchSpaceFile`_ should not be specified manually. + When using annotation, ``searchSpace`` and ``searchSpaceFile`` should not be specified manually. * - debug - ``bool`` @@ -199,7 +199,7 @@ ExperimentConfig - ``Optional[str]`` - Set log level of the whole system. values: ``"trace"``, ``"debug"``, ``"info"``, ``"warning"``, ``"error"``, ``"fatal"`` - Defaults to "info" or "debug", depending on `debug`_ option. When debug mode is enabled, Loglevel is set to "debug", otherwise, Loglevel is set to "info". + Defaults to "info" or "debug", depending on ``debug`` option. When debug mode is enabled, Loglevel is set to "debug", otherwise, Loglevel is set to "info". Most modules of NNI will be affected by this value, including NNI manager, tuner, training service, etc. The exception is trial, whose logging level is directly managed by trial code. For Python modules, "trace" acts as logging level 0 and "fatal" acts as ``logging.CRITICAL``. @@ -286,8 +286,9 @@ One of the following: - `LocalConfig`_ - `RemoteConfig`_ -- :ref:`OpenpaiConfig ` +- `OpenpaiConfig`_ - `AmlConfig`_ +- `DlcConfig`_ - `HybridConfig`_ For `Kubeflow <../TrainingService/KubeflowMode.rst>`_, `FrameworkController <../TrainingService/FrameworkControllerMode.rst>`_, and `AdaptDL <../TrainingService/AdaptDLMode.rst>`_ training platforms, it is suggested to use `v1 config schema <../Tutorial/ExperimentConfig.rst>`_ for now. @@ -306,13 +307,13 @@ Detailed usage can be found `here <../TrainingService/LocalMode.rst>`__. - Description * - platform - - Constant string ``"local"`` + - ``"local"`` - * - useActiveGpu - ``Optional[bool]`` - Specify whether NNI should submit trials to GPUs occupied by other tasks. - Must be set when `trialGpuNumber`_ greater than zero. + Must be set when ``trialGpuNumber`` greater than zero. 
Following processes can make GPU "active": - non-NNI CUDA programs @@ -323,3 +324,383 @@ Detailed usage can be found `here <../TrainingService/LocalMode.rst>`__. If you are using a graphical OS like Windows 10 or Ubuntu desktop, set this field to ``True``, otherwise, the GUI will prevent NNI from launching any trial. When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously. + * - maxTrialNumberPerGpu + - ``int`` + - Specify how many trials can share one GPU. + default: ``1`` + + * - gpuIndices + - ``Optional[list[int] | str | int]`` + - Limit the GPUs visible to trial processes. + If ``trialGpuNumber`` is less than the length of this value, only a subset will be visible to each trial. + This will be used as the ``CUDA_VISIBLE_DEVICES`` environment variable. + +RemoteConfig +------------ + +Detailed usage can be found `here <../TrainingService/RemoteMachineMode.rst>`__. + +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - platform + - ``"remote"`` + - + + * - machineList + - ``List[RemoteMachineConfig]`` + - List of training machines. + + * - reuseMode + - ``bool`` + - Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__. + +RemoteMachineConfig +""""""""""""""""""" + +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - host + - ``str`` + - IP or hostname (domain name) of the machine. + + * - port + - ``int`` + - SSH service port. + default: ``22`` + + * - user + - ``str`` + - Login user name. + + * - password + - ``Optional[str]`` + - Login password. If not specified, ``sshKeyFile`` will be used instead. + + * - sshKeyFile + - ``Optional[str]`` + - `Path`_ to the SSH identity file. + Only used when ``password`` is not specified. + + * - sshPassphrase + - ``Optional[str]`` + - Passphrase of SSH identity file. 
+ + * - useActiveGpu + - ``bool`` + - Specify whether NNI should submit trials to GPUs occupied by other tasks. + default: ``False`` + Must be set when ``trialGpuNumber`` is greater than zero. + The following processes can make a GPU "active": + + - non-NNI CUDA programs + - graphical desktop + - trials submitted by other NNI instances, if you have more than one NNI experiment running at the same time + - other users' CUDA programs, if you are using a shared server + + If your remote machine runs a graphical OS like Ubuntu desktop, set this field to ``True``; otherwise, the GUI will prevent NNI from launching any trial. + When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously. + + * - maxTrialNumberPerGpu + - ``int`` + - Specify how many trials can share one GPU. + default: ``1`` + + * - gpuIndices + - ``Optional[list[int] | str | int]`` + - Limit the GPUs visible to trial processes. + If ``trialGpuNumber`` is less than the length of this value, only a subset will be visible to each trial. + This will be used as the ``CUDA_VISIBLE_DEVICES`` environment variable. + + * - pythonPath + - ``Optional[str]`` + - Specify a Python environment. + This path will be inserted at the front of PATH. Here are some examples: + + - (linux) pythonPath: ``/opt/python3.7/bin`` + - (windows) pythonPath: ``C:/Python37`` + + If you are using Anaconda, the setup differs slightly. On Windows, you also have to add ``../Scripts`` and ``../Library/bin``, separated by ``;``. Examples are as below: + + - (linux anaconda) pythonPath: ``/home/yourname/anaconda3/envs/myenv/bin/`` + - (windows anaconda) pythonPath: ``C:/Users/yourname/.conda/envs/myenv;C:/Users/yourname/.conda/envs/myenv/Scripts;C:/Users/yourname/.conda/envs/myenv/Library/bin`` + + This is useful if preparation steps vary across machines. + +OpenpaiConfig +------------- + +Detailed usage can be found `here <../TrainingService/PaiMode.rst>`__. + +.. 
list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - platform + - ``"openpai"`` + - + + * - host + - ``str`` + - Hostname of OpenPAI service. + This may include ``https://`` or ``http://`` prefix. + HTTPS will be used by default. + + * - username + - ``str`` + - OpenPAI user name. + + * - token + - ``str`` + - OpenPAI user token. + This can be found in your OpenPAI user settings page. + + * - trialCpuNumber + - ``int`` + - Number of CPUs allocated to each trial in the OpenPAI container. + + * - trialMemorySize + - ``str`` + - Amount of memory allocated to each trial in the OpenPAI container. + format: ``number + tb|gb|mb|kb``. + examples: ``"8gb"``, ``"8192mb"``. + + * - storageConfigName + - ``str`` + - Specify the storage name used in OpenPAI. + + * - dockerImage + - ``str`` + - Name and tag of the Docker image used to run the trials. + default: ``"msranni/nni:latest"``. + + * - localStorageMountPoint + - ``str`` + - :ref:`Mount point ` of storage service (typically NFS) on the local machine. + + * - containerStorageMountPoint + - ``str`` + - Mount point of storage service (typically NFS) in the Docker container. + This must be an absolute path. + + * - reuseMode + - ``bool`` + - Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__. + default: ``False``. + + * - openpaiConfig + - ``Optional[JSON]`` + - Embedded OpenPAI config file. + + * - openpaiConfigFile + - ``Optional[str]`` + - `Path`_ to OpenPAI config file. + An example can be found `here `__. + +AmlConfig +--------- + +Detailed usage can be found `here <../TrainingService/AMLMode.rst>`__. + +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - platform + - ``"aml"`` + - + + * - dockerImage + - ``str`` + - Name and tag of the Docker image used to run the trials. + default: ``"msranni/nni:latest"`` + + * - subscriptionId + - ``str`` + - Azure subscription ID. 
+ + * - resourceGroup + - ``str`` + - Azure resource group name. + + * - workspaceName + - ``str`` + - Azure workspace name. + + * - computeTarget + - ``str`` + - AML compute cluster name. + +DlcConfig +--------- + +Detailed usage can be found `here <../TrainingService/DlcMode.rst>`__. + +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - platform + - ``"dlc"`` + - + + * - type + - ``str`` + - Job spec type. + default: ``"worker"``. + + * - image + - ``str`` + - Name and tag of the Docker image used to run the trials. + + * - jobType + - ``str`` + - PAI-DLC training job type, ``"TFJob"`` or ``"PyTorchJob"``. + + * - podCount + - ``str`` + - Number of pods used to run a single training job. + + * - ecsSpec + - ``str`` + - Training server config spec string. + + * - region + - ``str`` + - The region where the PAI-DLC public cluster is located. + + * - nasDataSourceId + - ``str`` + - The NAS data source ID configured on the PAI-DLC side. + + * - accessKeyId + - ``str`` + - The accessKeyId of your cloud account. + + * - accessKeySecret + - ``str`` + - The accessKeySecret of your cloud account. + + * - localStorageMountPoint + - ``str`` + - The mount point of the NAS on the PAI-DSW server. default: ``/home/admin/workspace/``. + + * - containerStorageMountPoint + - ``str`` + - The mount point of the NAS on the PAI-DLC side. default: ``/root/data/``. + +HybridConfig +------------ + +Currently only `LocalConfig`_, `RemoteConfig`_, `OpenpaiConfig`_ and `AmlConfig`_ are supported. Detailed usage can be found `here <../TrainingService/HybridMode.rst>`__. + +SharedStorageConfig +^^^^^^^^^^^^^^^^^^^ + +Detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__. + +nfsConfig +--------- + +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - storageType + - ``"NFS"`` + - + + * - localMountPoint + - ``str`` + - The path where the storage has been or will be mounted on the local machine. 
+ If the path does not exist, it will be created automatically. An absolute path is recommended, e.g. ``/tmp/nni-shared-storage``. + + * - remoteMountPoint + - ``str`` + - The path where the storage will be mounted on the remote machine. + If the path does not exist, it will be created automatically. A relative path is recommended, e.g. ``./nni-shared-storage``. + + * - localMounted + - ``str`` + - Specify who mounts the shared storage on the local machine and whether it is already mounted. + values: ``"usermount"``, ``"nnimount"``, ``"nomount"`` + ``usermount`` means the user has already mounted this storage on ``localMountPoint``. ``nnimount`` means NNI will try to mount this storage on ``localMountPoint``. ``nomount`` means the storage will not be mounted on the local machine; partial storage support is planned for the future. + + * - nfsServer + - ``str`` + - NFS server host. + + * - exportedDirectory + - ``str`` + - Exported directory of NFS server, detailed `here `_. + +azureBlobConfig +--------------- + +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - storageType + - ``"AzureBlob"`` + - + + * - localMountPoint + - ``str`` + - The path where the storage has been or will be mounted on the local machine. + If the path does not exist, it will be created automatically. An absolute path is recommended, e.g. ``/tmp/nni-shared-storage``. + + * - remoteMountPoint + - ``str`` + - The path where the storage will be mounted on the remote machine. + If the path does not exist, it will be created automatically. A relative path is recommended, e.g. ``./nni-shared-storage``. + Note that the directory must be empty when using AzureBlob. + + * - localMounted + - ``str`` + - Specify who mounts the shared storage on the local machine and whether it is already mounted. + values: ``"usermount"``, ``"nnimount"``, ``"nomount"``. + ``usermount`` means the user has already mounted this storage on ``localMountPoint``. ``nnimount`` means NNI will try to mount this storage on ``localMountPoint``. 
``nomount`` means storage will not mount in the local machine, will support partial storages in the future. + + * - storageAccountName + - ``str`` + - Azure storage account name. + + * - storageAccountKey + - ``Optional[str]`` + - Azure storage account key. + + * - containerName + - ``str`` + - AzureBlob container name. From 971fd0dc48c9950bfe7661d71742ee7f96fa674e Mon Sep 17 00:00:00 2001 From: J-shang Date: Tue, 31 Aug 2021 18:04:40 +0800 Subject: [PATCH 3/4] replace origin to copy --- docs/en_US/reference.rst | 1 - docs/en_US/reference/experiment_config.rst | 1307 ++++++----------- .../reference/experiment_config_copy.rst | 706 --------- 3 files changed, 486 insertions(+), 1528 deletions(-) delete mode 100644 docs/en_US/reference/experiment_config_copy.rst diff --git a/docs/en_US/reference.rst b/docs/en_US/reference.rst index c9ab728970..70d410ed2c 100644 --- a/docs/en_US/reference.rst +++ b/docs/en_US/reference.rst @@ -6,7 +6,6 @@ References nnictl Commands Experiment Configuration - Experiment Configuration (test) Experiment Configuration (legacy) Search Space NNI Annotation diff --git a/docs/en_US/reference/experiment_config.rst b/docs/en_US/reference/experiment_config.rst index 472d5c4793..33d32c0995 100644 --- a/docs/en_US/reference/experiment_config.rst +++ b/docs/en_US/reference/experiment_config.rst @@ -105,252 +105,139 @@ Reference ExperimentConfig ^^^^^^^^^^^^^^^^ -experimentName --------------- - -Mnemonic name of the experiment, which will be shown in WebUI and nnictl. - -type: ``Optional[str]`` - - -searchSpaceFile ---------------- - -Path_ to the JSON file containing the search space. - -type: ``Optional[str]`` - -Search space format is determined by tuner. The common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__. - -Mutually exclusive to `searchSpace`_. - - -searchSpace ------------ - -Search space object. - -type: ``Optional[JSON]`` - -The format is determined by tuner. 
Common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__. - -Note that ``None`` means "no such field" so empty search space should be written as ``{}``. - -Mutually exclusive to `searchSpaceFile`_. - - -trialCommand ------------- - -Command to launch trial. - -type: ``str`` - -The command will be executed in bash on Linux and macOS, and in PowerShell on Windows. - -Note that using ``python3`` on Linux and macOS, and using ``python`` on Windows. - - -trialCodeDirectory ------------------- - -`Path`_ to the directory containing trial source files. - -type: ``str`` - -default: ``"."`` - -All files in this directory will be sent to the training machine, unless in the ``.nniignore`` file. -(See :ref:`nniignore ` for details.) - - -trialConcurrency ----------------- - -Specify how many trials should be run concurrently. - -type: ``int`` - -The real concurrency also depends on hardware resources and may be less than this value. - - -trialGpuNumber --------------- - -Number of GPUs used by each trial. - -type: ``Optional[int]`` - -This field might have slightly different meanings for various training services, -especially when set to ``0`` or ``None``. -See `training service's document <../training_services.rst>`__ for details. - -In local mode, setting the field to ``0`` will prevent trials from accessing GPU (by empty ``CUDA_VISIBLE_DEVICES``). -And when set to ``None``, trials will be created and scheduled as if they did not use GPU, -but they can still use all GPU resources if they want. - - -maxExperimentDuration ---------------------- - -Limit the duration of this experiment if specified. - -type: ``Optional[str]`` - -format: ``number + s|m|h|d`` - -examples: ``"10m"``, ``"0.5h"`` - -When time runs out, the experiment will stop creating trials but continue to serve WebUI. - - -maxTrialNumber --------------- - -Limit the number of trials to create if specified. 
- -type: ``Optional[int]`` - -When the budget runs out, the experiment will stop creating trials but continue to serve WebUI. - - -maxTrialDuration ---------------------- - -Limit the duration of trial job if specified. - -type: ``Optional[str]`` - -format: ``number + s|m|h|d`` - -examples: ``"10m"``, ``"0.5h"`` - -When time runs out, the current trial job will stop. - - -nniManagerIp ------------- - -IP of the current machine, used by training machines to access NNI manager. Not used in local mode. - -type: ``Optional[str]`` - -If not specified, IPv4 address of ``eth0`` will be used. - -Except for the local mode, it is highly recommended to set this field manually. - - -useAnnotation -------------- - -Enable `annotation <../Tutorial/AnnotationSpec.rst>`__. - -type: ``bool`` - -default: ``False`` - -When using annotation, `searchSpace`_ and `searchSpaceFile`_ should not be specified manually. - - -debug ------ - -Enable debug mode. - -type: ``bool`` - -default: ``False`` - -When enabled, logging will be more verbose and some internal validation will be loosened. - - -logLevel --------- - -Set log level of the whole system. - -type: ``Optional[str]`` - -values: ``"trace"``, ``"debug"``, ``"info"``, ``"warning"``, ``"error"``, ``"fatal"`` - -Defaults to "info" or "debug", depending on `debug`_ option. When debug mode is enabled, Loglevel is set to "debug", otherwise, Loglevel is set to "info". - -Most modules of NNI will be affected by this value, including NNI manager, tuner, training service, etc. - -The exception is trial, whose logging level is directly managed by trial code. - -For Python modules, "trace" acts as logging level 0 and "fatal" acts as ``logging.CRITICAL``. - - -experimentWorkingDirectory --------------------------- - -Specify the :ref:`directory ` to place log, checkpoint, metadata, and other run-time stuff. - -type: ``Optional[str]`` - -By default uses ``~/nni-experiments``. 
- -NNI will create a subdirectory named by experiment ID, so it is safe to use the same directory for multiple experiments. - - -tunerGpuIndices ---------------- - -Limit the GPUs visible to tuner, assessor, and advisor. - -type: ``Optional[list[int] | str | int]`` - -This will be the ``CUDA_VISIBLE_DEVICES`` environment variable of tuner process. - -Because tuner, assessor, and advisor run in the same process, this option will affect them all. - - -tuner ------ - -Specify the tuner. - -type: Optional `AlgorithmConfig`_ - -The built-in tuners can be found `here <../builtin_tuner.rst>`__ and you can follow `this tutorial <../Tuner/CustomizeTuner.rst>`__ to customize a new tuner. - - -assessor --------- - -Specify the assessor. - -type: Optional `AlgorithmConfig`_ - -The built-in assessors can be found `here <../builtin_assessor.rst>`__ and you can follow `this tutorial <../Assessor/CustomizeAssessor.rst>`__ to customize a new assessor. - - -advisor -------- - -Specify the advisor. - -type: Optional `AlgorithmConfig`_ - -NNI provides two built-in advisors: `BOHB <../Tuner/BohbAdvisor.rst>`__ and `Hyperband <../Tuner/HyperbandAdvisor.rst>`__, and you can follow `this tutorial <../Tuner/CustomizeAdvisor.rst>`__ to customize a new advisor. - - -trainingService ---------------- - -Specify the `training service <../TrainingService/Overview.rst>`__. - -type: `TrainingServiceConfig`_ - - -sharedStorage -------------- - -Configure the shared storage, detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__. - -type: Optional `SharedStorageConfig`_ - +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - experimentName + - ``Optional[str]`` + - Mnemonic name of the experiment, which will be shown in WebUI and nnictl. + + * - searchSpaceFile + - ``Optional[str]`` + - Path_ to the JSON file containing the search space. + Search space format is determined by tuner. 
The common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__. + Mutually exclusive to ``searchSpace``. + + * - searchSpace + - ``Optional[JSON]`` + - Search space object. + The format is determined by tuner. Common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__. + Note that ``None`` means "no such field" so empty search space should be written as ``{}``. + Mutually exclusive to ``searchSpaceFile``. + + * - trialCommand + - ``str`` + - Command to launch trial. + The command will be executed in bash on Linux and macOS, and in PowerShell on Windows. + Note that using ``python3`` on Linux and macOS, and using ``python`` on Windows. + + * - trialCodeDirectory + - ``str`` + - `Path`_ to the directory containing trial source files. + default: ``"."``. + All files in this directory will be sent to the training machine, unless in the ``.nniignore`` file. + (See :ref:`nniignore ` for details.) + + * - trialConcurrency + - ``int`` + - Specify how many trials should be run concurrently. + The real concurrency also depends on hardware resources and may be less than this value. + + * - trialGpuNumber + - ``Optional[int]`` + - This field might have slightly different meanings for various training services, + especially when set to ``0`` or ``None``. + See `training service's document <../training_services.rst>`__ for details. + + In local mode, setting the field to ``0`` will prevent trials from accessing GPU (by empty ``CUDA_VISIBLE_DEVICES``). + And when set to ``None``, trials will be created and scheduled as if they did not use GPU, + but they can still use all GPU resources if they want. + + * - maxExperimentDuration + - ``Optional[str]`` + - Limit the duration of this experiment if specified. + format: ``number + s|m|h|d`` + examples: ``"10m"``, ``"0.5h"`` + When time runs out, the experiment will stop creating trials but continue to serve WebUI. 
+ + * - maxTrialNumber + - ``Optional[int]`` + - Limit the number of trials to create if specified. + When the budget runs out, the experiment will stop creating trials but continue to serve WebUI. + + * - maxTrialDuration + - ``Optional[str]`` + - Limit the duration of trial job if specified. + format: ``number + s|m|h|d`` + examples: ``"10m"``, ``"0.5h"`` + When time runs out, the current trial job will stop. + + * - nniManagerIp + - ``Optional[str]`` + - IP of the current machine, used by training machines to access NNI manager. Not used in local mode. + If not specified, IPv4 address of ``eth0`` will be used. + Except for the local mode, it is highly recommended to set this field manually. + + * - useAnnotation + - ``bool`` + - Enable `annotation <../Tutorial/AnnotationSpec.rst>`__. + default: ``False``. + When using annotation, ``searchSpace`` and ``searchSpaceFile`` should not be specified manually. + + * - debug + - ``bool`` + - Enable debug mode. + default: ``False`` + When enabled, logging will be more verbose and some internal validation will be loosened. + + * - logLevel + - ``Optional[str]`` + - Set log level of the whole system. + values: ``"trace"``, ``"debug"``, ``"info"``, ``"warning"``, ``"error"``, ``"fatal"`` + Defaults to "info" or "debug", depending on ``debug`` option. When debug mode is enabled, Loglevel is set to "debug", otherwise, Loglevel is set to "info". + Most modules of NNI will be affected by this value, including NNI manager, tuner, training service, etc. + The exception is trial, whose logging level is directly managed by trial code. + For Python modules, "trace" acts as logging level 0 and "fatal" acts as ``logging.CRITICAL``. + + * - experimentWorkingDirectory + - ``Optional[str]`` + - Specify the :ref:`directory ` to place log, checkpoint, metadata, and other run-time stuff. + By default uses ``~/nni-experiments``. 
+ NNI will create a subdirectory named by experiment ID, so it is safe to use the same directory for multiple experiments. + + * - tunerGpuIndices + - ``Optional[list[int] | str | int]`` + - Limit the GPUs visible to tuner, assessor, and advisor. + This will be the ``CUDA_VISIBLE_DEVICES`` environment variable of tuner process. + Because tuner, assessor, and advisor run in the same process, this option will affect them all. + + * - tuner + - ``Optional[AlgorithmConfig]`` + - Specify the tuner. + The built-in tuners can be found `here <../builtin_tuner.rst>`__ and you can follow `this tutorial <../Tuner/CustomizeTuner.rst>`__ to customize a new tuner. + + * - assessor + - ``Optional[AlgorithmConfig]`` + - Specify the assessor. + The built-in assessors can be found `here <../builtin_assessor.rst>`__ and you can follow `this tutorial <../Assessor/CustomizeAssessor.rst>`__ to customize a new assessor. + + * - advisor + - ``Optional[AlgorithmConfig]`` + - Specify the advisor. + NNI provides two built-in advisors: `BOHB <../Tuner/BohbAdvisor.rst>`__ and `Hyperband <../Tuner/HyperbandAdvisor.rst>`__, and you can follow `this tutorial <../Tuner/CustomizeAdvisor.rst>`__ to customize a new advisor. + + * - trainingService + - ``TrainingServiceConfig`` + - Specify the `training service <../TrainingService/Overview.rst>`__. + + * - sharedStorage + - ``Optional[SharedStorageConfig]`` + - Configure the shared storage, detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__. AlgorithmConfig ^^^^^^^^^^^^^^^ @@ -363,42 +250,34 @@ For customized algorithms, there are two ways to describe them: 2. Specify code directory and class name directly. - -name ----- - -Name of the built-in or registered algorithm. - -type: ``str`` for the built-in and registered algorithm, ``None`` for other customized algorithms. - - -className ---------- - -Qualified class name of not registered customized algorithm. 
- -type: ``None`` for the built-in and registered algorithm, ``str`` for other customized algorithms. - -example: ``"my_tuner.MyTuner"`` - - -codeDirectory -------------- - -`Path`_ to the directory containing the customized algorithm class. - -type: ``None`` for the built-in and registered algorithm, ``str`` for other customized algorithms. - - -classArgs ---------- - -Keyword arguments passed to algorithm class' constructor. - -type: ``Optional[dict[str, Any]]`` - -See algorithm's document for supported value. - +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - name + - ``Optional[str]`` + - Name of the built-in or registered algorithm. + ``str`` for the built-in and registered algorithm, ``None`` for other customized algorithms. + + * - className + - ``Optional[str]`` + - Qualified class name of not registered customized algorithm. + ``None`` for the built-in and registered algorithm, ``str`` for other customized algorithms. + example: ``"my_tuner.MyTuner"`` + + * - codeDirectory + - ``Optional[str]`` + - `Path`_ to the directory containing the customized algorithm class. + ``None`` for the built-in and registered algorithm, ``str`` for other customized algorithms. + + * - classArgs + - ``Optional[dict[str, Any]]`` + - Keyword arguments passed to algorithm class' constructor. + See algorithm's document for supported value. TrainingServiceConfig ^^^^^^^^^^^^^^^^^^^^^ @@ -407,635 +286,421 @@ One of the following: - `LocalConfig`_ - `RemoteConfig`_ -- :ref:`OpenpaiConfig ` +- `OpenpaiConfig`_ - `AmlConfig`_ - `DlcConfig`_ - `HybridConfig`_ For `Kubeflow <../TrainingService/KubeflowMode.rst>`_, `FrameworkController <../TrainingService/FrameworkControllerMode.rst>`_, and `AdaptDL <../TrainingService/AdaptDLMode.rst>`_ training platforms, it is suggested to use `v1 config schema <../Tutorial/ExperimentConfig.rst>`_ for now. 
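
As a rough sketch of how the ``platform`` field selects one of the configs listed above, the fragment below combines two of them through hybrid mode, which takes a list of training service configs. The host, user, and key path are placeholders reused from the Remote Mode example, not values NNI requires.

.. code-block:: yaml

   trainingService:
     # Each list item is one TrainingServiceConfig, chosen by its platform.
     - platform: local
       useActiveGpu: false
     - platform: remote
       machineList:
         - host: my.domain.com        # placeholder host
           user: bob                  # placeholder user
           sshKeyFile: ~/.ssh/id_rsa  # used because no password is given

With a single training service, the same fields sit directly under ``trainingService`` instead of in a list.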
- LocalConfig ----------- Detailed usage can be found `here <../TrainingService/LocalMode.rst>`__. -platform -"""""""" - -Constant string ``"local"``. - - -useActiveGpu -"""""""""""" - -Specify whether NNI should submit trials to GPUs occupied by other tasks. - -type: ``Optional[bool]`` - -Must be set when `trialGpuNumber`_ greater than zero. - -Following processes can make GPU "active": - - - non-NNI CUDA programs - - graphical desktop - - trials submitted by other NNI instances, if you have more than one NNI experiments running at same time - - other users' CUDA programs, if you are using a shared server - -If you are using a graphical OS like Windows 10 or Ubuntu desktop, set this field to ``True``, otherwise, the GUI will prevent NNI from launching any trial. - -When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously. - - -maxTrialNumberPerGpu -"""""""""""""""""""" - -Specify how many trials can share one GPU. - -type: ``int`` - -default: ``1`` - - -gpuIndices -"""""""""" - -Limit the GPUs visible to trial processes. - -type: ``Optional[list[int] | str | int]`` - -If `trialGpuNumber`_ is less than the length of this value, only a subset will be visible to each trial. - -This will be used as ``CUDA_VISIBLE_DEVICES`` environment variable. - +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - platform + - ``"local"`` + - + + * - useActiveGpu + - ``Optional[bool]`` + - Specify whether NNI should submit trials to GPUs occupied by other tasks. + Must be set when ``trialGpuNumber`` greater than zero. 
+ Following processes can make GPU "active": + + - non-NNI CUDA programs + - graphical desktop + - trials submitted by other NNI instances, if you have more than one NNI experiments running at same time + - other users' CUDA programs, if you are using a shared server + + If you are using a graphical OS like Windows 10 or Ubuntu desktop, set this field to ``True``, otherwise, the GUI will prevent NNI from launching any trial. + When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously. + + * - maxTrialNumberPerGpu + - ``int`` + - Specify how many trials can share one GPU. + default: ``1`` + + * - gpuIndices + - ``Optional[list[int] | str | int]`` + - Limit the GPUs visible to trial processes. + If ``trialGpuNumber`` is less than the length of this value, only a subset will be visible to each trial. + This will be used as ``CUDA_VISIBLE_DEVICES`` environment variable. RemoteConfig ------------ Detailed usage can be found `here <../TrainingService/RemoteMachineMode.rst>`__. -platform -"""""""" - -Constant string ``"remote"``. +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + * - Field Name + - Type + - Description -machineList -""""""""""" + * - platform + - ``"remote"`` + - -List of training machines. - -type: list of `RemoteMachineConfig`_ - - -reuseMode -""""""""" - -Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__. - -type: ``bool`` + * - machineList + - ``List[RemoteMachineConfig]`` + - List of training machines. + * - reuseMode + - ``bool`` + - Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__. RemoteMachineConfig """"""""""""""""""" -host -**** - -IP or hostname (domain name) of the machine. - -type: ``str`` - - -port -**** - -SSH service port. - -type: ``int`` - -default: ``22`` - - -user -**** - -Login user name. - -type: ``str`` - - -password -******** - -Login password. 
- -type: ``Optional[str]`` - -If not specified, `sshKeyFile`_ will be used instead. - - -sshKeyFile -********** - -`Path`_ to sshKeyFile (identity file). - -type: ``Optional[str]`` - -Only used when `password`_ is not specified. - - -sshPassphrase -************* - -Passphrase of SSH identity file. - -type: ``Optional[str]`` - - -useActiveGpu -************ - -Specify whether NNI should submit trials to GPUs occupied by other tasks. - -type: ``bool`` - -default: ``False`` - -Must be set when `trialGpuNumber`_ greater than zero. - -Following processes can make GPU "active": - - - non-NNI CUDA programs - - graphical desktop - - trials submitted by other NNI instances, if you have more than one NNI experiments running at same time - - other users' CUDA programs, if you are using a shared server +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - host + - ``str`` + - IP or hostname (domain name) of the machine. + + * - port + - ``int`` + - SSH service port. + default: ``22`` + + * - user + - ``str`` + - Login user name. + + * - password + - ``Optional[str]`` + - If not specified, ``sshKeyFile`` will be used instead. + + * - sshKeyFile + - ``Optional[str]`` + - `Path`_ to ``sshKeyFile`` (identity file). + Only used when `password`_ is not specified. + + * - sshPassphrase + - ``Optional[str]`` + - Passphrase of SSH identity file. + + * - useActiveGpu + - ``bool`` + - Specify whether NNI should submit trials to GPUs occupied by other tasks. + default: ``False`` + Must be set when ``trialGpuNumber`` greater than zero. 
+ Following processes can make GPU "active": + + - non-NNI CUDA programs + - graphical desktop + - trials submitted by other NNI instances, if you have more than one NNI experiments running at same time + - other users' CUDA programs, if you are using a shared server -If your remote machine is a graphical OS like Ubuntu desktop, set this field to ``True``, otherwise, the GUI will prevent NNI from launching any trial. - -When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously. - - -maxTrialNumberPerGpu -******************** - -Specify how many trials can share one GPU. - -type: ``int`` - -default: ``1`` - - -gpuIndices -********** - -Limit the GPUs visible to trial processes. - -type: ``Optional[list[int] | str | int]`` - -If `trialGpuNumber`_ is less than the length of this value, only a subset will be visible to each trial. + If your remote machine is a graphical OS like Ubuntu desktop, set this field to ``True``, otherwise, the GUI will prevent NNI from launching any trial. + When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously. -This will be used as ``CUDA_VISIBLE_DEVICES`` environment variable. + * - maxTrialNumberPerGpu + - ``int`` + - Specify how many trials can share one GPU. + default: ``1`` + * - gpuIndices + - ``Optional[list[int] | str | int]`` + - Limit the GPUs visible to trial processes. + If ``trialGpuNumber`` is less than the length of this value, only a subset will be visible to each trial. + This will be used as ``CUDA_VISIBLE_DEVICES`` environment variable. -pythonPath -********** + * - pythonPath + - ``Optional[str]`` + - Specify a Python environment. + This path will be inserted at the front of PATH. Here are some examples: -Specify a Python environment. 
+ - (linux) pythonPath: ``/opt/python3.7/bin`` + - (windows) pythonPath: ``C:/Python37`` -type: ``Optional[str]`` + If you are working on Anaconda, there is some difference. On Windows, you also have to add ``../script`` and ``../Library/bin`` separated by ``;``. Examples are as below: -This path will be inserted at the front of PATH. Here are some examples: + - (linux anaconda) pythonPath: ``/home/yourname/anaconda3/envs/myenv/bin/`` + - (windows anaconda) pythonPath: ``C:/Users/yourname/.conda/envs/myenv;C:/Users/yourname/.conda/envs/myenv/Scripts;C:/Users/yourname/.conda/envs/myenv/Library/bin`` - - (linux) pythonPath: ``/opt/python3.7/bin`` - - (windows) pythonPath: ``C:/Python37`` - -If you are working on Anaconda, there is some difference. On Windows, you also have to add ``../script`` and ``../Library/bin`` separated by ``;``. Examples are as below: - - - (linux anaconda) pythonPath: ``/home/yourname/anaconda3/envs/myenv/bin/`` - - (windows anaconda) pythonPath: ``C:/Users/yourname/.conda/envs/myenv;C:/Users/yourname/.conda/envs/myenv/Scripts;C:/Users/yourname/.conda/envs/myenv/Library/bin`` - -This is useful if preparing steps vary for different machines. - -.. _openpai-class: + This is useful if preparing steps vary for different machines. OpenpaiConfig ------------- Detailed usage can be found `here <../TrainingService/PaiMode.rst>`__. -platform -"""""""" - -Constant string ``"openpai"``. - - -host -"""" - -Hostname of OpenPAI service. - -type: ``str`` - -This may include ``https://`` or ``http://`` prefix. - -HTTPS will be used by default. - - -username -"""""""" - -OpenPAI user name. - -type: ``str`` - - -token -""""" - -OpenPAI user token. - -type: ``str`` - -This can be found in your OpenPAI user settings page. - - -trialCpuNumber -"""""""""""""" - -Specify the CPU number of each trial to be used in OpenPAI container. - -type: ``int`` - - -trialMemorySize -""""""""""""""" - -Specify the memory size of each trial to be used in OpenPAI container. 
- -type: ``str`` - -format: ``number + tb|gb|mb|kb`` - -examples: ``"8gb"``, ``"8192mb"`` - - -storageConfigName -""""""""""""""""" - -Specify the storage name used in OpenPAI. - -type: ``str`` - - -dockerImage -""""""""""" - -Name and tag of docker image to run the trials. - -type: ``str`` - -default: ``"msranni/nni:latest"`` - - -localStorageMountPoint -"""""""""""""""""""""" - -:ref:`Mount point ` of storage service (typically NFS) on the local machine. - -type: ``str`` - - -containerStorageMountPoint -"""""""""""""""""""""""""" - -Mount point of storage service (typically NFS) in docker container. - -type: ``str`` - -This must be an absolute path. - - -reuseMode -""""""""" - -Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__. - -type: ``bool`` - -default: ``False`` - - -openpaiConfig -""""""""""""" - -Embedded OpenPAI config file. - -type: ``Optional[JSON]`` - - -openpaiConfigFile -""""""""""""""""" - -`Path`_ to OpenPAI config file. - -type: ``Optional[str]`` - -An example can be found `here `__. - +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - platform + - ``"openpai"`` + - + + * - host + - ``str`` + - Hostname of OpenPAI service. + This may include ``https://`` or ``http://`` prefix. + HTTPS will be used by default. + + * - username + - ``str`` + - OpenPAI user name. + + * - token + - ``str`` + - OpenPAI user token. + This can be found in your OpenPAI user settings page. + + * - trialCpuNumber + - ``int`` + - Specify the CPU number of each trial to be used in OpenPAI container. + + * - trialMemorySize + - ``str`` + - Specify the memory size of each trial to be used in OpenPAI container. + format: ``number + tb|gb|mb|kb``. + examples: ``"8gb"``, ``"8192mb"``. + + * - storageConfigName + - ``str`` + - Specify the storage name used in OpenPAI. + + * - dockerImage + - ``str`` + - Name and tag of docker image to run the trials. + default: ``"msranni/nni:latest"``. 
+ + * - localStorageMountPoint + - ``str`` + - :ref:`Mount point ` of storage service (typically NFS) on the local machine. + + * - containerStorageMountPoint + - ``str`` + - Mount point of storage service (typically NFS) in docker container. + This must be an absolute path. + + * - reuseMode + - ``bool`` + - Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__. + default: ``False``. + + * - openpaiConfig + - ``Optional[JSON]`` + - Embedded OpenPAI config file. + + * - openpaiConfigFile + - ``Optional[str]`` + - `Path`_ to OpenPAI config file. + An example can be found `here `__. AmlConfig --------- Detailed usage can be found `here <../TrainingService/AMLMode.rst>`__. +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 -platform -"""""""" - -Constant string ``"aml"``. - - -dockerImage -""""""""""" - -Name and tag of docker image to run the trials. - -type: ``str`` - -default: ``"msranni/nni:latest"`` - - -subscriptionId -"""""""""""""" - -Azure subscription ID. - -type: ``str`` - + * - Field Name + - Type + - Description -resourceGroup -""""""""""""" + * - platform + - ``"aml"`` + - + + * - dockerImage + - ``str`` + - Name and tag of docker image to run the trials. + default: ``"msranni/nni:latest"`` -Azure resource group name. + * - subscriptionId + - ``str`` + - Azure subscription ID. -type: ``str`` + * - resourceGroup + - ``str`` + - Azure resource group name. + * - workspaceName + - ``str`` + - Azure workspace name. -workspaceName -""""""""""""" - -Azure workspace name. - -type: ``str`` - - -computeTarget -""""""""""""" - -AML compute cluster name. - -type: ``str`` - + * - computeTarget + - ``str`` + - AML compute cluster name. DlcConfig --------- Detailed usage can be found `here <../TrainingService/DlcMode.rst>`__. +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 -platform -"""""""" - -Constant string ``"dlc"``. - - -type -"""" - -Job spec type. 
- -type: ``str`` - -default: ``"worker"`` - - -image -""""" - -Name and tag of docker image to run the trials. - -type: ``str`` - - -jobType -""""""" - -PAI-DLC training job type, ``"TFJob"`` or ``"PyTorchJob"``. - -type: ``str`` - - -podCount -"""""""" - -Pod count to run a single training job. - -type: ``str`` + * - Field Name + - Type + - Description + * - platform + - ``"dlc"`` + - + + * - type + - ``str`` + - Job spec type. + default: ``"worker"``. -ecsSpec -""""""" + * - image + - ``str`` + - Name and tag of docker image to run the trials. -Training server config spec string. + * - jobType + - ``str`` + - PAI-DLC training job type, ``"TFJob"`` or ``"PyTorchJob"``. -type: ``str`` + * - podCount + - ``str`` + - Pod count to run a single training job. + * - ecsSpec + - ``str`` + - Training server config spec string. -region -"""""" + * - region + - ``str`` + - The region where PAI-DLC public-cluster locates. -The region where PAI-DLC public-cluster locates. + * - nasDataSourceId + - ``str`` + - The NAS datasource id configurated in PAI-DLC side. -type: ``str`` + * - accessKeyId + - ``str`` + - The accessKeyId of your cloud account. + * - accessKeySecret + - ``str`` + - The accessKeySecret of your cloud account. -nasDataSourceId -""""""""""""""" - -The NAS datasource id configurated in PAI-DLC side. - -type: ``str`` - - - -accessKeyId -""""""""""" - -The accessKeyId of your cloud account. - -type: ``str`` - - - -accessKeySecret -""""""""""""""" - -The accessKeySecret of your cloud account. - -type: ``str`` - - - -localStorageMountPoint -"""""""""""""""""""""" - -The mount point of the NAS on PAI-DSW server, default is /home/admin/workspace/. - -type: ``str`` - - -containerStorageMountPoint -"""""""""""""""""""""""""" - -The mount point of the NAS on PAI-DLC side, default is /root/data/. - -type: ``str`` + * - localStorageMountPoint + - ``str`` + - The mount point of the NAS on PAI-DSW server, default is /home/admin/workspace/. 
+ * - containerStorageMountPoint + - ``str`` + - The mount point of the NAS on PAI-DLC side, default is /root/data/. HybridConfig ------------ -Currently only support `LocalConfig`_, `RemoteConfig`_, :ref:`OpenpaiConfig ` and `AmlConfig`_ . Detailed usage can be found `here <../TrainingService/HybridMode.rst>`__. - -type: list of `TrainingServiceConfig`_ - +Currently, only `LocalConfig`_, `RemoteConfig`_, `OpenpaiConfig`_, and `AmlConfig`_ are supported. Detailed usage can be found `here <../TrainingService/HybridMode.rst>`__. SharedStorageConfig ^^^^^^^^^^^^^^^^^^^ Detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__. - nfsConfig --------- -storageType -""""""""""" +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 -Constant string ``"NFS"``. + * - Field Name + - Type + - Description + * - storageType + - ``"NFS"`` + - -localMountPoint -""""""""""""""" + * - localMountPoint + - ``str`` + - The path where the storage has been or will be mounted on the local machine. + If the path does not exist, it will be created automatically. An absolute path is recommended, e.g. ``/tmp/nni-shared-storage``. -The path that the storage has been or will be mounted in the local machine. + * - remoteMountPoint + - ``str`` + - The path where the storage will be mounted on the remote machine. + If the path does not exist, it will be created automatically. A relative path is recommended, e.g. ``./nni-shared-storage``. -type: ``str`` + * - localMounted + - ``str`` + - Specify who mounts the shared storage on the local machine, and whether it is mounted at all. + values: ``"usermount"``, ``"nnimount"``, ``"nomount"`` + ``usermount`` means the user has already mounted this storage on localMountPoint. ``nnimount`` means NNI will try to mount this storage on localMountPoint. ``nomount`` means the storage will not be mounted on the local machine; support for partial storages is planned for the future. -If the path does not exist, it will be created automatically. Recommended to use an absolute path, i.e.
``/tmp/nni-shared-storage``. - - -remoteMountPoint -"""""""""""""""" - -The path that the storage will be mounted in the remote machine. - -type: ``str`` - -If the path does not exist, it will be created automatically. Recommended to use a relative path. i.e. ``./nni-shared-storage``. - - -localMounted -"""""""""""" - -Specify the object and status to mount the shared storage. - -type: ``str`` - -values: ``"usermount"``, ``"nnimount"``, ``"nomount"`` - -``usermount`` means the user has already mounted this storage on localMountPoint. ``nnimount`` means NNI will try to mount this storage on localMountPoint. ``nomount`` means storage will not mount in the local machine, will support partial storages in the future. - - -nfsServer -""""""""" - -NFS server host. - -type: ``str`` - - -exportedDirectory -""""""""""""""""" - -Exported directory of NFS server, detailed `here `_. - -type: ``str`` + * - nfsServer + - ``str`` + - NFS server host. + * - exportedDirectory + - ``str`` + - Exported directory of NFS server, detailed `here `_. azureBlobConfig --------------- -storageType -""""""""""" - -Constant string ``"AzureBlob"``. - - -localMountPoint -""""""""""""""" - -The path that the storage has been or will be mounted in the local machine. - -type: ``str`` - -If the path does not exist, it will be created automatically. Recommended to use an absolute path, i.e. ``/tmp/nni-shared-storage``. - - -remoteMountPoint -"""""""""""""""" - -The path that the storage will be mounted in the remote machine. - -type: ``str`` - -If the path does not exist, it will be created automatically. Recommended to use a relative path. i.e. ``./nni-shared-storage``. - -Note that the directory must be empty when using AzureBlob. - - -localMounted -"""""""""""" - -Specify the object and status to mount the shared storage. - -type: ``str`` - -values: ``"usermount"``, ``"nnimount"``, ``"nomount"`` - -``usermount`` means the user has already mounted this storage on localMountPoint. 
``nnimount`` means NNI will try to mount this storage on localMountPoint. ``nomount`` means storage will not mount in the local machine, will support partial storages in the future. - - -storageAccountName -"""""""""""""""""" - -Azure storage account name. - -type: ``str`` - - -storageAccountKey -""""""""""""""""" - -Azure storage account key. - -type: ``Optional[str]`` - - -containerName -""""""""""""" - -AzureBlob container name. - -type: ``str`` +.. list-table:: + :widths: 10 10 80 + :header-rows: 1 + + * - Field Name + - Type + - Description + + * - storageType + - ``"AzureBlob"`` + - + + * - localMountPoint + - ``str`` + - The path that the storage has been or will be mounted in the local machine. + If the path does not exist, it will be created automatically. Recommended to use an absolute path, i.e. ``/tmp/nni-shared-storage``. + + * - remoteMountPoint + - ``str`` + - The path that the storage will be mounted in the remote machine. + If the path does not exist, it will be created automatically. Recommended to use a relative path. i.e. ``./nni-shared-storage``. + Note that the directory must be empty when using AzureBlob. + + * - localMounted + - ``str`` + - Specify the object and status to mount the shared storage. + values: ``"usermount"``, ``"nnimount"``, ``"nomount"``. + ``usermount`` means the user has already mounted this storage on localMountPoint. ``nnimount`` means NNI will try to mount this storage on localMountPoint. ``nomount`` means storage will not mount in the local machine, will support partial storages in the future. + + * - storageAccountName + - ``str`` + - Azure storage account name. + + * - storageAccountKey + - ``Optional[str]`` + - Azure storage account key. + + * - containerName + - ``str`` + - AzureBlob container name. 
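
The two shared storage tables above can be read as the following config fragments. This is only a sketch built from the field names listed in those tables: the server address, exported directory, account name, key, and container name are made-up placeholders, and only one ``sharedStorage`` block would appear in a real config file.

.. code-block:: yaml

   # NFS variant (nfsConfig)
   sharedStorage:
     storageType: NFS
     localMountPoint: /tmp/nni-shared-storage   # absolute path recommended
     remoteMountPoint: ./nni-shared-storage     # relative path recommended
     localMounted: nnimount                     # let NNI mount it locally
     nfsServer: 10.0.0.1                        # placeholder NFS host
     exportedDirectory: /exported/directory     # placeholder export path

.. code-block:: yaml

   # Azure Blob variant (azureBlobConfig)
   sharedStorage:
     storageType: AzureBlob
     localMountPoint: /tmp/nni-shared-storage
     remoteMountPoint: ./nni-shared-storage     # must be an empty directory
     localMounted: usermount                    # already mounted by the user
     storageAccountName: myaccount              # placeholder
     storageAccountKey: xxxxx                   # placeholder, optional field
     containerName: mycontainer                 # placeholder

The ``storageType`` value decides which set of fields applies, in the same way that ``platform`` does for training services.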
diff --git a/docs/en_US/reference/experiment_config_copy.rst b/docs/en_US/reference/experiment_config_copy.rst deleted file mode 100644 index 33d32c0995..0000000000 --- a/docs/en_US/reference/experiment_config_copy.rst +++ /dev/null @@ -1,706 +0,0 @@ -=========================== -Experiment Config Reference -=========================== - -A config file is needed when creating an experiment. This document describes the rules to write a config file and provides some examples. - -.. Note:: - - 1. This document lists field names with ``camelCase``. If users use these fields in the pythonic way with NNI Python APIs (e.g., ``nni.experiment``), the field names should be converted to ``snake_case``. - - 2. In this document, the type of fields are formatted as `Python type hint `_. Therefore JSON objects are called `dict` and arrays are called `list`. - - .. _path: - - 3. Some fields take a path to a file or directory. Unless otherwise noted, both absolute path and relative path are supported, and ``~`` will be expanded to the home directory. - - - When written in the YAML file, relative paths are relative to the directory containing that file. - - When assigned in Python code, relative paths are relative to the current working directory. - - All relative paths are converted to absolute when loading YAML file into Python class, and when saving Python class to YAML file. - - 4. Setting a field to ``None`` or ``null`` is equivalent to not setting the field. - -.. contents:: Contents - :local: - :depth: 3 - - -Examples -======== - -Local Mode -^^^^^^^^^^ - -.. code-block:: yaml - - experimentName: MNIST - searchSpaceFile: search_space.json - trialCommand: python mnist.py - trialCodeDirectory: . - trialGpuNumber: 1 - trialConcurrency: 2 - maxExperimentDuration: 24h - maxTrialNumber: 100 - tuner: - name: TPE - classArgs: - optimize_mode: maximize - trainingService: - platform: local - useActiveGpu: True - -Local Mode (Inline Search Space) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -.. 
code-block:: yaml - - searchSpace: - batch_size: - _type: choice - _value: [16, 32, 64] - learning_rate: - _type: loguniform - _value: [0.0001, 0.1] - trialCommand: python mnist.py - trialGpuNumber: 1 - trialConcurrency: 2 - tuner: - name: TPE - classArgs: - optimize_mode: maximize - trainingService: - platform: local - useActiveGpu: True - -Remote Mode -^^^^^^^^^^^ - -.. code-block:: yaml - - experimentName: MNIST - searchSpaceFile: search_space.json - trialCommand: python mnist.py - trialCodeDirectory: . - trialGpuNumber: 1 - trialConcurrency: 2 - maxExperimentDuration: 24h - maxTrialNumber: 100 - tuner: - name: TPE - classArgs: - optimize_mode: maximize - trainingService: - platform: remote - machineList: - - host: 11.22.33.44 - user: alice - password: xxxxx - - host: my.domain.com - user: bob - sshKeyFile: ~/.ssh/id_rsa - -Reference -========= - -ExperimentConfig -^^^^^^^^^^^^^^^^ - -.. list-table:: - :widths: 10 10 80 - :header-rows: 1 - - * - Field Name - - Type - - Description - - * - experimentName - - ``Optional[str]`` - - Mnemonic name of the experiment, which will be shown in WebUI and nnictl. - - * - searchSpaceFile - - ``Optional[str]`` - - Path_ to the JSON file containing the search space. - Search space format is determined by tuner. The common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__. - Mutually exclusive to ``searchSpace``. - - * - searchSpace - - ``Optional[JSON]`` - - Search space object. - The format is determined by tuner. Common format for built-in tuners is documented `here <../Tutorial/SearchSpaceSpec.rst>`__. - Note that ``None`` means "no such field" so empty search space should be written as ``{}``. - Mutually exclusive to ``searchSpaceFile``. - - * - trialCommand - - ``str`` - - Command to launch trial. - The command will be executed in bash on Linux and macOS, and in PowerShell on Windows. - Note that using ``python3`` on Linux and macOS, and using ``python`` on Windows. 
- - * - trialCodeDirectory - - ``str`` - - `Path`_ to the directory containing trial source files. - default: ``"."``. - All files in this directory will be sent to the training machine, unless in the ``.nniignore`` file. - (See :ref:`nniignore ` for details.) - - * - trialConcurrency - - ``int`` - - Specify how many trials should be run concurrently. - The real concurrency also depends on hardware resources and may be less than this value. - - * - trialGpuNumber - - ``Optional[int]`` - - This field might have slightly different meanings for various training services, - especially when set to ``0`` or ``None``. - See `training service's document <../training_services.rst>`__ for details. - - In local mode, setting the field to ``0`` will prevent trials from accessing GPU (by empty ``CUDA_VISIBLE_DEVICES``). - And when set to ``None``, trials will be created and scheduled as if they did not use GPU, - but they can still use all GPU resources if they want. - - * - maxExperimentDuration - - ``Optional[str]`` - - Limit the duration of this experiment if specified. - format: ``number + s|m|h|d`` - examples: ``"10m"``, ``"0.5h"`` - When time runs out, the experiment will stop creating trials but continue to serve WebUI. - - * - maxTrialNumber - - ``Optional[int]`` - - Limit the number of trials to create if specified. - When the budget runs out, the experiment will stop creating trials but continue to serve WebUI. - - * - maxTrialDuration - - ``Optional[str]`` - - Limit the duration of trial job if specified. - format: ``number + s|m|h|d`` - examples: ``"10m"``, ``"0.5h"`` - When time runs out, the current trial job will stop. - - * - nniManagerIp - - ``Optional[str]`` - - IP of the current machine, used by training machines to access NNI manager. Not used in local mode. - If not specified, IPv4 address of ``eth0`` will be used. - Except for the local mode, it is highly recommended to set this field manually. 
- - * - useAnnotation - - ``bool`` - - Enable `annotation <../Tutorial/AnnotationSpec.rst>`__. - default: ``False``. - When using annotation, ``searchSpace`` and ``searchSpaceFile`` should not be specified manually. - - * - debug - - ``bool`` - - Enable debug mode. - default: ``False`` - When enabled, logging will be more verbose and some internal validation will be loosened. - - * - logLevel - - ``Optional[str]`` - - Set log level of the whole system. - values: ``"trace"``, ``"debug"``, ``"info"``, ``"warning"``, ``"error"``, ``"fatal"`` - Defaults to "info" or "debug", depending on ``debug`` option. When debug mode is enabled, Loglevel is set to "debug", otherwise, Loglevel is set to "info". - Most modules of NNI will be affected by this value, including NNI manager, tuner, training service, etc. - The exception is trial, whose logging level is directly managed by trial code. - For Python modules, "trace" acts as logging level 0 and "fatal" acts as ``logging.CRITICAL``. - - * - experimentWorkingDirectory - - ``Optional[str]`` - - Specify the :ref:`directory ` to place log, checkpoint, metadata, and other run-time stuff. - By default uses ``~/nni-experiments``. - NNI will create a subdirectory named by experiment ID, so it is safe to use the same directory for multiple experiments. - - * - tunerGpuIndices - - ``Optional[list[int] | str | int]`` - - Limit the GPUs visible to tuner, assessor, and advisor. - This will be the ``CUDA_VISIBLE_DEVICES`` environment variable of tuner process. - Because tuner, assessor, and advisor run in the same process, this option will affect them all. - - * - tuner - - ``Optional[AlgorithmConfig]`` - - Specify the tuner. - The built-in tuners can be found `here <../builtin_tuner.rst>`__ and you can follow `this tutorial <../Tuner/CustomizeTuner.rst>`__ to customize a new tuner. - - * - assessor - - ``Optional[AlgorithmConfig]`` - - Specify the assessor. 
- The built-in assessors can be found `here <../builtin_assessor.rst>`__ and you can follow `this tutorial <../Assessor/CustomizeAssessor.rst>`__ to customize a new assessor. - - * - advisor - - ``Optional[AlgorithmConfig]`` - - Specify the advisor. - NNI provides two built-in advisors: `BOHB <../Tuner/BohbAdvisor.rst>`__ and `Hyperband <../Tuner/HyperbandAdvisor.rst>`__, and you can follow `this tutorial <../Tuner/CustomizeAdvisor.rst>`__ to customize a new advisor. - - * - trainingService - - ``TrainingServiceConfig`` - - Specify the `training service <../TrainingService/Overview.rst>`__. - - * - sharedStorage - - ``Optional[SharedStorageConfig]`` - - Configure the shared storage, detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__. - -AlgorithmConfig -^^^^^^^^^^^^^^^ - -``AlgorithmConfig`` describes a tuner / assessor / advisor algorithm. - -For customized algorithms, there are two ways to describe them: - - 1. `Register the algorithm <../Tutorial/InstallCustomizedAlgos.rst>`__ to use it like built-in. (preferred) - - 2. Specify code directory and class name directly. - -.. list-table:: - :widths: 10 10 80 - :header-rows: 1 - - * - Field Name - - Type - - Description - - * - name - - ``Optional[str]`` - - Name of the built-in or registered algorithm. - ``str`` for the built-in and registered algorithm, ``None`` for other customized algorithms. - - * - className - - ``Optional[str]`` - - Qualified class name of not registered customized algorithm. - ``None`` for the built-in and registered algorithm, ``str`` for other customized algorithms. - example: ``"my_tuner.MyTuner"`` - - * - codeDirectory - - ``Optional[str]`` - - `Path`_ to the directory containing the customized algorithm class. - ``None`` for the built-in and registered algorithm, ``str`` for other customized algorithms. - - * - classArgs - - ``Optional[dict[str, Any]]`` - - Keyword arguments passed to algorithm class' constructor. - See algorithm's document for supported value. 
- -TrainingServiceConfig -^^^^^^^^^^^^^^^^^^^^^ - -One of the following: - -- `LocalConfig`_ -- `RemoteConfig`_ -- `OpenpaiConfig`_ -- `AmlConfig`_ -- `DlcConfig`_ -- `HybridConfig`_ - -For `Kubeflow <../TrainingService/KubeflowMode.rst>`_, `FrameworkController <../TrainingService/FrameworkControllerMode.rst>`_, and `AdaptDL <../TrainingService/AdaptDLMode.rst>`_ training platforms, it is suggested to use `v1 config schema <../Tutorial/ExperimentConfig.rst>`_ for now. - -LocalConfig ------------ - -Detailed usage can be found `here <../TrainingService/LocalMode.rst>`__. - -.. list-table:: - :widths: 10 10 80 - :header-rows: 1 - - * - Field Name - - Type - - Description - - * - platform - - ``"local"`` - - - - * - useActiveGpu - - ``Optional[bool]`` - - Specify whether NNI should submit trials to GPUs occupied by other tasks. - Must be set when ``trialGpuNumber`` greater than zero. - Following processes can make GPU "active": - - - non-NNI CUDA programs - - graphical desktop - - trials submitted by other NNI instances, if you have more than one NNI experiments running at same time - - other users' CUDA programs, if you are using a shared server - - If you are using a graphical OS like Windows 10 or Ubuntu desktop, set this field to ``True``, otherwise, the GUI will prevent NNI from launching any trial. - When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously. - - * - maxTrialNumberPerGpu - - ``int`` - - Specify how many trials can share one GPU. - default: ``1`` - - * - gpuIndices - - ``Optional[list[int] | str | int]`` - - Limit the GPUs visible to trial processes. - If ``trialGpuNumber`` is less than the length of this value, only a subset will be visible to each trial. - This will be used as ``CUDA_VISIBLE_DEVICES`` environment variable. - -RemoteConfig ------------- - -Detailed usage can be found `here <../TrainingService/RemoteMachineMode.rst>`__. - -.. 
list-table:: - :widths: 10 10 80 - :header-rows: 1 - - * - Field Name - - Type - - Description - - * - platform - - ``"remote"`` - - - - * - machineList - - ``List[RemoteMachineConfig]`` - - List of training machines. - - * - reuseMode - - ``bool`` - - Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__. - -RemoteMachineConfig -""""""""""""""""""" - -.. list-table:: - :widths: 10 10 80 - :header-rows: 1 - - * - Field Name - - Type - - Description - - * - host - - ``str`` - - IP or hostname (domain name) of the machine. - - * - port - - ``int`` - - SSH service port. - default: ``22`` - - * - user - - ``str`` - - Login user name. - - * - password - - ``Optional[str]`` - - If not specified, ``sshKeyFile`` will be used instead. - - * - sshKeyFile - - ``Optional[str]`` - - `Path`_ to ``sshKeyFile`` (identity file). - Only used when `password`_ is not specified. - - * - sshPassphrase - - ``Optional[str]`` - - Passphrase of SSH identity file. - - * - useActiveGpu - - ``bool`` - - Specify whether NNI should submit trials to GPUs occupied by other tasks. - default: ``False`` - Must be set when ``trialGpuNumber`` greater than zero. - Following processes can make GPU "active": - - - non-NNI CUDA programs - - graphical desktop - - trials submitted by other NNI instances, if you have more than one NNI experiments running at same time - - other users' CUDA programs, if you are using a shared server - - If your remote machine is a graphical OS like Ubuntu desktop, set this field to ``True``, otherwise, the GUI will prevent NNI from launching any trial. - When you create multiple NNI experiments and ``useActiveGpu`` is set to ``True``, they will submit multiple trials to the same GPU(s) simultaneously. - - * - maxTrialNumberPerGpu - - ``int`` - - Specify how many trials can share one GPU. - default: ``1`` - - * - gpuIndices - - ``Optional[list[int] | str | int]`` - - Limit the GPUs visible to trial processes. 
- If ``trialGpuNumber`` is less than the length of this value, only a subset will be visible to each trial. - This will be used as ``CUDA_VISIBLE_DEVICES`` environment variable. - - * - pythonPath - - ``Optional[str]`` - - Specify a Python environment. - This path will be inserted at the front of PATH. Here are some examples: - - - (linux) pythonPath: ``/opt/python3.7/bin`` - - (windows) pythonPath: ``C:/Python37`` - - If you are working on Anaconda, there is some difference. On Windows, you also have to add ``../script`` and ``../Library/bin`` separated by ``;``. Examples are as below: - - - (linux anaconda) pythonPath: ``/home/yourname/anaconda3/envs/myenv/bin/`` - - (windows anaconda) pythonPath: ``C:/Users/yourname/.conda/envs/myenv;C:/Users/yourname/.conda/envs/myenv/Scripts;C:/Users/yourname/.conda/envs/myenv/Library/bin`` - - This is useful if preparing steps vary for different machines. - -OpenpaiConfig -------------- - -Detailed usage can be found `here <../TrainingService/PaiMode.rst>`__. - -.. list-table:: - :widths: 10 10 80 - :header-rows: 1 - - * - Field Name - - Type - - Description - - * - platform - - ``"openpai"`` - - - - * - host - - ``str`` - - Hostname of OpenPAI service. - This may include ``https://`` or ``http://`` prefix. - HTTPS will be used by default. - - * - username - - ``str`` - - OpenPAI user name. - - * - token - - ``str`` - - OpenPAI user token. - This can be found in your OpenPAI user settings page. - - * - trialCpuNumber - - ``int`` - - Specify the CPU number of each trial to be used in OpenPAI container. - - * - trialMemorySize - - ``str`` - - Specify the memory size of each trial to be used in OpenPAI container. - format: ``number + tb|gb|mb|kb``. - examples: ``"8gb"``, ``"8192mb"``. - - * - storageConfigName - - ``str`` - - Specify the storage name used in OpenPAI. - - * - dockerImage - - ``str`` - - Name and tag of docker image to run the trials. - default: ``"msranni/nni:latest"``. 
- - * - localStorageMountPoint - - ``str`` - - :ref:`Mount point ` of storage service (typically NFS) on the local machine. - - * - containerStorageMountPoint - - ``str`` - - Mount point of storage service (typically NFS) in docker container. - This must be an absolute path. - - * - reuseMode - - ``bool`` - - Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__. - default: ``False``. - - * - openpaiConfig - - ``Optional[JSON]`` - - Embedded OpenPAI config file. - - * - openpaiConfigFile - - ``Optional[str]`` - - `Path`_ to OpenPAI config file. - An example can be found `here `__. - -AmlConfig ---------- - -Detailed usage can be found `here <../TrainingService/AMLMode.rst>`__. - -.. list-table:: - :widths: 10 10 80 - :header-rows: 1 - - * - Field Name - - Type - - Description - - * - platform - - ``"aml"`` - - - - * - dockerImage - - ``str`` - - Name and tag of docker image to run the trials. - default: ``"msranni/nni:latest"`` - - * - subscriptionId - - ``str`` - - Azure subscription ID. - - * - resourceGroup - - ``str`` - - Azure resource group name. - - * - workspaceName - - ``str`` - - Azure workspace name. - - * - computeTarget - - ``str`` - - AML compute cluster name. - -DlcConfig ---------- - -Detailed usage can be found `here <../TrainingService/DlcMode.rst>`__. - -.. list-table:: - :widths: 10 10 80 - :header-rows: 1 - - * - Field Name - - Type - - Description - - * - platform - - ``"dlc"`` - - - - * - type - - ``str`` - - Job spec type. - default: ``"worker"``. - - * - image - - ``str`` - - Name and tag of docker image to run the trials. - - * - jobType - - ``str`` - - PAI-DLC training job type, ``"TFJob"`` or ``"PyTorchJob"``. - - * - podCount - - ``str`` - - Pod count to run a single training job. - - * - ecsSpec - - ``str`` - - Training server config spec string. - - * - region - - ``str`` - - The region where PAI-DLC public-cluster locates. 
- - * - nasDataSourceId - - ``str`` - - The NAS datasource id configurated in PAI-DLC side. - - * - accessKeyId - - ``str`` - - The accessKeyId of your cloud account. - - * - accessKeySecret - - ``str`` - - The accessKeySecret of your cloud account. - - * - localStorageMountPoint - - ``str`` - - The mount point of the NAS on PAI-DSW server, default is /home/admin/workspace/. - - * - containerStorageMountPoint - - ``str`` - - The mount point of the NAS on PAI-DLC side, default is /root/data/. - -HybridConfig ------------- - -Currently only support `LocalConfig`_, `RemoteConfig`_, `OpenpaiConfig`_ and `AmlConfig`_ . Detailed usage can be found `here <../TrainingService/HybridMode.rst>`__. - -SharedStorageConfig -^^^^^^^^^^^^^^^^^^^ - -Detailed usage can be found `here <../Tutorial/HowToUseSharedStorage.rst>`__. - -nfsConfig ---------- - -.. list-table:: - :widths: 10 10 80 - :header-rows: 1 - - * - Field Name - - Type - - Description - - * - storageType - - ``"NFS"`` - - - - * - localMountPoint - - ``str`` - - The path that the storage has been or will be mounted in the local machine. - If the path does not exist, it will be created automatically. Recommended to use an absolute path, i.e. ``/tmp/nni-shared-storage``. - - * - remoteMountPoint - - ``str`` - - The path that the storage will be mounted in the remote machine. - If the path does not exist, it will be created automatically. Recommended to use a relative path. i.e. ``./nni-shared-storage``. - - * - localMounted - - ``str`` - - Specify the object and status to mount the shared storage. - values: ``"usermount"``, ``"nnimount"``, ``"nomount"`` - ``usermount`` means the user has already mounted this storage on localMountPoint. ``nnimount`` means NNI will try to mount this storage on localMountPoint. ``nomount`` means storage will not mount in the local machine, will support partial storages in the future. - - * - nfsServer - - ``str`` - - NFS server host. 
- - * - exportedDirectory - - ``str`` - - Exported directory of NFS server, detailed `here `_. - -azureBlobConfig ---------------- - -.. list-table:: - :widths: 10 10 80 - :header-rows: 1 - - * - Field Name - - Type - - Description - - * - storageType - - ``"AzureBlob"`` - - - - * - localMountPoint - - ``str`` - - The path that the storage has been or will be mounted in the local machine. - If the path does not exist, it will be created automatically. Recommended to use an absolute path, i.e. ``/tmp/nni-shared-storage``. - - * - remoteMountPoint - - ``str`` - - The path that the storage will be mounted in the remote machine. - If the path does not exist, it will be created automatically. Recommended to use a relative path. i.e. ``./nni-shared-storage``. - Note that the directory must be empty when using AzureBlob. - - * - localMounted - - ``str`` - - Specify the object and status to mount the shared storage. - values: ``"usermount"``, ``"nnimount"``, ``"nomount"``. - ``usermount`` means the user has already mounted this storage on localMountPoint. ``nnimount`` means NNI will try to mount this storage on localMountPoint. ``nomount`` means storage will not mount in the local machine, will support partial storages in the future. - - * - storageAccountName - - ``str`` - - Azure storage account name. - - * - storageAccountKey - - ``Optional[str]`` - - Azure storage account key. - - * - containerName - - ``str`` - - AzureBlob container name. 
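As an illustration of the ``nfsConfig`` fields listed above, a minimal ``sharedStorage`` section might look like the sketch below. The server host and exported directory are hypothetical placeholders, not values taken from this document; the two mount points simply follow the recommendations given in the table.

.. code-block:: yaml

    sharedStorage:
      storageType: NFS
      localMountPoint: /tmp/nni-shared-storage   # absolute path, as recommended above
      remoteMountPoint: ./nni-shared-storage     # relative path, as recommended above
      localMounted: nnimount                     # NNI will try to mount the storage locally
      nfsServer: nfs.example.com                 # hypothetical host, replace with your own
      exportedDirectory: /exported/directory     # hypothetical path, replace with your own

For AzureBlob, ``storageType`` becomes ``AzureBlob``, and ``nfsServer`` / ``exportedDirectory`` are replaced by ``storageAccountName``, ``storageAccountKey``, and ``containerName``.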
From f5f052976f513456b508fe2fd60da33561dcf228 Mon Sep 17 00:00:00 2001
From: J-shang
Date: Wed, 1 Sep 2021 09:04:37 +0800
Subject: [PATCH 4/4] fix lint

---
 docs/en_US/reference/experiment_config.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/en_US/reference/experiment_config.rst b/docs/en_US/reference/experiment_config.rst
index 33d32c0995..af20dde9e1 100644
--- a/docs/en_US/reference/experiment_config.rst
+++ b/docs/en_US/reference/experiment_config.rst
@@ -391,7 +391,7 @@ RemoteMachineConfig
     * - sshKeyFile
       - ``Optional[str]``
       - `Path`_ to ``sshKeyFile`` (identity file).
-        Only used when `password`_ is not specified.
+        Only used when ``password`` is not specified.
 
     * - sshPassphrase
       - ``Optional[str]``