Job configuration doc
omry committed May 5, 2020
1 parent 762c5ee commit 1e0d044
Showing 4 changed files with 151 additions and 7 deletions.
7 changes: 4 additions & 3 deletions hydra/conf/__init__.py
@@ -61,19 +61,20 @@ class OverridesConf:
 # job runtime information will be populated here
 @dataclass
 class JobConf:
-    # Job name, can be specified by the user (in config or cli) or populated automatically
+    # Job name, populated automatically unless specified by the user (in config or cli)
     name: str = MISSING

+    # Populated automatically by Hydra.
     # Concatenation of job overrides that can be used as a part
     # of the directory name.
-    # This can be configured in hydra.job.config.override_dirname
+    # This can be configured via hydra.job.config.override_dirname
     override_dirname: str = MISSING

     # Job ID in underlying scheduling system
     id: str = MISSING

     # Job number if job is a part of a sweep
-    num: str = MISSING
+    num: int = MISSING

     # The config name used by the job
     config_name: Optional[str] = MISSING
138 changes: 138 additions & 0 deletions website/docs/configure_hydra/job.md
@@ -0,0 +1,138 @@
---
id: job
title: Job configuration
---

The job configuration resides in `hydra.job`.
The structure definition is shown below; you can find the latest definition [in the code](https://github.com/facebookresearch/hydra/blob/master/hydra/conf/__init__.py).
## Definition
```python
# job runtime information will be populated here
@dataclass
class JobConf:
    # Job name, populated automatically unless specified by the user (in config or cli)
    name: str = MISSING

    # Concatenation of job overrides that can be used as a part
    # of the directory name.
    # This can be configured in hydra.job.config.override_dirname
    override_dirname: str = MISSING

    # Job ID in underlying scheduling system
    id: str = MISSING

    # Job number if job is a part of a sweep
    num: int = MISSING

    # The config name used by the job
    config_name: Optional[str] = MISSING

    # Environment variables to set remotely
    env_set: Dict[str, str] = field(default_factory=dict)
    # Environment variables to copy from the launching machine
    env_copy: List[str] = field(default_factory=list)

    # Job config
    @dataclass
    class JobConfig:
        @dataclass
        # configuration for the ${hydra.job.override_dirname} runtime variable
        class OverrideDirname:
            kv_sep: str = "="
            item_sep: str = ","
            exclude_keys: List[str] = field(default_factory=list)

        override_dirname: OverrideDirname = OverrideDirname()

    config: JobConfig = JobConfig()
```

## Documentation
### hydra.job.name
The job name is used in several places in Hydra, such as the log file name (`${hydra.job.name}.log`).
It is set automatically from the Python file name (file: `train.py` -> name: `train`), but you can override
it via the command line or your config file.
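
The mapping from file name to job name and log file name can be illustrated with a minimal sketch. This is not Hydra's implementation; `default_job_name` and `log_file_name` are hypothetical helpers that only mimic the documented behavior.

```python
from pathlib import Path


def default_job_name(script_path: str) -> str:
    """Hypothetical helper: derive a job name from the script's file name."""
    # Path.stem drops the directory and the final extension
    return Path(script_path).stem


def log_file_name(job_name: str) -> str:
    """Hypothetical helper: mirror the `${hydra.job.name}.log` pattern."""
    return f"{job_name}.log"


name = default_job_name("src/train.py")
print(name)                 # -> train
print(log_file_name(name))  # -> train.log
```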

### hydra.job.override_dirname
This field is populated automatically from your command line arguments and is typically used as a part of your
output directory pattern.
For example, the command line arguments:
```bash
$ python foo.py a=10 b=20
```
would result in `hydra.job.override_dirname` getting the value `a=10,b=20`.
When used with the output directory override, it can automatically generate directories that represent the
command line arguments used in your run.
```yaml
hydra:
  run:
    dir: output/${hydra.job.override_dirname}
```
The generation of `override_dirname` can be controlled by `hydra.job.config.override_dirname`.
In particular, the key/value separator (`=`) and the item separator (`,`) can be changed, and specific command line
override keys can be excluded from the generated `override_dirname`.
A random seed is a typical case where excluding a key is useful.

```yaml
hydra:
  run:
    dir: output/${hydra.job.override_dirname}/seed=${seed}
  job:
    config:
      override_dirname:
        exclude_keys:
          - seed
```
With this configuration, running
```bash
$ python foo.py a=10 b=20 seed=999
```

would result in a directory like:
```
output/a=10,b=20/seed=999
```
This allows you to more easily group identical runs with different random seeds together.
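
The effect of `kv_sep`, `item_sep` and `exclude_keys` can be sketched in a few lines of Python. This is an illustration of the documented behavior, not Hydra's actual code: `build_override_dirname` is a hypothetical helper, and keeping the items in command line order is an assumption.

```python
from typing import List, Sequence


def build_override_dirname(
    overrides: Sequence[str],
    kv_sep: str = "=",
    item_sep: str = ",",
    exclude_keys: Sequence[str] = (),
) -> str:
    """Hypothetical helper imitating the documented override_dirname behavior."""
    parts: List[str] = []
    for override in overrides:
        # Split "key=value" on the first "=" only, so values may contain "="
        key, _, value = override.partition("=")
        if key in exclude_keys:
            continue  # excluded keys do not appear in the directory name
        parts.append(f"{key}{kv_sep}{value}")
    # Assumption: items keep their command line order
    return item_sep.join(parts)


print(build_override_dirname(["a=10", "b=20"]))  # -> a=10,b=20
print(build_override_dirname(["a=10", "b=20", "seed=999"], exclude_keys=["seed"]))  # -> a=10,b=20
```

With `seed` excluded, runs that differ only in their seed share the same `override_dirname`, which is what makes the `seed=${seed}` subdirectory grouping work.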

### hydra.job.id
The job ID is populated by the active Hydra launcher. For the basic launcher, the job ID is just a serial job number, but
for other systems this could be the SLURM job ID or the AWS Instance ID.

### hydra.job.num
Serial job number within the current sweep run (0 to n-1).

### hydra.job.config_name
The config name used by the job. This is populated automatically to match the config name in `@hydra.main()`.

### hydra.job.env_set
A `Dict[str, str]` that is used to set the environment variables of the running job.
A common use case is setting environment variables that affect underlying libraries, for example:
```yaml
hydra:
  job:
    env_set:
      OMP_NUM_THREADS: 1
```
This disables multithreading in Intel IPP and MKL.

Another example is to use interpolation to automatically set the rank
for a [Torch Distributed](https://pytorch.org/tutorials/intermediate/dist_tuto.html) run to match the job number
in the sweep.

```yaml
hydra:
  job:
    env_set:
      RANK: ${hydra.job.num}
```
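
Conceptually, `env_set` amounts to writing each entry into the job's environment. The sketch below is only an illustration under that assumption: `apply_env_set` is a hypothetical helper, the job number `3` is made up, and a plain dict stands in for the real process environment.

```python
from typing import Any, Dict, MutableMapping


def apply_env_set(env_set: Dict[str, Any], environ: MutableMapping[str, str]) -> None:
    """Hypothetical helper: write each env_set entry into the given environment."""
    for name, value in env_set.items():
        # Environment variables are strings, so values such as the integer 1
        # or a resolved job number are converted explicitly
        environ[name] = str(value)


# A plain dict stands in for os.environ so the example has no side effects
job_env: Dict[str, str] = {}
apply_env_set({"OMP_NUM_THREADS": 1, "RANK": 3}, job_env)
print(job_env)  # -> {'OMP_NUM_THREADS': '1', 'RANK': '3'}
```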

### hydra.job.env_copy
In some cases you want to copy local environment variables to the environment of the running job automatically.
This is particularly useful for remote runs.
```yaml
hydra:
  job:
    env_copy:
      - AWS_KEY
```
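
The idea behind `env_copy` can be sketched as selecting the listed names from the launching environment. `copy_env` is a hypothetical helper, and silently skipping variables that are not set is an assumption of this sketch, not a statement about how Hydra handles that case.

```python
from typing import Dict, List, Mapping


def copy_env(names: List[str], source: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical helper: pick the listed variables out of the launching environment."""
    # Assumption: variables missing from the source are skipped silently
    return {name: source[name] for name in names if name in source}


# Made-up launching environment; in practice this would be os.environ
launcher_env = {"AWS_KEY": "example-key", "HOME": "/home/user"}
print(copy_env(["AWS_KEY", "NOT_SET"], launcher_env))  # -> {'AWS_KEY': 'example-key'}
```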
12 changes: 8 additions & 4 deletions website/docs/configure_hydra/workdir.md
@@ -62,9 +62,13 @@ Outputs can also be configured through the CLI, like any other configuration.

 >python train.py model.nb_layers=3 hydra.run.dir=3_layers

-This feature can become really powerful to write multiruns without boilerplate using substitution.
-
-> python train.py --multirun model.nb_layers=1,2,3,5 hydra.sweep.dir=multiruns/layers_effect hydra.sweep.subdir=\${model.nb_layers}
+This feature can become really powerful to write multiruns without boilerplate using interpolation.

+```
+python train.py --multirun \
+model.nb_layers=1,2,3,5 \
+hydra.sweep.dir=multiruns/layers_effect \
+hydra.sweep.subdir=\${model.nb_layers}
+```
-With bash, be careful to escape the $ symbol. Otherwise, bash will try to resolve the substitution, instead of passing it to Hydra.
+With bash, be careful to escape the $ symbol. Otherwise, bash will try to resolve the interpolation, instead of passing it to Hydra.
1 change: 1 addition & 0 deletions website/sidebars.js
@@ -45,6 +45,7 @@ module.exports = {

     'Configuring Hydra': [
         'configure_hydra/intro',
+        'configure_hydra/job',
         'configure_hydra/logging',
         'configure_hydra/workdir',
         'configure_hydra/app_help',