Hydra help and documentation update for file-based execution configuration (#31)
Dmitryv-2024 authored Nov 5, 2024
1 parent 5765e10 commit c7b5c4e
Showing 6 changed files with 159 additions and 56 deletions.
139 changes: 86 additions & 53 deletions README.md
@@ -49,69 +49,102 @@ autointent data.train_path=default-multiclass \
seed=42
```

All options (by group):
All options as yaml (default values shown):
```yaml
data:
  # Path to a json file with training data. Set to "default" to use banking77 data stored within the
  # autointent package.
  train_path: ???

  # Path to a json file with test records. Skip this option if you want to use a random subset of the
  # training sample as test data.
  test_path: null

  # Set to true if your data is multiclass but you want to train the multilabel classifier.
  force_multilabel: false

task:
  # Path to a yaml configuration file that defines the optimization search space.
  # Omit this to use the default configuration.
  search_space_path: null

logs:
  # Name of the run prepended to optimization assets dirname (generated randomly if omitted)
  run_name: "awful_hippo_10-30-2024_19-42-12"

  # Location where optimization logs are saved, as `<logs_dir>/<run_name>_<cur_datetime>/logs.json`.
  # Omit to use current working directory. <-- on Windows it is not correct
  dirpath: "/home/user/AutoIntent/awful_hippo_10-30-2024_19-42-12"

  dump_dir: "/home/user/AutoIntent/runs/awful_hippo_10-30-2024_19-42-12/modules_dumps"

vector_index:
  # Location where to save faiss database file. Omit to use your system's default cache directory.
  db_dir: null

  # Specify device in torch notation
  device: cpu

augmentation:
  # Number of shots per intent to sample from regular expressions. This option extends sample utterances
  # within multiclass intent records.
  regex_sampling: 0

  # Config string like "[20, 40, 20, 10]" means 20 one-label examples, 40 two-label examples, 20 three-label examples,
  # 10 four-label examples. This option extends multilabel utterance records.
  multilabel_generation_config: null

embedder:
  # Batch size for embedding computation
  batch_size: 1
  # Sentence length limit for embedding computation
  max_length: null

# Affects the randomness
seed: 0

# String from {DEBUG,INFO,WARNING,ERROR,CRITICAL}. Omit to use ERROR by default.
hydra.job_logging.root.level: "ERROR"
```
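The `multilabel_generation_config` string described above can be read as "position i holds the number of examples with i + 1 labels". A minimal sketch of that reading follows; the function name `parse_generation_config` is hypothetical, and the parsing code AutoIntent actually uses is not shown in this diff.

```python
import json


def parse_generation_config(config: str) -> dict[int, int]:
    """Map number-of-labels -> number of examples to generate (illustrative helper)."""
    counts = json.loads(config)
    is_valid = isinstance(counts, list) and all(type(c) is int and c >= 0 for c in counts)
    if not is_valid:
        raise ValueError(f"expected a list of non-negative integers, got {config!r}")
    # Position i (0-based) holds the count for examples carrying i + 1 labels.
    return dict(enumerate(counts, start=1))


plan = parse_generation_config("[20, 40, 20, 10]")
for n_labels, count in plan.items():
    print(f"{count} examples with {n_labels} label(s)")
```

Under this reading, a config such as `"[0, 4000, 1000]"` requests no one-label examples, 4000 two-label examples and 1000 three-label examples.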
seed Affects the randomness
== task ==
search_space_path Path to a yaml configuration file that defines the
optimization search space. Omit this to use the
default configuration.
== data ==
train_path Path to a json file with training data. Set to
"default" to use banking77 data stored within the
autointent package.
test_path Path to a json file with test records. Skip this
option if you want to use a random subset of the
training sample as test data.
force_multilabel Set to true if your data is multiclass but you want to
train the multilabel classifier.
== logs ==
dirpath Location where to save optimization logs that will be
saved as `<logs_dir>/<run_name>_<cur_datetime>/logs.json`.
Omit to use current working directory.
run_name Name of the run prepended to optimization assets dirname
log_level String from {DEBUG,INFO,WARNING,ERROR,CRITICAL}.
Omit to use ERROR by default.
== vector_index ==
db_dir Location where to save faiss database file. Omit to
use your system's default cache directory.
device Specify device in torch notation
### How to set configuration options
* Option 1 - on the command line as key=value. Example:
```bash
autointent embedder.batch_size=32
```

== augmentation ==
* Option 2 - in a yaml configuration file.
Create a yaml file with the following structure in a separate directory, **my_config.yaml**:
```yaml
defaults:
  - optimization_config
  - _self_
  - override hydra/job_logging: custom

# Put the configuration options you want to override here. The full structure is presented above.
# This example sets the same option as the command-line variant above.
embedder:
  batch_size: 32
```
Run AutoIntent:
```bash
autointent --config-path=/path/to/config/directory --config-name=my_config
```

regex_sampling Number of shots per intent to sample from regular
expressions. This option extends sample utterances
within multiclass intent records.
Important:
* specify the full path in the config-path option.
* do not use tabs in the yaml file.
* preferably give the file a name other than
optimization_config.yaml, to avoid warnings from hydra

seed Affects the data partitioning
You can combine Option 1 and Option 2. Options given on the command line take the highest priority.
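The precedence order (defaults, then the yaml file, then command-line `key=value` pairs) can be illustrated with a small standard-library sketch. This is not how Hydra implements it; `deep_merge` and `parse_override` are hypothetical helpers, and the value parsing here is a simplification of Hydra's override grammar.

```python
import json
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `base` with `update` layered on top, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def parse_override(item: str) -> dict[str, Any]:
    """Turn "a.b=1" into {"a": {"b": 1}}; values that are not valid JSON stay strings."""
    dotted_key, sep, raw = item.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {item!r}")
    try:
        value: Any = json.loads(raw)
    except ValueError:
        value = raw
    for name in reversed(dotted_key.split(".")):
        value = {name: value}
    return value


defaults = {"embedder": {"batch_size": 1, "max_length": None}, "seed": 0}
from_file = {"embedder": {"batch_size": 32}}      # contents of my_config.yaml
command_line = ["embedder.batch_size=64", "seed=42"]

config = deep_merge(defaults, from_file)          # the file overrides the defaults
for item in command_line:                         # the command line overrides both
    config = deep_merge(config, parse_override(item))
print(config)
```

Here `embedder.batch_size` ends up as 64 because the command-line value is applied last, while `max_length` keeps its default since neither layer touches it.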

hydra.job_logging.root.level
String from {DEBUG,INFO,WARNING,ERROR,CRITICAL}.
Omit to use ERROR by default.

multilabel_generation_config
Config string like "[20, 40, 20, 10]" means 20 one-
label examples, 40 two-label examples, 20 three-label
examples, 10 four-label examples. This option extends
multilabel utterance records.
```

The package ships with a default config and data (5-shot banking77 / 20-shot dstc3).

Example input data is in the `data/intent_records` directory.
Examples:
- example input data: [data](./data)
- example configs: [example_configs](./example_configs)

### Inference

30 changes: 27 additions & 3 deletions autointent/configs/optimization_cli.py
@@ -6,7 +6,6 @@
from hydra.core.config_store import ConfigStore
from omegaconf import MISSING

from autointent.custom_types import LogLevel
from autointent.pipeline.optimization.utils import generate_name


@@ -28,7 +27,6 @@ class TaskConfig:
class LoggingConfig:
    run_name: str | None = None
    dirpath: Path | None = None
    level: LogLevel = LogLevel.ERROR
    dump_dir: Path | None = None

    def __post_init__(self) -> None:
@@ -84,7 +82,11 @@ class OptimizationConfig:
    embedder: EmbedderConfig = field(default_factory=EmbedderConfig)

    defaults: list[Any] = field(
        default_factory=lambda: ["_self_", {"override hydra/job_logging": "autointent_standard_job_logger"}]
        default_factory=lambda: [
            "_self_",
            {"override hydra/job_logging": "autointent_standard_job_logger"},
            {"override hydra/help": "autointent_help"},
        ]
    )


@@ -107,7 +109,29 @@ class OptimizationConfig:
    "disable_existing_loggers": "false",
}

help_config = {
    "app_name": "AutoIntent",
    "header": "== ${hydra.help.app_name} ==",
    "footer": """
Powered by Hydra (https://hydra.cc)
Use --hydra-help to view Hydra specific help""",
    "template": """
${hydra.help.header}
This is ${hydra.help.app_name}!
== Config ==
This is the config generated for this run.
You can override everything, for example:
python my_app.py db.user=foo db.pass=bar
-------
$CONFIG
-------
${hydra.help.footer}""",
}


cs = ConfigStore.instance()
cs.store(name="optimization_config", node=OptimizationConfig)
cs.store(name="autointent_standard_job_logger", group="hydra/job_logging", node=logger_config)
cs.store(name="autointent_help", group="hydra/help", node=help_config)
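The `${hydra.help.*}` references inside `help_config` are interpolations that point back into the config tree. The sketch below imitates that lookup with the standard library only, to show how the header expands; `lookup` and `resolve` are hypothetical helpers, and the real resolution is performed by Hydra and OmegaConf, which also fill in `$CONFIG` separately.

```python
import re
from typing import Any

INTERPOLATION = re.compile(r"\$\{([^}]+)\}")


def lookup(root: dict[str, Any], dotted_key: str) -> Any:
    """Follow a dotted path such as "hydra.help.app_name" through nested dicts."""
    node: Any = root
    for name in dotted_key.split("."):
        node = node[name]
    return node


def resolve(text: str, root: dict[str, Any]) -> str:
    """Expand every ${a.b.c} reference in `text`, including nested references."""
    return INTERPOLATION.sub(lambda m: resolve(str(lookup(root, m.group(1))), root), text)


# A trimmed copy of the help node, placed where the store call registers it.
root = {
    "hydra": {
        "help": {
            "app_name": "AutoIntent",
            "header": "== ${hydra.help.app_name} ==",
            "template": "${hydra.help.header}\n-------\n$CONFIG\n-------",
        }
    }
}

print(resolve(root["hydra"]["help"]["template"], root))
```

`$CONFIG` has no braces, so this sketch leaves it untouched; only the `${...}` references are expanded.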
11 changes: 11 additions & 0 deletions example_configs/example_1.yaml
@@ -0,0 +1,11 @@
defaults:
  - optimization_config
  - _self_

data:
  train_path: "default-multilabel"

hydra:
  job_logging:
    root:
      level: "INFO"
15 changes: 15 additions & 0 deletions example_configs/example_2.yaml
@@ -0,0 +1,15 @@
defaults:
  - optimization_config
  - _self_

data:
  train_path: "data/intent_records/ac_robotic_new.json"
  force_multilabel: true

logs:
  dirpath: "experiments/multiclass_as_multilabel/"
  run_name: "robotics_new_testing"

augmentation:
  regex_sampling: 10
  multilabel_generation_config: "[0, 4000, 1000]"
11 changes: 11 additions & 0 deletions example_configs/example_3.yaml
@@ -0,0 +1,11 @@
defaults:
  - optimization_config
  - _self_

data:
  train_path: "data/intent_records/ac_robotic_new.json"
  test_path: "data/intent_records/ac_robotic_val.json"
  force_multilabel: true

augmentation:
  regex_sampling: 20
9 changes: 9 additions & 0 deletions example_configs/example_4.yaml
@@ -0,0 +1,9 @@
defaults:
  - optimization_config
  - _self_

data:
  train_path: "default-multiclass"
  test_path: "data/intent_records/banking77_test.json"

seed: 42
