Hydra help and documentation update for file-based execution configuration #31
Changes from 3 commits
9a9d4f8
02e0bac
119476d
4f1a6cb
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -48,65 +48,93 @@ autointent data.train_path=default-multiclass \ | |
seed=42 | ||
``` | ||
|
||
All options (by group): | |
All options as yaml (default values shown): | |
```yaml | ||
data: | ||
# Path to a json file with training data. Set to "default" to use banking77 data stored within the | ||
# autointent package. | ||
train_path: ??? | ||
|
||
# Path to a json file with test records. Skip this option if you want to use a random subset of the | ||
# training sample as test data. | ||
test_path: null | ||
|
||
# Set to true if your data is multiclass but you want to train the multilabel classifier. | ||
force_multilabel: false | ||
|
||
task: | ||
# Path to a yaml configuration file that defines the optimization search space. | ||
# Omit this to use the default configuration. | ||
search_space_path: null | ||
logs: | ||
# Name of the run prepended to optimization assets dirname (generated randomly if omitted) | ||
run_name: "awful_hippo_10-30-2024_19-42-12" | ||
|
||
# Location where to save optimization logs that will be saved as `<logs_dir>/<run_name>_<cur_datetime>/logs.json`. | ||
# Omit to use current working directory. <-- on Windows it is not correct | ||
dirpath: "/home/user/AutoIntent/awful_hippo_10-30-2024_19-42-12" | ||
|
||
dump_dir: "/home/user/AutoIntent/runs/awful_hippo_10-30-2024_19-42-12/modules_dumps" | ||
|
||
vector_index: | ||
# Location where to save faiss database file. Omit to use your system's default cache directory. | ||
db_dir: null | ||
|
||
# Specify device in torch notation | ||
device: cpu | ||
|
||
augmentation: | ||
# Number of shots per intent to sample from regular expressions. This option extends sample utterance | ||
# within multiclass intent records. | ||
regex_sampling: 0 | ||
|
||
# Config string like "[20, 40, 20, 10]" means 20 one-label examples, 40 two-label examples, 20 three-label examples, | ||
# 10 four-label examples. This option extends multilabel utterance records. | ||
multilabel_generation_config: null | ||
|
||
embedder: | ||
# batch size for embedding computation. | ||
batch_size: 1 | ||
# sentence length limit for embedding computation | ||
max_length: null | ||
|
||
# Affects the randomness | |
seed: 0 | ||
|
||
# String from {DEBUG,INFO,WARNING,ERROR,CRITICAL}. Omit to use ERROR by default. | ||
hydra.job_logging.root.level: "ERROR" | ||
``` | ||
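To make the `multilabel_generation_config` format concrete, here is a minimal Python sketch of how such a string could be read. It assumes the string is a JSON-style list whose i-th entry is the number of examples carrying i labels, as the comment above describes. The helper name `parse_generation_config` is hypothetical and is not taken from the autointent code base.

```python
import json


def parse_generation_config(config: str) -> dict[int, int]:
    """Map "number of labels per example" to "number of examples to generate".

    Hypothetical helper: assumes a JSON-style list of non-negative integers,
    as in the "[20, 40, 20, 10]" example.
    """
    counts = json.loads(config)
    if not isinstance(counts, list) or not all(type(c) is int and c >= 0 for c in counts):
        raise ValueError(f"expected a list of non-negative integers, got {config!r}")
    # Position i (1-based) holds the count of examples that carry i labels.
    return {n_labels: count for n_labels, count in enumerate(counts, start=1)}


print(parse_generation_config("[20, 40, 20, 10]"))
# {1: 20, 2: 40, 3: 20, 4: 10}
```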
seed Affects the randomness | ||
|
||
== task == | ||
|
||
search_space_path Path to a yaml configuration file that defines the | ||
optimization search space. Omit this to use the | ||
default configuration. | ||
|
||
== data == | ||
|
||
train_path Path to a json file with training data. Set to | ||
"default" to use banking77 data stored within the | ||
autointent package. | ||
|
||
test_path Path to a json file with test records. Skip this | ||
option if you want to use a random subset of the | ||
training sample as test data. | ||
|
||
force_multilabel Set to true if your data is multiclass but you want to | ||
train the multilabel classifier. | ||
|
||
== logs == | ||
|
||
dirpath Location where to save optimization logs that will be | ||
saved as `<logs_dir>/<run_name>_<cur_datetime>/logs.json`. | ||
Omit to use current working directory. | ||
|
||
run_name Name of the run prepended to optimization assets dirname | ||
|
||
log_level String from {DEBUG,INFO,WARNING,ERROR,CRITICAL}. | ||
Omit to use ERROR by default. | ||
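The path template `<logs_dir>/<run_name>_<cur_datetime>/logs.json` from the `dirpath` description can be sketched in Python as follows. This is an illustration only: the function name `build_logs_path` is hypothetical, and the timestamp format is inferred from the `10-30-2024_19-42-12` example rather than confirmed against the autointent source.

```python
from datetime import datetime
from pathlib import Path


def build_logs_path(logs_dir: str, run_name: str, now: datetime) -> Path:
    # Timestamp format inferred from the example "10-30-2024_19-42-12" (assumption).
    stamp = now.strftime("%m-%d-%Y_%H-%M-%S")
    return Path(logs_dir) / f"{run_name}_{stamp}" / "logs.json"


path = build_logs_path("experiments", "awful_hippo", datetime(2024, 10, 30, 19, 42, 12))
print(path.as_posix())
# experiments/awful_hippo_10-30-2024_19-42-12/logs.json
```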
|
||
== vector_index == | ||
|
||
db_dir Location where to save faiss database file. Omit to | ||
use your system's default cache directory. | ||
|
||
device Specify device in torch notation | ||
|
||
== augmentation == | ||
|
||
regex_sampling Number of shots per intent to sample from regular | ||
expressions. This option extends sample utterances | ||
within multiclass intent records. | ||
### How to set configuration options | |
* Option 1: on the command line, as key=value. Example: | |
```bash | ||
autointent embedder.batch_size=32 | ||
``` | ||
|
||
seed Affects the data partitioning | ||
* Option 2: in a yaml configuration file. | |
Create a yaml file in a separate folder with the following structure. Preferably give it a name other than | |
optimization_config.yaml to avoid warnings from hydra; this example uses **my_config.yaml**: | |
```yaml | ||
defaults: | ||
- optimization_config | ||
- _self_ | ||
- override hydra/job_logging: custom | ||
|
||
# put the configuration options you want to override here. The full structure is presented above. | ||
# Here is just an example with the same options as for the command line variant above. | ||
embedder: | ||
batch_size: 32 | |
``` | ||
Run AutoIntent: | |
```bash | ||
autointent --config-path=/home/user/config --config-name=my_config | ||
``` | ||
!!IMPORTANT!! | |
* specify the full path in the config-path option. | |
* do not use tabs in the yaml file. | |
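The two notes above (full path for config-path, no tabs in the yaml file) can be checked before launching a run. The following Python sketch is one way to do that. The function `check_config_file` is a hypothetical helper and not part of autointent or hydra. It only inspects the file system and the raw file text.

```python
import tempfile
from pathlib import Path


def check_config_file(config_dir: str, config_name: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    directory = Path(config_dir)
    if not directory.is_absolute():
        problems.append("config-path should be an absolute path")
    config_file = directory / f"{config_name}.yaml"
    if not config_file.is_file():
        problems.append(f"{config_file} does not exist")
    elif "\t" in config_file.read_text(encoding="utf-8"):
        problems.append("yaml file contains tab characters; indent with spaces")
    return problems


# Demonstration: a config file that is indented with a tab is reported.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp).resolve()
    (tmp_dir / "my_config.yaml").write_text("embedder:\n\tbatch_size: 32\n", encoding="utf-8")
    print(check_config_file(str(tmp_dir), "my_config"))
```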
Review comment: surely what is meant here is … Reply: Yes, otherwise hydra starts looking for the config as part of the package (at least on Windows; I have not tried other platforms). |
||
|
||
hydra.job_logging.root.level | ||
String from {DEBUG,INFO,WARNING,ERROR,CRITICAL}. | ||
Omit to use ERROR by default. | ||
You can combine Options 1 and 2. Command-line options take the highest priority. | |
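The precedence described here (package defaults, then the yaml file, then the command line) can be modelled with plain dictionaries. The sketch below is not how hydra implements it. The names `deep_merge` and `parse_overrides` are hypothetical, and the values are taken from the examples in this README.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def parse_overrides(pairs: list[str]) -> dict:
    """Turn ["embedder.batch_size=64"] into {"embedder": {"batch_size": 64}}."""
    result: dict = {}
    for pair in pairs:
        dotted_key, raw_value = pair.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        try:
            node[leaf] = int(raw_value)  # keep integers as integers
        except ValueError:
            node[leaf] = raw_value  # everything else stays a string
    return result


defaults = {"embedder": {"batch_size": 1, "max_length": None}, "seed": 0}
from_file = {"embedder": {"batch_size": 32}}  # what my_config.yaml overrides
from_cli = parse_overrides(["embedder.batch_size=64", "seed=42"])

# Later sources win: defaults < yaml file < command line.
final = deep_merge(deep_merge(defaults, from_file), from_cli)
print(final)
# {'embedder': {'batch_size': 64, 'max_length': None}, 'seed': 42}
```

On the command line, the combined form would presumably be the Option 2 invocation followed by key=value overrides, for example `autointent --config-path=/home/user/config --config-name=my_config embedder.batch_size=64`.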
|
||
multilabel_generation_config | ||
Config string like "[20, 40, 20, 10]" means 20 one- | ||
label examples, 40 two-label examples, 20 three-label | ||
examples, 10 four-label examples. This option extends | ||
multilabel utterance records. | ||
``` | ||
|
||
The package ships with a default config and default data (5-shot banking77 / 20-shot dstc3). | |
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
defaults: | ||
- optimization_config | ||
- _self_ | ||
|
||
data: | ||
train_path: "default-multilabel" | ||
|
||
hydra: | ||
job_logging: | ||
root: | ||
level: "INFO" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
defaults: | ||
- optimization_config | ||
- _self_ | ||
|
||
data: | ||
train_path: "data/intent_records/ac_robotic_new.json" | ||
force_multilabel: true | ||
|
||
logs: | ||
dirpath: "experiments/multiclass_as_multilabel/" | ||
run_name: "robotics_new_testing" | ||
|
||
augmentation: | ||
regex_sampling: 10 | ||
multilabel_generation_config: "[0, 4000, 1000]" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
defaults: | ||
- optimization_config | ||
- _self_ | ||
|
||
data: | ||
train_path: "data/intent_records/ac_robotic_new.json" | ||
test_path: "data/intent_records/ac_robotic_val.json" | ||
force_multilabel: true | ||
|
||
augmentation: | ||
regex_sampling: 20 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
defaults: | ||
- optimization_config | ||
- _self_ | ||
|
||
data: | ||
train_path: "default-multiclass" | ||
test_path: "data/intent_records/banking77_test.json" | ||
|
||
seed: 42 |
Review comment: in the next PR, remember to update the default values listed in the README section "All options" (note to self).
Review comment: I would also add a line to the README along the lines of "Example configuration files can be found in this folder".