[HPO] Update configurations for MPA #1168

Merged: 21 commits into develop from modify-mpa-templates-for-hpo, Jul 21, 2022
Commits
7873812
updated configurations for MPA
yunchu Jul 15, 2022
269b0a6
updated mpa task for hpo
yunchu Jul 18, 2022
2d83dbb
Add EpochRunnerWithCancel to mpa_tasks for Classification Runner
harimkang Jul 18, 2022
53f0e1b
set priority of OTEProgressHook as 71 to put behind eval hook
eunwoosh Jul 18, 2022
d2d1ce8
fixed score report bug for hpo
yunchu Jul 18, 2022
ed2f7b2
Merge branch 'modify-mpa-templates-for-hpo' of https://github.com/ope…
yunchu Jul 18, 2022
aa029e4
updated configurations for MPA
yunchu Jul 15, 2022
5e48834
updated mpa task for hpo
yunchu Jul 18, 2022
aecb824
Add EpochRunnerWithCancel to mpa_tasks for Classification Runner
harimkang Jul 18, 2022
6ec1e7e
set priority of OTEProgressHook as 71 to put behind eval hook
eunwoosh Jul 18, 2022
e3f258f
fix _get_best_model_weight_path function aligning to mpa output direc…
eunwoosh Jul 18, 2022
a34cc79
bugfix & refactoring
eunwoosh Jul 18, 2022
6aa1a0c
implement fixed initial weight on MPA & classification score reportin…
eunwoosh Jul 19, 2022
6af3a80
code arrangement
eunwoosh Jul 19, 2022
21a7ab7
Merge branch 'modify-mpa-templates-for-hpo' of https://github.com/ope…
yunchu Jul 20, 2022
5dd1c47
restore TrainingProgressCallback impl. to omit score scaling
yunchu Jul 20, 2022
e79eb08
[OTE] Comment out time.sleep(2) in EpochRunnerWithCancel (#1174)
JihwanEom Jul 20, 2022
469e0d4
Merge branch 'modify-mpa-templates-for-hpo' into develop
eunwoosh Jul 20, 2022
d37c321
converting to configdict in MPA instead of in hpo.py
eunwoosh Jul 20, 2022
5d5bd48
remove ModelSavedCallback class
eunwoosh Jul 20, 2022
f3e3a1c
workaround to error of mmcls eval
eunwoosh Jul 20, 2022
5 changes: 4 additions & 1 deletion external/deep-object-reid/torchreid_tasks/utils.py
@@ -441,7 +441,10 @@ def on_epoch_end(self, epoch, logs=None):
         print(f'score = {score} at epoch {self.current_epoch} / {self._num_iters}')
         # as a trick, score (at least if it's accuracy, not the loss) and iteration number
         # could be assembled just using summation and then disassembled.
-        score = score + int(self._num_iters)
+        if 1.0 > score:
+            score = score + int(self._num_iters)
+        else:
+            score = -(score + int(self._num_iters))
         self.update_progress_callback(self.get_progress(), score=score)

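The branches above encode two values in the single score float that the progress callback accepts: for accuracy-like scores in [0, 1) the integer part carries the iteration count and the fractional part the score, while scores >= 1.0 are negated to flag that this decomposition does not apply. Below is a minimal, illustrative sketch of the packing and a possible inverse; pack/unpack are hypothetical names, not functions from this PR.

def pack(score: float, iters: int) -> float:
    """Pack score and iteration count into one float, mirroring the diff above."""
    if score < 1.0:
        return score + iters          # fractional part = score, integer part = iters
    return -(score + iters)           # negative flags a score that is >= 1.0

def unpack(packed: float):
    """Hypothetical inverse: returns (score, iters); iters is None when ambiguous."""
    if packed >= 0:
        iters = int(packed)
        return packed - iters, iters  # float round-off applies to the score
    return -packed, None              # score >= 1.0: the exact split is not recoverable

score, iters = unpack(pack(0.83, 57))  # score ~= 0.83, iters == 57
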
@@ -126,7 +126,10 @@ def on_epoch_end(self, epoch, logs=None):
         print(f'score = {score} at epoch {epoch} / {int(iter_num)}')
         # as a trick, score (at least if it's accuracy, not the loss) and iteration number
         # could be assembled just using summation and then disassembled.
-        score = score + int(iter_num)
+        if 1.0 > score:
+            score = score + int(iter_num)
+        else:
+            score = -(score + int(iter_num))
         self.update_progress_callback(self.get_progress(), score=score)

@@ -57,7 +57,8 @@ def train(self, data_loader: DataLoader, **kwargs):
         self.data_loader = data_loader
         self._max_iters = self._max_epochs * len(self.data_loader)
         self.call_hook('before_train_epoch')
-        time.sleep(2)  # Prevent possible deadlock during epoch transition
+        # TODO: uncomment the line below, or resolve the root cause of the deadlock, if multi-GPU needs to be supported.
+        # time.sleep(2)  # Prevent possible multi-gpu deadlock during epoch transition
         for i, data_batch in enumerate(self.data_loader):
             self._inner_iter = i
             self.call_hook('before_train_iter')

@@ -75,7 +75,10 @@ def on_epoch_end(self, epoch, logs=None):
         print(f'score = {score} at epoch {epoch} / {int(iter_num)}')
         # as a trick, score (at least if it's accuracy, not the loss) and iteration number
         # could be assembled just using summation and then disassembled.
-        score = score + int(iter_num)
+        if 1.0 > score:
+            score = score + int(iter_num)
+        else:
+            score = -(score + int(iter_num))
         self.update_progress_callback(self.get_progress(), score=score)

@@ -61,7 +61,8 @@ def train(self, data_loader: DataLoader, **kwargs):
         self.data_loader = data_loader
         self._max_iters = self._max_epochs * len(self.data_loader)
         self.call_hook('before_train_epoch')
-        time.sleep(2)  # Prevent possible deadlock during epoch transition
+        # TODO: uncomment the line below, or resolve the root cause of the deadlock, if multi-GPU needs to be supported.
+        # time.sleep(2)  # Prevent possible multi-gpu deadlock during epoch transition
         for i, data_batch in enumerate(self.data_loader):
             self._inner_iter = i
             self.call_hook('before_train_iter')

@@ -22,6 +22,7 @@ learning_parameters:
     warning:
       Increasing this value may cause the system to use more memory than available,
       potentially causing out of memory errors, please update with caution.
+    auto_hpo_state: NOT_POSSIBLE
   description: Learning Parameters
   header: Learning Parameters
   learning_rate:
@@ -42,6 +43,7 @@ learning_parameters:
       type: UI_RULES
     visible_in_ui: true
     warning: null
+    auto_hpo_state: NOT_POSSIBLE
   max_num_epochs:
     affects_outcome_of: TRAINING
     default_value: 200

@@ -1,16 +1,15 @@
-metric: mAP
-search_algorithm: smbo
-early_stop: median_stop
+metric: accuracy_top-1
+search_algorithm: asha
 hp_space:
   learning_parameters.learning_rate:
-    param_type: quniform
+    param_type: qloguniform
     range:
-      - 0.001
-      - 0.01
-      - 0.001
+      - 0.0003
+      - 0.1
+      - 0.0001
   learning_parameters.batch_size:
     param_type: qloguniform
     range:
-      - 8
-      - 64
+      - 32
+      - 128
       - 2

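Both learning rate and batch size now use the quantized log-uniform (qloguniform) space, whose three range values read naturally as (low, high, step). Below is a rough sketch of what such sampling looks like, assuming the common hyperopt-style (low, high, q) convention; how OTE's HPO module actually interprets the triple is not shown in this diff.

import math
import random

def sample_qloguniform(low: float, high: float, q: float) -> float:
    """Draw log-uniformly from [low, high], then quantize to a multiple of q."""
    x = math.exp(random.uniform(math.log(low), math.log(high)))
    return max(low, min(high, round(x / q) * q))  # clamp after rounding

# The learning-rate space above: log-uniform over [0.0003, 0.1], step 0.0001.
lr = sample_qloguniform(0.0003, 0.1, 0.0001)
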
@@ -27,10 +27,12 @@ hyper_parameters:
   learning_parameters:
     batch_size:
       default_value: 32
+      auto_hpo_state: POSSIBLE
     num_workers:
       default_value: 4
     learning_rate:
       default_value: 0.007
+      auto_hpo_state: POSSIBLE
     num_iters:
       default_value: 20
   algo_backend:

@@ -1,16 +1,15 @@
-metric: mAP
-search_algorithm: smbo
-early_stop: median_stop
+metric: accuracy_top-1
+search_algorithm: asha
 hp_space:
   learning_parameters.learning_rate:
-    param_type: quniform
+    param_type: qloguniform
     range:
-      - 0.001
-      - 0.01
-      - 0.001
+      - 0.0014
+      - 0.035
+      - 0.0001
   learning_parameters.batch_size:
     param_type: qloguniform
     range:
-      - 8
-      - 64
+      - 20
+      - 48
       - 2

@@ -27,10 +27,12 @@ hyper_parameters:
   learning_parameters:
     batch_size:
       default_value: 32
+      auto_hpo_state: POSSIBLE
     num_workers:
       default_value: 4
     learning_rate:
       default_value: 0.007
+      auto_hpo_state: POSSIBLE
     num_iters:
       default_value: 20
   algo_backend:

@@ -1,16 +1,15 @@
-metric: mAP
-search_algorithm: smbo
-early_stop: median_stop
+metric: accuracy_top-1
+search_algorithm: asha
 hp_space:
   learning_parameters.learning_rate:
-    param_type: quniform
+    param_type: qloguniform
     range:
-      - 0.005
-      - 0.029
-      - 0.001
+      - 0.0032
+      - 0.08
+      - 0.0001
   learning_parameters.batch_size:
     param_type: qloguniform
     range:
-      - 8
-      - 64
+      - 20
+      - 48
       - 2

@@ -27,10 +27,12 @@ hyper_parameters:
   learning_parameters:
     batch_size:
       default_value: 32
+      auto_hpo_state: POSSIBLE
     num_workers:
       default_value: 4
     learning_rate:
       default_value: 0.016
+      auto_hpo_state: POSSIBLE
     learning_rate_warmup_iters:
       default_value: 100
     num_iters:

@@ -1,16 +1,15 @@
-metric: mAP
-search_algorithm: smbo
-early_stop: median_stop
+metric: accuracy_top-1
+search_algorithm: asha
 hp_space:
   learning_parameters.learning_rate:
-    param_type: quniform
+    param_type: qloguniform
     range:
-      - 0.005
-      - 0.029
-      - 0.001
+      - 0.0032
+      - 0.08
+      - 0.0001
   learning_parameters.batch_size:
     param_type: qloguniform
     range:
-      - 8
-      - 64
+      - 20
+      - 48
       - 2

@@ -27,10 +27,12 @@ hyper_parameters:
   learning_parameters:
     batch_size:
       default_value: 32
+      auto_hpo_state: POSSIBLE
     num_workers:
       default_value: 4
     learning_rate:
       default_value: 0.016
+      auto_hpo_state: POSSIBLE
     learning_rate_warmup_iters:
       default_value: 100
     num_iters:

@@ -1,16 +1,16 @@
-metric: mAP
-search_algorithm: smbo
-early_stop: median_stop
+metric: accuracy_top-1
+search_algorithm: asha
+early_stop: None
 hp_space:
   learning_parameters.learning_rate:
-    param_type: quniform
+    param_type: qloguniform
     range:
-      - 0.005
-      - 0.029
-      - 0.001
+      - 0.0032
+      - 0.08
+      - 0.0001
   learning_parameters.batch_size:
     param_type: qloguniform
     range:
-      - 8
-      - 64
+      - 20
+      - 48
       - 2

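This variant switches from SMBO with median-stop to ASHA and sets early_stop: None; a plausible reading is that ASHA's successive-halving rungs already discard weak trials, making a separate early-stop rule redundant. Below is a small illustrative sketch of an ASHA rung schedule; the resource bounds and reduction factor are made-up values, not settings from this PR.

def asha_rungs(min_resource: int, max_resource: int, reduction_factor: int = 4):
    """Resource levels at which ASHA promotes the top 1/reduction_factor of trials."""
    rungs = []
    r = min_resource
    while r <= max_resource:
        rungs.append(r)
        r *= reduction_factor
    return rungs

print(asha_rungs(1, 64))  # [1, 4, 16, 64]: most trials stop at the low rungs
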
@@ -27,10 +27,12 @@ hyper_parameters:
   learning_parameters:
     batch_size:
       default_value: 32
+      auto_hpo_state: POSSIBLE
     num_workers:
       default_value: 4
     learning_rate:
       default_value: 0.016
+      auto_hpo_state: POSSIBLE
     learning_rate_warmup_iters:
       default_value: 100
     num_iters:

@@ -23,6 +23,7 @@ learning_parameters:
     warning:
       Increasing this value may cause the system to use more memory than available,
       potentially causing out of memory errors, please update with caution.
+    auto_hpo_state: NOT_POSSIBLE
   description: Learning Parameters
   header: Learning Parameters
   learning_rate:
@@ -44,6 +45,7 @@ learning_parameters:
       value: 0.01
     visible_in_ui: true
     warning: null
+    auto_hpo_state: NOT_POSSIBLE
   learning_rate_warmup_iters:
     affects_outcome_of: TRAINING
     default_value: 100

@@ -1,16 +1,16 @@
 metric: mAP
-search_algorithm: smbo
+search_algorithm: asha
+early_stop: None
 hp_space:
   learning_parameters.learning_rate:
-    param_type: quniform
+    param_type: qloguniform
     range:
-      - 0.001
-      - 0.1
-      - 0.001
+      - 0.00002
+      - 0.0005
+      - 0.00001
   learning_parameters.batch_size:
     param_type: qloguniform
     range:
-      - 4
-      - 8
+      - 2
+      - 16
       - 2

@@ -27,8 +27,10 @@ hyper_parameters:
   learning_parameters:
     batch_size:
       default_value: 4
+      auto_hpo_state: POSSIBLE
     learning_rate:
       default_value: 0.0001
+      auto_hpo_state: POSSIBLE
     learning_rate_warmup_iters:
       default_value: 10
     num_iters:

@@ -1,16 +1,16 @@
 metric: mAP
-search_algorithm: smbo
+search_algorithm: asha
+early_stop: None
 hp_space:
   learning_parameters.learning_rate:
-    param_type: quniform
+    param_type: qloguniform
     range:
-      - 0.001
-      - 0.1
-      - 0.001
+      - 0.0001
+      - 0.01
+      - 0.0001
   learning_parameters.batch_size:
     param_type: qloguniform
     range:
-      - 4
-      - 8
+      - 2
+      - 16
       - 2

@@ -27,8 +27,10 @@ hyper_parameters:
   learning_parameters:
     batch_size:
       default_value: 4
+      auto_hpo_state: POSSIBLE
     learning_rate:
       default_value: 0.001
+      auto_hpo_state: POSSIBLE
     learning_rate_warmup_iters:
       default_value: 10
     num_iters:

@@ -1,16 +1,16 @@
 metric: mAP
-search_algorithm: smbo
+search_algorithm: asha
+early_stop: None
 hp_space:
   learning_parameters.learning_rate:
-    param_type: quniform
+    param_type: qloguniform
     range:
-      - 0.001
-      - 0.1
+      - 0.0005
+      - 0.05
       - 0.001
   learning_parameters.batch_size:
     param_type: qloguniform
     range:
-      - 4
-      - 8
+      - 2
+      - 16
       - 2

@@ -27,8 +27,10 @@ hyper_parameters:
   learning_parameters:
     batch_size:
       default_value: 4
+      auto_hpo_state: POSSIBLE
     learning_rate:
       default_value: 0.005
+      auto_hpo_state: POSSIBLE
     learning_rate_warmup_iters:
       default_value: 10
     num_iters:

@@ -23,6 +23,7 @@ learning_parameters:
     warning:
       Increasing this value may cause the system to use more memory than available,
       potentially causing out of memory errors, please update with caution.
+    auto_hpo_state: NOT_POSSIBLE
   description: Learning Parameters
   header: Learning Parameters
   learning_rate:
@@ -44,6 +45,7 @@ learning_parameters:
       value: 0.01
     visible_in_ui: true
     warning: null
+    auto_hpo_state: NOT_POSSIBLE
   learning_rate_warmup_iters:
     affects_outcome_of: TRAINING
     default_value: 100