[Refactor] Task and Tune API #118

Merged
KKIEEK merged 20 commits into main from refactor/tasks
Dec 23, 2022

@KKIEEK KKIEEK commented Dec 20, 2022

Motivation

To build a runnable task from a config (cfg).

TODO

  • Enhance work_dir (experiment name is determined by work_dir)
  • Refactor log_analysis
  • Add test code for Tuner class
  • Pass test code
  • Final test

Modification

Please briefly describe what modification is made in this PR.

Expected behavior

work_dirs
  |- mmdet_asynchb_nevergrad_pso (config name)
    |- 221221_113600 (experiment name)
      |- best_trial
        |- work_dirs (log of best trial)
        |- best_trial.log
      |- DataParallelTrainer_738f915b_1_data_samples_per_gpu=5,model=yolo_x_8x8,optimizer=adam ... (trial_name)
        |- rank_0 (log dir for task process)
          |- work_dirs
            |- faster_rcnn_r50_fpn_1x_coco
              |- faster_rcnn_r50_fpn_1x_coco.py
              |- latest.pth
        |- rank_1
        |- params.json
        |- params.pkl 
        |- progress.csv
        |- result.json
      |- DataParallelTrainer_7caa5acf_2_data_samples_per_gpu=6,model=faster_rcnn_swin_s_p4_w7_fpn,optimizer=adamw ...
        |- (trial directory name is determined by trial_name_creator and trial_dirname_creator)
      |- some.json
      |- some.pkl
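
The trial directory names in this layout are produced by the `trial_name_creator` and `trial_dirname_creator` hooks, each of which receives a trial object and returns a string. A minimal sketch of such a pair follows. It assumes the trial exposes `trainable_name`, `trial_id`, and `config`; the `FakeTrial` stand-in and both function bodies are hypothetical and not taken from this PR, and the running trial index shown in the layout is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTrial:
    """Hypothetical stand-in for a Ray Tune trial object."""
    trainable_name: str
    trial_id: str
    config: dict = field(default_factory=dict)


def format_params(config: dict) -> str:
    """Render searched parameters as 'key=value' pairs in sorted key order."""
    return ",".join(f"{key}={config[key]}" for key in sorted(config))


def trial_name_creator(trial) -> str:
    """Build a display name: trainable name, trial id, then parameters."""
    return f"{trial.trainable_name}_{trial.trial_id}_{format_params(trial.config)}"


def trial_dirname_creator(trial) -> str:
    """Build a shorter directory name without path separators."""
    name = f"{trial.trial_id}_{format_params(trial.config)}"
    return name.replace("/", "_")
```

With Ray Tune, callables like these would typically be passed as the `trial_name_creator=` and `trial_dirname_creator=` arguments of `tune.run`.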

Result

work_dirs/mmdet_asynchb_nevergrad_pso/20221223_070902
├── 1_data_samples_per_gpu=7,model=yolo_x_8x8,optimizer=adamw
│   ├── error.pkl
│   ├── error.txt
│   ├── events.out.tfevents.1671779346.test-hpo-6667d579dc-nvhjg
│   ├── params.json
│   ├── params.pkl
│   ├── rank_0
│   │   └── work_dirs
│   │       └── 2005c6e6
│   │           ├── 2005c6e6.py
│   │           ├── 20221223_070919.log
│   │           └── 20221223_070919.log.json
│   └── result.json
├── 2_data_samples_per_gpu=3,model=tood_r101_dcnv2,optimizer=adamw
│   ├── checkpoint_-00001
│   ├── events.out.tfevents.1671779388.test-hpo-6667d579dc-nvhjg
│   ├── params.json
│   ├── params.pkl
│   ├── progress.csv
│   ├── rank_0
│   │   └── work_dirs
│   │       └── 60d57f42
│   │           ├── 20221223_071001.log
│   │           ├── 20221223_071001.log.json
│   │           └── 60d57f42.py
│   └── result.json
├── 3_data_samples_per_gpu=6,model=yolo_x_8x8,optimizer=rms
│   ├── error.pkl
│   ├── error.txt
│   ├── events.out.tfevents.1671779444.test-hpo-6667d579dc-nvhjg
│   ├── params.json
│   ├── params.pkl
│   ├── rank_0
│   │   └── work_dirs
│   │       └── 777781b3
│   │           ├── 20221223_071057.log
│   │           ├── 20221223_071057.log.json
│   │           └── 777781b3.py
│   └── result.json
├── 4_data_samples_per_gpu=3,model=tood_r101_dcnv2,optimizer=sgd
│   ├── checkpoint_-00001
│   ├── events.out.tfevents.1671779486.test-hpo-6667d579dc-nvhjg
│   ├── params.json
│   ├── params.pkl
│   ├── progress.csv
│   ├── rank_0
│   │   └── work_dirs
│   │       └── b42854ef
│   │           ├── 20221223_071139.log
│   │           ├── 20221223_071139.log.json
│   │           └── b42854ef.py
│   └── result.json
├── 5_data_samples_per_gpu=2,model=faster_rcnn_x101_64x4d_fpn,optimizer=rms
│   ├── checkpoint_-00001
│   ├── events.out.tfevents.1671779532.test-hpo-6667d579dc-nvhjg
│   ├── params.json
│   ├── params.pkl
│   ├── progress.csv
│   ├── rank_0
│   │   └── work_dirs
│   │       └── 463aa231
│   │           ├── 20221223_071225.log
│   │           ├── 20221223_071225.log.json
│   │           └── 463aa231.py
│   └── result.json
├── 6_data_samples_per_gpu=3,model=faster_rcnn_swin_s_p4_w7_fpn,optimizer=sgd
│   ├── checkpoint_-00001
│   ├── events.out.tfevents.1671779574.test-hpo-6667d579dc-nvhjg
│   ├── params.json
│   ├── params.pkl
│   ├── progress.csv
│   ├── rank_0
│   │   └── work_dirs
│   │       └── 037ff5be
│   │           ├── 037ff5be.py
│   │           ├── 20221223_071308.log
│   │           └── 20221223_071308.log.json
│   └── result.json
├── 7_data_samples_per_gpu=7,model=yolo_x_8x8,optimizer=adam
│   ├── error.pkl
│   ├── error.txt
│   ├── events.out.tfevents.1671779618.test-hpo-6667d579dc-nvhjg
│   ├── params.json
│   ├── params.pkl
│   ├── rank_0
│   │   └── work_dirs
│   │       └── 7b5d2242
│   │           ├── 20221223_071351.log
│   │           ├── 20221223_071351.log.json
│   │           └── 7b5d2242.py
│   └── result.json
├── 8_data_samples_per_gpu=5,model=faster_rcnn_x101_64x4d_fpn,optimizer=adam
│   ├── checkpoint_-00001
│   ├── events.out.tfevents.1671779660.test-hpo-6667d579dc-nvhjg
│   ├── params.json
│   ├── params.pkl
│   ├── progress.csv
│   ├── rank_0
│   │   └── work_dirs
│   │       └── 61b2f6a0
│   │           ├── 20221223_071433.log
│   │           ├── 20221223_071433.log.json
│   │           └── 61b2f6a0.py
│   └── result.json
├── best_trial
│   ├── best_trial.log
│   └── log
│       ├── checkpoint_-00001
│       ├── events.out.tfevents.1671779388.test-hpo-6667d579dc-nvhjg
│       ├── params.json
│       ├── params.pkl
│       ├── progress.csv
│       ├── rank_0
│       │   └── work_dirs
│       │       └── 60d57f42
│       │           ├── 20221223_071001.log
│       │           ├── 20221223_071001.log.json
│       │           └── 60d57f42.py
│       └── result.json
├── experiment_state-2022-12-23_07-09-06.json
├── mmdet_asynchb_nevergrad_pso.py
├── search_gen_state-2022-12-23_07-09-06.json
├── searcher-state-2022-12-23_07-09-06.pkl
├── trainable.pkl
└── tuner.pkl
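
In the listing above, trials 1, 3, and 7 (all `yolo_x_8x8`) contain `error.pkl` and `error.txt` but no `progress.csv` or checkpoint, which suggests they failed, while the other trials have both. A rough sketch for summarizing such a directory is shown below; the `classify_trials` helper is hypothetical and relies on the assumption that `error.txt` marks a failed trial and `progress.csv` marks a trial that reported results.

```python
from pathlib import Path


def classify_trials(experiment_dir: str) -> dict:
    """Split trial directories into failed/completed/unknown by their files.

    Assumption: a trial with error.txt failed, and a trial with
    progress.csv reported at least one result.
    """
    status = {"failed": [], "completed": [], "unknown": []}
    for trial_dir in sorted(Path(experiment_dir).iterdir()):
        # Skip experiment-level state files and the copied best_trial folder.
        if not trial_dir.is_dir() or trial_dir.name == "best_trial":
            continue
        if (trial_dir / "error.txt").exists():
            status["failed"].append(trial_dir.name)
        elif (trial_dir / "progress.csv").exists():
            status["completed"].append(trial_dir.name)
        else:
            status["unknown"].append(trial_dir.name)
    return status
```

Calling `classify_trials("work_dirs/mmdet_asynchb_nevergrad_pso/20221223_070902")` on a layout like the one above would group the trial folders by outcome.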

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@KKIEEK KKIEEK requested review from nijkah and yhna940 December 20, 2022 13:07
@nijkah nijkah changed the title Refactor Task Refactor Task and tune API Dec 21, 2022

@yhna940 yhna940 left a comment

GOOD 💡

@KKIEEK KKIEEK changed the title Refactor Task and tune API [Refactor] Task and tune API Dec 21, 2022
@KKIEEK KKIEEK requested review from yhna940 and nijkah December 21, 2022 14:45
@KKIEEK KKIEEK changed the title [Refactor] Task and tune API [Refactor] Task and Tune API Dec 22, 2022

KKIEEK commented Dec 22, 2022

Suggestion

  • To support relative paths, the following work must be done first.
# at tune/config.py
class _CustomTorchBackend(_TorchBackend):
    ...

    def on_start(self, worker_group: WorkerGroup,
        ...

        def set_env_vars(addr, port, rank, world_size):
            # fix here
            os.environ['TUNE_ORIG_WORKING_DIR'] = os.getcwd()

            os.environ['MASTER_ADDR'] = addr
            os.environ['MASTER_PORT'] = str(port)
            os.environ['RANK'] = str(rank)
            os.environ['LOCAL_RANK'] = str(rank)
            os.environ['WORLD_SIZE'] = str(world_size)

# at core/rewriters/rel2abs.py
import os

from .base import BaseRewriter
from .builder import REWRITERS

@REWRITERS.register_module()
class Rel2Abs(BaseRewriter):

    def __call__(self, context: dict) -> dict:
        os.chdir(os.environ['TUNE_ORIG_WORKING_DIR'])  # or some task
        return context
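
As an alternative to calling `os.chdir`, individual relative paths can be resolved against the recorded launch directory, which leaves the process working directory untouched. The sketch below illustrates that swapped-in approach; `remember_working_dir` and `to_abs_path` are hypothetical helpers, not part of this PR, and they assume the `TUNE_ORIG_WORKING_DIR` variable is set as in the snippet above.

```python
import os

ORIG_DIR_ENV = "TUNE_ORIG_WORKING_DIR"  # variable name used in the suggestion


def remember_working_dir() -> None:
    """Record the launch directory unless it has already been recorded."""
    os.environ.setdefault(ORIG_DIR_ENV, os.getcwd())


def to_abs_path(path: str) -> str:
    """Resolve a relative path against the recorded launch directory.

    Absolute paths are returned unchanged. If the variable is missing,
    the current working directory is used as a fallback.
    """
    if os.path.isabs(path):
        return path
    base = os.environ.get(ORIG_DIR_ENV, os.getcwd())
    return os.path.normpath(os.path.join(base, path))
```

A rewriter could apply `to_abs_path` to each path-like entry in the context instead of changing directories, so that code depending on the trial directory as the working directory keeps behaving the same way.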

@codecov-commenter
codecov-commenter commented Dec 23, 2022

Codecov Report

❗ No coverage uploaded for pull request base (main@9475dc2).
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #118   +/-   ##
=======================================
  Coverage        ?   73.09%           
=======================================
  Files           ?       58           
  Lines           ?     1617           
  Branches        ?      238           
=======================================
  Hits            ?     1182           
  Misses          ?      334           
  Partials        ?      101           
Flag Coverage Δ
unittests 73.09% <0.00%> (?)
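
The figures in this report are consistent with each other: hits, misses, and partials sum to the line count, and the percentage is hits divided by lines. A small check is sketched below; it assumes the percentage is rounded down to two decimals, which is what reproduces the reported 73.09% here, and `coverage_percent` is a hypothetical helper.

```python
import math

# Values copied from the coverage report above.
hits, misses, partials = 1182, 334, 101
lines = hits + misses + partials  # total coverable lines


def coverage_percent(hits: int, lines: int, digits: int = 2) -> float:
    """Return hits/lines as a percentage, rounded down to `digits` decimals."""
    scale = 10 ** digits
    return math.floor(hits / lines * 100 * scale) / scale
```

Here `lines` evaluates to 1617 and `coverage_percent(hits, lines)` to 73.09, matching the report.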

@KKIEEK KKIEEK merged commit 36ca780 into main Dec 23, 2022
@KKIEEK KKIEEK deleted the refactor/tasks branch December 23, 2022 09:05
@yhna940 yhna940 mentioned this pull request Jan 7, 2023
4 participants