HigherHRNet architecture implemented and trained from scratch using ImageNet and COCO datasets. The model is trained in two steps:
- Classification Backbone pretraining on ImageNet dataset (ClassificationHRNet model)
- Human Pose model Training on COCO dataset (HigherHRNet model)
Environment management is handled with poetry. To install the virtual environment:
- Clone the repository
git clone https://github.com/thawro/pytorch-human-pose.git
- Move to the repository (<project_root>)
cd pytorch-human-pose
- Install poetry (follow the poetry documentation)
- Install the virtual environment and activate it (the script runs poetry install and poetry shell)
make env
- Create directories for training/inference purposes
make dirs
NOTE: If you have already installed the environment (with make env or poetry install), you can activate it with poetry shell.
NOTE: The data preparation scripts use tqdm to show progress bars for file unzipping, so make sure to install and activate the environment first.
- Download the dataset from Kaggle
- Go to link
- Sign in to Kaggle
- Scroll down and click the "Download All" button
- Move the downloaded imagenet-object-localization-challenge.zip file to the <project_root>/data directory
- Run the ImageNet preparation script from the <project_root> directory (it may take a while)
make imagenet
The script will unzip the downloaded imagenet-object-localization-challenge.zip file, remove it, create the ImageNet directory and move the unzipped files from the ILSVRC/Data/CLS-LOC directory to the ImageNet directory. Then it will move the val image files to separate directories (named by WordNet labels) using this script and download the json mapping for ImageNet labels (from this source).
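The val reorganization step can be sketched as follows. This is a minimal illustration, not the repository's script: the function name move_val_images_to_label_dirs and the image_to_label mapping (file name to WordNet label) are hypothetical, and building that mapping from the challenge metadata is left out.

```python
import shutil
from pathlib import Path


def move_val_images_to_label_dirs(val_dir, image_to_label):
    """Move each val image into a subdirectory named after its WordNet label.

    image_to_label is a hypothetical mapping, e.g. {"img_1.JPEG": "n01440764"}.
    Returns the number of files that were moved.
    """
    val_dir = Path(val_dir)
    moved = 0
    for filename, label in image_to_label.items():
        src = val_dir / filename
        if not src.is_file():
            # Skip entries without a matching file (e.g. already moved)
            continue
        label_dir = val_dir / label
        label_dir.mkdir(exist_ok=True)
        shutil.move(str(src), str(label_dir / filename))
        moved += 1
    return moved
```

Skipping missing files makes the sketch safe to call a second time on a directory that was already reorganized.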
After these steps there should be a directory data/ImageNet with the following directory structure:
data/ImageNet
├── wordnet_labels.yaml
├── train
|   ├── n01440764
|   ├── n01443537
|   ...
|   └── n15075141
└── val
    ├── n01440764
    ├── n01443537
    ...
    └── n15075141
- Run the COCO preparation scripts from the <project_root> directory (it may take a while)
make coco
make save_coco_annots
The make coco script will create the data/COCO directory, download files from the COCO website (2017 Train images [118K/18GB], 2017 Val images [5K/1GB], 2017 Test images [41K/6GB], 2017 Train/Val annotations [241MB]) to the data/COCO directory, unzip the files, move them to the images and annotations subdirectories and remove the redundant zip files. make save_coco_annots will parse the COCO annotation .json files and save the per-sample annotations to .yaml files and the per-sample crowd masks (used in the loss function) to .npy files.
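The core of the per-sample split is grouping the annotations by image. The sketch below shows only that grouping step on a tiny stand-in for a person_keypoints_<split>2017.json file; the function name is hypothetical, and writing the .yaml files and crowd masks is intentionally left out. The field names (image_id, keypoints, num_keypoints, iscrowd) come from the COCO annotation format.

```python
import json
from collections import defaultdict


def group_annotations_by_image(coco: dict) -> dict:
    """Group person annotations by the image they belong to."""
    per_image = defaultdict(list)
    for ann in coco["annotations"]:
        per_image[ann["image_id"]].append(
            {
                # Flat list [x1, y1, v1, x2, y2, v2, ...]
                "keypoints": ann["keypoints"],
                "num_keypoints": ann["num_keypoints"],
                # Crowd regions are the ones a crowd mask would cover
                "iscrowd": ann["iscrowd"],
            }
        )
    return dict(per_image)


# Shortened example: real entries hold 17 keypoints (51 numbers) each
example = json.loads(
    """
    {"annotations": [
        {"image_id": 1, "keypoints": [10, 20, 2], "num_keypoints": 1, "iscrowd": 0},
        {"image_id": 1, "keypoints": [0, 0, 0], "num_keypoints": 0, "iscrowd": 1},
        {"image_id": 2, "keypoints": [5, 6, 1], "num_keypoints": 1, "iscrowd": 0}
    ]}
    """
)
grouped = group_annotations_by_image(example)
```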
After these steps there should be a directory data/COCO with the following directory structure:
data/COCO
├── annotations
│   ├── captions_train2017.json
│   ├── captions_val2017.json
│   ├── instances_train2017.json
│   ├── instances_val2017.json
│   ├── person_keypoints_train2017.json
│   └── person_keypoints_val2017.json
└── images
    ├── test2017
    ├── train2017
    └── val2017
Install the poetry virtual environment following Environment steps.
The checkpoints are available at Google Drive:
- hrnet_32.pt - backbone pretrained on ImageNet
- higher_hrnet_32.pt - pose estimation model trained on COCO
After download, place the checkpoints inside the pretrained directory.
NOTE: Checkpoints must be present in pretrained directory to perform the inference.
NOTE: You must first install and activate the Environment to perform the inference.
Inference using the ClassificationHRNet model trained on ImageNet dataset (1000 classes). The parameters configurable via CLI:
- --inference.input_size - the smaller edge of the image will be matched to this number (default: 256)
- --inference.ckpt_path - checkpoint path (default: pretrained/hrnet_32.pt)
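To make the input_size behaviour concrete, here is a small sketch of an aspect-preserving resize computation. It assumes a plain proportional resize; the function name match_smaller_edge is hypothetical, and any padding or rounding to a network stride that the actual preprocessing may apply is not covered.

```python
def match_smaller_edge(height: int, width: int, input_size: int = 256) -> tuple[int, int]:
    """Return (new_height, new_width) with the smaller edge equal to input_size."""
    scale = input_size / min(height, width)
    return round(height * scale), round(width * scale)


# A 480x640 (height x width) image keeps its aspect ratio
new_size = match_smaller_edge(480, 640, input_size=256)
```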
NOTE: ImageNet data must be prepared to perform inference on it.
Run inference on ImageNet val split with default input_size (256)
python src/classification/bin/inference.py --mode "val"
with changed input size
python src/classification/bin/inference.py --mode "val" --inference.input_size=512
python src/classification/bin/inference.py --mode "custom" --dirpath "data/examples/classification"
Inference using the HigherHRNet model trained on COCO keypoints dataset (17 keypoints). The parameters configurable via CLI:
- --inference.input_size - the smaller edge of the image will be matched to this number (default: 256)
- --inference.ckpt_path - checkpoint path (default: pretrained/higher_hrnet_32.pt)
- --inference.det_thr - detection threshold used in grouping (default: 0.05)
- --inference.tag_thr - associative embedding tags threshold used in grouping (default: 0.5)
- --inference.use_flip - whether to use horizontal flip and average the results (default: False)
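Flip averaging for heatmaps can be illustrated with the sketch below. It is a generic version of the technique on nested lists, not the repository's implementation: the function name is hypothetical, only heatmaps are handled (how the associative embedding tags are treated under flipping is not covered), and COCO_FLIP_INDEX is the usual left/right swap for the 17 COCO keypoints.

```python
# Left/right swap for the 17 COCO keypoints (nose, eyes, ears, shoulders, ...)
COCO_FLIP_INDEX = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]


def flip_average(heatmaps, flipped_heatmaps, flip_index):
    """Average heatmaps of an image with those predicted for its mirrored copy.

    Both inputs are indexed as [keypoint][row][column].
    """
    averaged = []
    for k, heatmap in enumerate(heatmaps):
        # In the mirrored image a left joint shows up in the right-joint channel
        mirrored = flipped_heatmaps[flip_index[k]]
        averaged.append(
            [
                # reversed() undoes the horizontal flip of each row
                [(a + b) / 2 for a, b in zip(row, reversed(flipped_row))]
                for row, flipped_row in zip(heatmap, mirrored)
            ]
        )
    return averaged
```

For real predictions, flip_index would be COCO_FLIP_INDEX and the heatmaps would typically be arrays or tensors rather than lists.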
NOTE: COCO data must be prepared to perform inference on it.
Run inference on COCO val split with default inference parameters
python src/keypoints/bin/inference.py --mode "val"
with changed input_size, use_flip and det_thr
python src/keypoints/bin/inference.py --mode "val" --inference.input_size=256 --inference.use_flip=True --inference.det_thr=0.1
python src/keypoints/bin/inference.py --mode "custom" --path "data/examples/keypoints/"
python src/keypoints/bin/inference.py --mode "custom" --path "data/examples/keypoints/simple_3.mp4"
Each sample is composed of a connections plot, an associative embeddings visualization (after grouping) and a heatmaps plot.
Each sample is shown in two input_size variants.
- Two people (size: 256)
two_256.mp4
- Two people (size: 512)
two_512.mp4
- Three people (size: 256)
three_256.mp4
- Three people (size: 512)
three_512.mp4
NOTE: You must first install and activate the Environment to perform the training.
IMPORTANT: MLflow logging is enabled by default, so before every training run you must start the server with make mlflow_server.
The most important configurable CLI training parameters (the others can be found in the Python config files):
- setup.ckpt_path - path to a checkpoint file saved during training (used to resume training)
- setup.pretrained_ckpt_path - path to a checkpoint file with pretrained network weights
- trainer.accelerator - device for training (cpu or gpu)
- trainer.limit_batches - how many batches are used for training; used to run a debug experiment. When limit_batches > 0, the experiment is considered a debug run
- trainer.use_DDP - whether to run Distributed Data Parallel (DDP) training on multiple GPUs
- trainer.sync_batchnorm - whether to use the SyncBatchNorm class for DDP training
NOTE: ImageNet data must be prepared to train the backbone model.
python src/classification/bin/train.py --setup.ckpt_path=None --trainer.use_DDP=False
--setup.ckpt_path=None ensures that a new experiment is created and --trainer.use_DDP=False ensures that a single GPU is used.
Using multiple GPUs - use torchrun
torchrun --standalone --nproc_per_node=2 src/classification/bin/train.py --setup.ckpt_path=None --trainer.use_DDP=True
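torchrun starts one process per GPU and exports the RANK, LOCAL_RANK and WORLD_SIZE environment variables to each of them. The sketch below shows how a training script could read them; the helper name read_torchrun_env is hypothetical and how this repository actually consumes the variables is an assumption.

```python
import os


def read_torchrun_env(env=None) -> dict:
    """Read the process layout that torchrun exports to every worker."""
    env = os.environ if env is None else env
    rank = int(env.get("RANK", 0))  # global process index
    local_rank = int(env.get("LOCAL_RANK", 0))  # index on this machine (GPU id)
    world_size = int(env.get("WORLD_SIZE", 1))  # total number of processes
    return {
        "rank": rank,
        "local_rank": local_rank,
        "world_size": world_size,
        "is_distributed": world_size > 1,
        # Commonly only one process is allowed to log and save checkpoints
        "is_main_process": rank == 0,
    }


# Values a second worker would see with --nproc_per_node=2
layout = read_torchrun_env({"RANK": "1", "LOCAL_RANK": "1", "WORLD_SIZE": "2"})
```

When the script is started with plain python, none of the variables are set and the defaults describe a single, non-distributed process.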
TODO
NOTE: COCO data must be prepared to train the human pose model.
NOTE: src/keypoints/datasets/coco/CocoKeypointsDataset runs its _save_annots_to_files method during initialization. The method parses the COCO .json annotation files and saves per-sample .yaml annotation files and .npy crowd masks (used in the loss function) to data/COCO/annotations/person_keypoints_<split>/<sample_id>.yaml and data/COCO/masks/person_keypoints_<split>/<sample_id>.npy. It is executed only if the annotations directory (data/COCO/annotations/person_keypoints_<split>) isn't present, i.e. only once per split.
python src/keypoints/bin/train.py --setup.ckpt_path=None --trainer.use_DDP=False --setup.pretrained_ckpt_path="pretrained/hrnet_32.pt"
--setup.ckpt_path=None ensures that a new experiment is created, --trainer.use_DDP=False ensures that a single GPU is used and --setup.pretrained_ckpt_path loads the pretrained backbone model from the hrnet_32.pt file.
Using multiple GPUs - use torchrun
torchrun --standalone --nproc_per_node=2 src/keypoints/bin/train.py --setup.ckpt_path=None --trainer.use_DDP=True --setup.pretrained_ckpt_path="pretrained/hrnet_32.pt"
NOTE: Before running the evaluation script, make sure that the correct run_path is defined inside the script. run_path must point to the directory where the training checkpoint (.pt file) and config (.yaml) files are present.
python src/keypoints/bin/eval.py
After running this script, an evaluation_results directory is created (inside the run_path directory) with the evaluation output files:
- coco_output.txt - text output from pycocotools (the table)
- config.yaml - config of the evaluated run
- val2017_results.json - json file with results (predicted keypoints coordinates)
Evaluation results obtained for inference parameters:
--inference.input_size=512
--inference.use_flip=True
Metric name | Area | Max Dets | Metric value |
---|---|---|---|
Average Precision (AP) @IoU=0.50:0.95 | all | 20 | 0.673 |
Average Precision (AP) @IoU=0.50 | all | 20 | 0.870 |
Average Precision (AP) @IoU=0.75 | all | 20 | 0.733 |
Average Precision (AP) @IoU=0.50:0.95 | medium | 20 | 0.615 |
Average Precision (AP) @IoU=0.50:0.95 | large | 20 | 0.761 |
Average Recall (AR) @IoU=0.50:0.95 | all | 20 | 0.722 |
Average Recall (AR) @IoU=0.50 | all | 20 | 0.896 |
Average Recall (AR) @IoU=0.75 | all | 20 | 0.770 |
Average Recall (AR) @IoU=0.50:0.95 | medium | 20 | 0.652 |
Average Recall (AR) @IoU=0.50:0.95 | large | 20 | 0.819 |
.
├── data                    # datasets files
│   ├── COCO                # COCO dataset
│   ├── examples            # example inputs for inference
│   └── ImageNet            # ImageNet dataset
|
├── experiments             # experiments configs - files needed to perform training/inference
│   ├── classification      # configs for ClassificationHRNet
│   └── keypoints           # configs for HigherHRNet
|
├── inference_out           # directory with output from inference
│   ├── classification      # classification inference output
│   └── keypoints           # keypoints inference output
|
├── Makefile                # Makefile for cleaner scripts using
|
├── mlflow                  # mlflow files
│   ├── artifacts           # artifacts saved during training
│   ├── mlruns.db           # database for mlflow metrics saved during training
│   └── test_experiment.py  # script for some mlflow server testing
|
├── poetry.lock             # file updated during poetry environment management
|
├── pretrained              # directory with trained checkpoints
│   ├── higher_hrnet_32.pt  # HigherHRNet checkpoint - COCO human pose model
│   └── hrnet_32.pt         # ClassificationHRNet checkpoint - ImageNet classification model
|
├── pyproject.toml          # definition of poetry environment
|
├── README.md               # project README
|
├── RESEARCH.md             # my sidenotes for human pose estimation task
|
├── results                 # directory with training results/logs
│   ├── classification      # classification experiment results
│   ├── debug               # debug experiments results
│   └── keypoints           # keypoints experiment results
|
├── scripts                 # directory with useful scripts
│   ├── prepare_coco.sh     # prepares COCO dataset - can be used without any other actions
│   ├── prepare_dirs.sh     # creates needed directories
│   ├── prepare_env.sh      # installs and activates poetry environment
│   ├── prepare_imagenet.sh # prepares ImageNet dataset - requires ImageNet zip file to be downloaded before running
│   └── run_mlflow.sh       # runs mlflow server (locally)
|
└── src                     # project modules
    ├── base                # base module - defines interfaces, abstract classes and useful training loops
    ├── classification      # classification related files subclasses
    ├── keypoints           # keypoints related files subclasses
    ├── logger              # logging functionalities (monitoring and training loggers)
    └── utils               # utilities functions (files loading, images manipulation, configs parsing, etc.)
Training and inference are parametrized using configs. Configs are defined in the experiments directory as .yaml files. The .yaml files are parsed with dataclasses tailored for this purpose. The classification and keypoints configs share some custom implementations, which are defined in src/base/config.py. Task-specific configs are implemented in src/classification/config.py and src/keypoints/config.py.
The Config dataclasses allow overwriting the config parameters loaded from .yaml files by passing additional arguments to script calls using the following notation: --<field_name>.<nested_field_name>=<new_value>, for example:
python src/keypoints/bin/train.py --setup.ckpt_path=None --trainer.use_DDP=False --setup.pretrained_ckpt_path=None
overwrites the setup.ckpt_path, trainer.use_DDP and setup.pretrained_ckpt_path attributes.
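The dotted override notation can be illustrated with a small sketch that applies such arguments to a nested dictionary. It is not the parser used by the Config dataclasses: the helper names are hypothetical, a plain dict stands in for the dataclasses, and values are interpreted with ast.literal_eval (so None, True, 512 or 0.1 become Python values, while anything else stays a string).

```python
import ast


def parse_value(raw: str):
    """Turn 'None', 'False', '512' or '0.1' into Python values, keep the rest as str."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_overrides(config: dict, args: list[str]) -> dict:
    """Apply --<field_name>.<nested_field_name>=<new_value> arguments in place."""
    for arg in args:
        dotted_key, raw_value = arg.removeprefix("--").split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node[name]  # a KeyError here means an unknown config section
        node[leaf] = parse_value(raw_value)
    return config


config = {
    "setup": {"ckpt_path": "last.pt", "pretrained_ckpt_path": None},
    "trainer": {"use_DDP": True},
}
apply_overrides(
    config,
    [
        "--setup.ckpt_path=None",
        "--trainer.use_DDP=False",
        "--setup.pretrained_ckpt_path=pretrained/hrnet_32.pt",
    ],
)
```

The shell removes the quotes around values such as "pretrained/hrnet_32.pt" before the script sees them, which is why the sketch receives the bare path.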
The Config dataclasses are also responsible for creating training and inference related objects with the following methods:
- create_net (task-specific) - create neural network object (torch.nn.Module)
- create_datamodule (task-specific) - create datamodule (object used for loading train/val/test data into batches)
- create_module (task-specific) - create training module (object used to handle training and validation steps)
- create_inference_model (task-specific) - create model tailored for inference purposes
- create_callbacks - create callbacks (objects used during the training, each with special hooks)
- create_logger - create logger (object used for logging purposes)
- create_trainer - create trainer (object used to manage the whole training pipeline)
IMPORTANT: You must ensure that the environment is active (poetry shell) and the mlflow server is running (make mlflow_server) before training.
During training, the results directory is populated with useful info about runs (logs, metrics, evaluation examples, etc.). The populated results directory has the following structure:
results
└── <experiment_name>               # run experiment_name (e.g. classification)
    └── <run_name>                  # run run_name (e.g. 03-21_11:05__ImageNet_ClassificationHRNet)
        ├── <timestamp_1>           # run timestamp (e.g. 03-21_11:05)
        |   ├── checkpoints         # saved checkpoints
        |   ├── config.yaml         # config used for current run
        |   ├── data_examples       # examples of data produced by datasets defined in datamodule
        |   ├── epoch_metrics.html  # plots with metrics returned by module class (html)
        |   ├── epoch_metrics.jpg   # plots with metrics returned by module class (jpg)
        |   ├── epoch_metrics.yaml  # yaml with metrics
        |   ├── eval_examples       # example evaluation results (plots produced by results classes)
        |   ├── logs                # per-device logs and system monitoring metrics
        |   └── model               # model-related files (ONNX if saved, layers summary, etc.)
        └── <timestamp_2>           # resumed run timestamp (e.g. 03-22_12:10)
            ├── checkpoints
            ...
            └── model
Each training run is parametrized by a yaml config. The names shown in <> are defined by:
- setup.experiment_name - defines the <experiment_name> directory name
- setup.run_name - defines the <run_name> directory name. If set to null (default), <run_name> is generated automatically as <timestamp>__<setup.dataset>_<setup.architecture>
For each new run, a new results directory is created (defined by the current timestamp). If a run is resumed (the same <run_name> is used), a new subrun directory (based on the timestamp) is added.
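The naming rules can be summarized in a short sketch that builds the directory of a (sub)run. The function name build_run_dir is hypothetical, and the timestamp format is an assumption inferred from the 03-21_11:05 example, not taken from the code.

```python
from datetime import datetime
from pathlib import PurePosixPath


def build_run_dir(experiment_name, dataset, architecture, run_name=None,
                  now=None, results_root="results"):
    """Return results/<experiment_name>/<run_name>/<timestamp> for a (sub)run."""
    now = datetime.now() if now is None else now
    timestamp = now.strftime("%m-%d_%H:%M")  # assumed format, e.g. 03-21_11:05
    if run_name is None:
        # Mirrors the default <timestamp>__<setup.dataset>_<setup.architecture>
        run_name = f"{timestamp}__{dataset}_{architecture}"
    return PurePosixPath(results_root) / experiment_name / run_name / timestamp


# New run with an automatically generated run name
first = build_run_dir("classification", "ImageNet", "ClassificationHRNet",
                      now=datetime(2024, 3, 21, 11, 5))
# Resumed run: the same run_name is reused, only the timestamp directory changes
resumed = build_run_dir("classification", "ImageNet", "ClassificationHRNet",
                        run_name=first.parent.name,
                        now=datetime(2024, 3, 22, 12, 10))
```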
By default, mlflow is used as the experiments logger (local mlflow server under the http://127.0.0.1:5000/ address). The runs logged in mlflow are structured a bit differently than the ones present in the results directory. The main differences:
- Resuming the run is equivalent to logging to the same run (no subrun directories are added),
- There is a new directory in the run artifacts called history, where logs and configs of each subrun are saved in their corresponding <timestamp> directories,
- Resuming the run overwrites the previously logged data_examples, logs, config.yaml, eval_examples and epoch_metrics artifacts.
NOTE: Read all previous chapters before running the commands listed below.
NOTE: Adjust the following settings to your device capabilities:
- --dataloader.batch_size (default: 80 for hrnet, 36 for higher_hrnet)
- --dataloader.num_workers (default: 4 for both tasks)
Depending on what and how you would like to train, there are a few possibilities (listed below). All examples assume a single GPU (to train with multiple GPUs, use the torchrun commands from the previous chapters).
First run:
python src/classification/bin/train.py --setup.ckpt_path=None --trainer.use_DDP=False --setup.experiment_name="classification_exp" --setup.run_name="only_hrnet_run"
Optionally if resuming is needed:
use checkpoint from previous run:
ckpt_path = "results/classification_exp/only_hrnet_run/<timestamp>/checkpoints/last.pt"
python src/classification/bin/train.py --setup.ckpt_path=<ckpt_path> --trainer.use_DDP=False
First run:
python src/keypoints/bin/train.py --setup.ckpt_path=None --trainer.use_DDP=False --setup.experiment_name="keypoints_exp" --setup.run_name="only_higherhrnet_run"
Optionally if resuming is needed:
use checkpoint from previous run:
ckpt_path = "results/keypoints_exp/only_higherhrnet_run/<timestamp>/checkpoints/last.pt"
python src/keypoints/bin/train.py --setup.ckpt_path=<ckpt_path> --trainer.use_DDP=False --setup.experiment_name="keypoints_exp" --setup.run_name="only_higherhrnet_run"
NOTE: The downloaded hrnet_32.pt checkpoint must be present in the pretrained directory.
First run:
python src/keypoints/bin/train.py --setup.ckpt_path=None --trainer.use_DDP=False --setup.experiment_name="keypoints_exp" --setup.run_name="pretrained_higherhrnet_run" --setup.pretrained_ckpt_path="pretrained/hrnet_32.pt"
Optionally if resuming is needed:
NOTE: There is no need to pass pretrained_ckpt_path when resuming training, since the backbone weights were already updated during training.
use checkpoint from previous run:
ckpt_path = "results/keypoints_exp/pretrained_higherhrnet_run/<timestamp>/checkpoints/last.pt"
python src/keypoints/bin/train.py --setup.ckpt_path=<ckpt_path> --trainer.use_DDP=False --setup.experiment_name="keypoints_exp" --setup.run_name="pretrained_higherhrnet_run"
The complete ("from scratch") training includes pretraining ClassificationHRNet and then using it as a backbone for HigherHRNet.
- Train classification model (HRNet backbone)
python src/classification/bin/train.py --setup.ckpt_path=None --trainer.use_DDP=False --setup.experiment_name="classification_exp" --setup.run_name="from_scratch_hrnet_pretrain_run"
Optionally if resuming is needed:
ckpt_path = "results/classification_exp/from_scratch_hrnet_pretrain_run/<timestamp>/checkpoints/last.pt"
python src/classification/bin/train.py --setup.ckpt_path=<ckpt_path> --trainer.use_DDP=False --setup.experiment_name="classification_exp" --setup.run_name="from_scratch_hrnet_pretrain_run"
- Use pretrained backbone and train HigherHRNet keypoints estimation model
pretrained_ckpt_path = "results/classification_exp/from_scratch_hrnet_pretrain_run/<timestamp>/checkpoints/last.pt"
python src/keypoints/bin/train.py --setup.ckpt_path=None --trainer.use_DDP=False --setup.experiment_name="keypoints_exp" --setup.run_name="from_scratch_pretrained_higherhrnet_run" --setup.pretrained_ckpt_path=<pretrained_ckpt_path>
Optionally if resuming is needed:
ckpt_path = "results/keypoints_exp/from_scratch_pretrained_higherhrnet_run/<timestamp>/checkpoints/last.pt"
python src/keypoints/bin/train.py --setup.ckpt_path=<ckpt_path> --trainer.use_DDP=False --setup.experiment_name="keypoints_exp" --setup.run_name="from_scratch_pretrained_higherhrnet_run"