This is a multi-person 2D pose estimation network based on the EfficientHRNet approach, which follows the Associative Embedding framework. For every person in an image, the network detects a human pose: a body skeleton consisting of keypoints and the connections between them. A pose may contain up to 17 keypoints: ears, eyes, nose, shoulders, elbows, wrists, hips, knees, and ankles.

| Metric | Value |
|---|---|
| Average Precision (AP) | 54.3% |
| GFlops | 14.3253 |
| MParams | 8.1506 |
| Source framework | PyTorch* |

The Average Precision metric is described on the COCO Keypoint Evaluation site.

Image, name: `image`, shape: `1, 3, 448, 448` in the `B, C, H, W` format, where:

- `B` - batch size
- `C` - number of channels
- `H` - image height
- `W` - image width

Expected color order is `BGR`.
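
The layout described above can be illustrated with a minimal NumPy sketch that turns an `H x W x 3` BGR frame into a `1, 3, 448, 448` blob. The helper name `to_network_input`, the nearest-neighbour resize, and the `float32` output type are assumptions made for illustration; a real pipeline would normally use a proper image-resize routine and may preserve the aspect ratio with padding.

```python
import numpy as np

def to_network_input(image_bgr, height=448, width=448):
    """Convert an H x W x 3 BGR image into a 1 x 3 x height x width blob.

    Hypothetical helper: no mean/scale normalization is applied here,
    which is an assumption, not something the model description states.
    """
    src_h, src_w = image_bgr.shape[:2]
    # Nearest-neighbour resize via index arrays (stand-in for a real resize).
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    resized = image_bgr[rows[:, None], cols[None, :]]
    # HWC -> CHW; the channel order (BGR) is left untouched.
    chw = resized.transpose(2, 0, 1)
    # Add the batch dimension B = 1.
    return chw[np.newaxis].astype(np.float32)

# Synthetic frame standing in for a decoded BGR image.
frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
blob = to_network_input(frame)
print(blob.shape)
```

If the image is loaded with a library that returns RGB data, the channels would need to be reversed before this step to match the expected `BGR` order.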

The net outputs are two blobs:

- `heatmaps` of shape `1, 17, 224, 224` containing location heatmaps for keypoints of all types. Locations that are filtered out by the non-maximum suppression algorithm have negated values assigned to them.
- `embeddings` of shape `1, 17, 224, 224, 1` containing associative embedding values, which are used for grouping individual keypoints into poses.
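
Because suppressed locations carry negated values, keypoint candidates can be read off the heatmaps with a simple positive threshold. The sketch below shows one possible way to collect candidates and their embedding values for each keypoint type. The function name `extract_candidates`, the `threshold` of 0.1, and the `max_per_type` limit are illustrative assumptions, and the final step of grouping candidates into poses by embedding similarity is not shown.

```python
import numpy as np

def extract_candidates(heatmaps, embeddings, threshold=0.1, max_per_type=30):
    """Return, per keypoint type, an array of rows (x, y, score, tag).

    Hypothetical helper; threshold and max_per_type are assumed values.
    """
    maps = heatmaps[0]            # 17 x H x W
    tags = embeddings[0, ..., 0]  # 17 x H x W
    candidates = []
    for hm, tag in zip(maps, tags):
        # Locations removed by non-maximum suppression are negative,
        # so a positive threshold keeps only surviving peaks.
        ys, xs = np.nonzero(hm > threshold)
        scores = hm[ys, xs]
        # Keep the strongest peaks first.
        order = np.argsort(-scores)[:max_per_type]
        ys, xs, scores = ys[order], xs[order], scores[order]
        candidates.append(np.stack([xs, ys, scores, tag[ys, xs]], axis=1))
    return candidates

# Synthetic outputs: every location suppressed except one peak of type 0.
heatmaps = -np.random.rand(1, 17, 224, 224).astype(np.float32)
heatmaps[0, 0, 50, 80] = 0.9
embeddings = np.zeros((1, 17, 224, 224, 1), dtype=np.float32)
embeddings[0, 0, 50, 80, 0] = 1.5

cands = extract_candidates(heatmaps, embeddings)
print(cands[0])
```

The coordinates are in heatmap space. Since the heatmaps are 224 x 224 for a 448 x 448 input, mapping them back to input pixels would involve a factor of about 2, plus whatever resize or padding was applied during preprocessing.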
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:
[*] Other names and brands may be claimed as the property of others.