This dataset is built on top of the VoxCeleb1 dataset. We provide facial attribute annotations, segmentation masks, and artistic drawings for each video.
- Video: Videos are cropped with video-preprocessing.
- Label: We provide manually annotated facial attributes for each cropped video. The annotation is provided here.
- Text: As described here, the textual descriptions are generated from the given attributes using a probabilistic context-free grammar (PCFG). The code to generate the texts is provided here.
- Segmentation: We run face-parsing to generate segmentation masks for each cropped video. The script is provided here.
- Drawing: We run Unpaired Portrait Drawing to generate artistic drawings for each cropped video. The script to generate drawings is provided here.
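
To illustrate the Text step, a PCFG expands each nonterminal by sampling one of its rules according to the rule probabilities. The snippet below is only a toy sketch of that idea: the grammar, the probabilities, the attribute keys (`gender`, `traits`), and the `describe` helper are all hypothetical and are not taken from the actual generation code.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (probability, expansion).
# This is NOT the grammar used to build the dataset.
GRAMMAR = {
    "S": [
        (0.5, ["SUBJ", "has", "ATTRS", "."]),
        (0.5, ["SUBJ", "is shown with", "ATTRS", "."]),
    ],
    "SUBJ": [
        (0.6, ["The person"]),
        (0.4, ["This", "GENDER"]),
    ],
}


def expand(symbol, attrs, rng):
    """Recursively expand a symbol into text, filling slots from attrs."""
    if symbol == "ATTRS":
        return " and ".join(attrs["traits"])
    if symbol == "GENDER":
        return attrs["gender"]
    if symbol not in GRAMMAR:
        # Terminal symbol: emit it unchanged.
        return symbol
    weights, options = zip(*GRAMMAR[symbol])
    # Sample one production according to its probability.
    choice = rng.choices(options, weights=weights, k=1)[0]
    text = " ".join(expand(s, attrs, rng) for s in choice)
    # Remove the space that joining leaves before the final period.
    return text.replace(" .", ".")


def describe(attrs, seed=None):
    """Sample one description for the given (hypothetical) attribute dict."""
    return expand("S", attrs, random.Random(seed))
```

For example, `describe({"gender": "man", "traits": ["black hair", "a beard"]}, seed=0)` returns one sentence sampled from the toy grammar, and fixing the seed makes the sample reproducible.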
All processed files are available at this link. Separate zip files are also available: videos, masks, drawings, texts, labels, annotations (json).
The processed data needs to be organized in the following way:
│MMVID/data/mmvoxceleb/
├──video/
│ ├── id11248#yDqlBD8m_b8#00004.txt#000.mp4/
│ │ ├── 0000000.png
│ │ ├── 0000001.png
│ │ ├── ......
│ ├── id11248#yiNkInm9OKQ#00001.txt#000.mp4/
│ ├── ......
├──txt/
│ ├── id11248#yDqlBD8m_b8#00004.txt#000.mp4.txt
│ ├── id11248#yiNkInm9OKQ#00001.txt#000.mp4.txt
│ ├── ......
├──label/
│ ├── id11248#yDqlBD8m_b8#00004.txt#000.mp4.txt
│ ├── id11248#yiNkInm9OKQ#00001.txt#000.mp4.txt
│ ├── ......
├──mask/
│ ├── id11248#yDqlBD8m_b8#00004.txt#000.mp4/
│ │ ├── 0000000.png
│ │ ├── 0000001.png
│ │ ├── ......
│ ├── id11248#yiNkInm9OKQ#00001.txt#000.mp4/
│ ├── ......
├──draw/
│ ├── style1/
│ │ ├── id11248#yDqlBD8m_b8#00004.txt#000.mp4/
│ │ ├── id11248#yiNkInm9OKQ#00001.txt#000.mp4/
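
A quick way to confirm that the layout above is in place is to walk `video/` and check that each clip has its counterparts in the other folders. The following is a minimal sketch under the assumptions that every clip has a directory under `video/` and that drawings live under `draw/style1/`; the `index_clips` helper is hypothetical and is not part of the MMVID code.

```python
from pathlib import Path


def index_clips(root):
    """Map each clip under video/ to the modalities found for it.

    Hypothetical helper: it only mirrors the directory layout shown above.
    """
    root = Path(root)
    index = {}
    for clip_dir in sorted((root / "video").iterdir()):
        if not clip_dir.is_dir():
            continue
        name = clip_dir.name  # e.g. "id11248#yDqlBD8m_b8#00004.txt#000.mp4"
        index[name] = {
            # Number of extracted frames (0000000.png, 0000001.png, ...).
            "frames": len(list(clip_dir.glob("*.png"))),
            # Text and label files share the clip name plus a ".txt" suffix.
            "txt": (root / "txt" / f"{name}.txt").is_file(),
            "label": (root / "label" / f"{name}.txt").is_file(),
            # Masks and drawings are stored as per-clip frame directories.
            "mask": (root / "mask" / name).is_dir(),
            "draw": (root / "draw" / "style1" / name).is_dir(),
        }
    return index
```

Calling `index_clips("MMVID/data/mmvoxceleb")` and filtering for entries with a `False` value lists the clips that still have missing modalities.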
The first time you run the dataloader, it will create a cache file (data/mmvoxceleb_local.pkl). We also provide a pre-generated cache file here.
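
The build-once, reuse-later pattern behind such a cache file can be sketched as follows. This is an illustration only: the real contents of `mmvoxceleb_local.pkl` are not described here, so the index structure (clip name mapped to sorted frame file names) and the `load_or_build_index` helper are assumptions.

```python
import pickle
from pathlib import Path


def load_or_build_index(data_root, cache_path):
    """Return a clip index, building and pickling it on the first call.

    Hypothetical sketch; the actual dataloader cache may store
    different information.
    """
    cache_path = Path(cache_path)
    if cache_path.is_file():
        # Later runs: skip the directory scan and reuse the cached index.
        with cache_path.open("rb") as f:
            return pickle.load(f)

    # First run: scan video/ once and record the frames of each clip.
    video_root = Path(data_root) / "video"
    index = {
        clip_dir.name: sorted(p.name for p in clip_dir.glob("*.png"))
        for clip_dir in sorted(video_root.iterdir())
        if clip_dir.is_dir()
    }

    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(index, f)
    return index
```

With a cache like this, deleting the pickle file forces a rescan, which is useful after adding or removing clips.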