This is the official GitHub repository for the paper "Refining Action Segmentation with Hierarchical Video Representations", accepted as a regular paper (poster) at ICCV 2021.
- Python >= 3.7
- pytorch >= 1.0
- torchvision
- numpy
- pyYAML
- Pillow
- pandas
- Conda or virtualenv is recommended. To set up the environment, run:
pip install -r requirements.txt
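If you want to double-check the environment before moving on, a minimal sanity check along these lines (illustrative only, not part of the repository) confirms the Python and PyTorch versions:

```python
# Illustrative sanity check for the version requirements listed above;
# not part of the repository.
import sys
import torch

assert sys.version_info >= (3, 7), "Python >= 3.7 is required"

major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert (major, minor) >= (1, 0), "PyTorch >= 1.0 is required"

print("Python", sys.version.split()[0], "| PyTorch", torch.__version__)
```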
- Download the dataset from the SSTDA repository: Dataset Link Here
- Unzip the downloaded file and rename the "./Datasets/action-segmentation" folder to "./dataset"
- Clone this repository and the repositories of several backbone models:
git clone https://github.com/cotton-ahn/HASR_iccv2021
cd ./HASR_iccv2021
mkdir backbones
cd ./backbones
git clone https://github.com/yabufarha/ms-tcn
git clone https://github.com/cmhungsteve/SSTDA
git clone https://github.com/yiskw713/asrf
- Run the installation script for ASRF:
cd ..
./scripts/install_asrf.sh
- Modify the MS-TCN code:
  - In ./backbones/ms-tcn/model.py, delete the 104th line, which is "print vid"
  - In ./backbones/ms-tcn/batch_gen.py, change the 49th line to "length_of_sequences=list(map(len, batch_target))" (see the sketch below for why the list(...) wrapper is needed)
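For context, the list(...) wrapper matters because Python 3's map returns a one-shot iterator; if the lengths are consumed more than once (e.g. by repeated max(...) calls when padding the batch tensors), the second pass sees an empty sequence. A small self-contained sketch, with batch_target as a stand-in for the per-video label lists used in batch_gen.py:

```python
# Minimal illustration of why list(map(...)) is needed under Python 3.
# batch_target here is a stand-in, not the repository's actual data.
batch_target = [[0] * 120, [1] * 95, [2] * 240]

lengths_iter = map(len, batch_target)        # one-shot iterator
print(max(lengths_iter))                     # 240
# print(max(lengths_iter))                   # ValueError: max() arg is an empty sequence

lengths_list = list(map(len, batch_target))  # materialized once, reusable
print(max(lengths_list))                     # 240
print(max(lengths_list))                     # 240 -- safe to use again
```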
- Use (BACKBONE NAME)_train_evaluate.ipynb to train the backbones first.
- Use REFINER_train_evaluate.ipynb to train the proposed refiner, HASR.
- When training the refiner, specify the dataset, the split, the backbone names to use during training (pool_backbone_name), and the backbone name to use during testing (main_backbone_name):
dataset = 'gtea' # choose from gtea, 50salads, breakfast
split = 2 # gtea : 1~4, 50salads : 1~5, breakfast : 1~4
pool_backbone_name = ['mstcn'] # 'asrf', 'mstcn', 'sstda', 'mgru'
main_backbone_name = 'mstcn'
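As an additional, purely illustrative configuration (not taken from the notebooks), the same variables can be set so that the refiner is tested on a backbone excluded from the training pool, i.e. the unseen-backbone setting referred to below:

```python
dataset = '50salads'                            # choose from gtea, 50salads, breakfast
split = 3                                       # 50salads: 1~5
pool_backbone_name = ['asrf', 'mstcn', 'mgru']  # backbones pooled during training
main_backbone_name = 'sstda'                    # backbone left unseen until test time
```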
- Use show_quantitative_results.ipynb to view the records saved in "./records"
- Note that the evaluation results can differ slightly from those reported in our paper, since the video representation encoder works in a sampling-based way.
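If you want the run-to-run variation to at least be repeatable, one option is to fix the random seeds before training; this is a generic PyTorch/NumPy sketch, not something the notebooks necessarily do:

```python
# Optional, illustrative seeding to make the sampling-based encoder repeatable
# across runs; not code taken from the notebooks.
import random
import numpy as np
import torch

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)
```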
We release the pretrained backbone models used in our experiments: Link
Download "model.zip" and unzip it as "model" in this workspace, HASR_iccv2021.
After you have successfully prepared for training, the overall folder structure should be as follows (record, result):
HASR_iccv2021
├── configs
├── record
│   ├── asrf
│   ├── mstcn
│   ├── sstda
│   └── mgru
├── csv
│   ├── gtea
│   ├── 50salads
│   └── breakfast
├── dataset
│   ├── gtea
│   ├── 50salads
│   └── breakfast
├── scripts
├── src
├── model
│   ├── asrf
│   ├── mstcn
│   ├── sstda
│   └── mgru
├── backbones
│   ├── asrf
│   ├── ms-tcn
│   └── SSTDA
├── ASRF_train_evaluate.ipynb
├── MSTCN_train_evaluate.ipynb
├── SSTDA_train_evaluate.ipynb
├── mGRU_train_evaluate.ipynb
├── REFINER_train_evaluate.ipynb
├── show_quantitative_results.ipynb
├── LICENSE
├── README.md
└── requirements.txt
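Before launching the notebooks, a quick check like the following (illustrative only, with folder names taken from the tree above) can confirm that the dataset, pretrained models, and backbones are in place:

```python
# Illustrative check that the folders from the tree above exist in the workspace.
import os

expected_dirs = [
    "configs", "record", "csv", "dataset", "scripts", "src", "model", "backbones",
    "dataset/gtea", "dataset/50salads", "dataset/breakfast",
    "model/asrf", "model/mstcn", "model/sstda", "model/mgru",
    "backbones/asrf", "backbones/ms-tcn", "backbones/SSTDA",
]

missing = [d for d in expected_dirs if not os.path.isdir(d)]
print("All expected folders found." if not missing else f"Missing: {missing}")
```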
- In the supplementary material, we mentioned that the results of applying HASR to (unseen) SSTDA/ASRF on the Breakfast dataset would be uploaded to this GitHub page. Here are those results.
|  | F1@10 | F1@25 | F1@50 | Edit | Acc |
|---|---|---|---|---|---|
| SSTDA | 70.9 | 64.7 | 50.3 | 70.2 | 67.8 |
| SSTDA+HASR | 74.6 | 68.5 | 53.9 | 71.0 | 68.7 |
| Gain | 3.7 | 3.8 | 3.6 | 0.9 | 0.9 |

|  | F1@10 | F1@25 | F1@50 | Edit | Acc |
|---|---|---|---|---|---|
| ASRF | 73.8 | 68.6 | 56.4 | 72.2 | 68.5 |
| ASRF+HASR | 74.8 | 70.0 | 57.0 | 70.6 | 70.3 |
| Gain | 1.0 | 1.4 | 0.6 | -1.6 | 1.8 |
- In Table 1 of the paper, F1@{0, 25, 50} should read F1@{10, 25, 50}.
We deeply appreciate the previous researchers in this field. In particular, MS-TCN, SSTDA, and ASRF made huge contributions for researchers like us who build on their work!