This is the implementation of the submission (TarHeels) for the Ego4D: Object State Change Classification Challenge at the 1st Ego4D Workshop, CVPR 2022. We use a transformer-based video recognition model and leverage the Divided Space-Time Attention mechanism (sketched below) to classify object state changes in egocentric videos. Our submission achieved the second-best performance in the challenge.
You can download the technical report of our submission from here.
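As a reference for how divided space-time attention factorizes a transformer block, here is a minimal PyTorch sketch assuming patch tokens of shape `(B, T, N, D)`; class-token handling, dropout, and pretrained weights are omitted (the complete implementation lives in the TimeSformer codebase):

```python
import torch
import torch.nn as nn

class DividedSpaceTimeBlock(nn.Module):
    """One block of divided space-time attention: temporal attention
    across frames, then spatial attention within each frame, then an
    MLP, each with a residual connection."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.norm_t = nn.LayerNorm(dim)
        self.attn_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.attn_s = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_m = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        # x: (B, T, N, D) -- B videos, T frames, N patches, D channels.
        B, T, N, D = x.shape
        # Temporal attention: each spatial location attends across frames.
        xt = x.permute(0, 2, 1, 3).reshape(B * N, T, D)
        h = self.norm_t(xt)
        xt = xt + self.attn_t(h, h, h, need_weights=False)[0]
        x = xt.reshape(B, N, T, D).permute(0, 2, 1, 3)
        # Spatial attention: patches within a frame attend to each other.
        xs = x.reshape(B * T, N, D)
        h = self.norm_s(xs)
        xs = xs + self.attn_s(h, h, h, need_weights=False)[0]
        x = xs.reshape(B, T, N, D)
        # Position-wise MLP.
        return x + self.mlp(self.norm_m(x))

if __name__ == "__main__":
    # 2 clips, 8 frames, 14x14 = 196 patches, 768-dim tokens (ViT-B sizes).
    block = DividedSpaceTimeBlock()
    print(block(torch.randn(2, 8, 196, 768)).shape)  # (2, 8, 196, 768)
```

Factorizing attention this way scales as O(T²·N + N²·T) per block instead of O((T·N)²) for joint space-time attention, which is what makes multi-frame, 224×224 inputs tractable.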
- Follow the instructions from TimeSformer for setup and installation.
- Run `create_fho_clips.py` to process and create the video clips (see the first sketch after this list).
- Run `create_fho_dataset.py` to create the dataset.
- Use the following command to train the model:
```bash
python tools/run_net.py \
  --cfg configs/Ego4dFho/TimeSformer_divST_8x32_224.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  NUM_GPUS 8 \
  TRAIN.BATCH_SIZE 8
```
- Finally, run `generate_submission.py` to generate the submission file for the challenge (see the second sketch after this list).
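Below is a hypothetical sketch of what the clip-extraction step does, assuming the FHO annotation JSON exposes `clips` entries with `video_uid`, `clip_start_sec`, `clip_end_sec`, and `unique_id` fields; these names are assumptions, not the actual interface of `create_fho_clips.py`:

```python
# Hypothetical sketch of clip extraction; the field names below are
# assumptions about the FHO annotation schema, not the interface of
# create_fho_clips.py. Requires ffmpeg on PATH.
import json
import subprocess
from pathlib import Path

def extract_clips(annotations_json, videos_dir, out_dir):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(annotations_json) as f:
        clips = json.load(f)["clips"]
    for clip in clips:
        src = Path(videos_dir) / f"{clip['video_uid']}.mp4"
        dst = out_dir / f"{clip['unique_id']}.mp4"
        start = clip["clip_start_sec"]
        duration = clip["clip_end_sec"] - start
        # -ss before -i seeks quickly; re-encoding keeps cuts accurate.
        subprocess.run(
            ["ffmpeg", "-y", "-ss", str(start), "-i", str(src),
             "-t", str(duration), "-c:v", "libx264", "-an", str(dst)],
            check=True,
        )
```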
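And a hedged sketch of what `generate_submission.py` plausibly does: run the trained classifier over the test clips and write one prediction per clip. The exact submission format is defined by the challenge's evaluation server, so the column names here (`unique_id`, `state_change`) are illustrative only:

```python
# Illustrative sketch only: the real submission format is set by the
# challenge; the column names here are assumptions.
import csv
import torch

@torch.no_grad()
def write_submission(model, loader, out_csv, device="cuda"):
    """Write a CSV of (clip id, predicted state-change label)."""
    model.eval().to(device)
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["unique_id", "state_change"])
        for clips, uids in loader:  # loader yields (video batch, clip ids)
            logits = model(clips.to(device))           # (B, 2) class logits
            preds = logits.argmax(dim=1).cpu().tolist()
            writer.writerows(zip(uids, preds))
```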