This repo holds the code for the paper: Continuous Sign Language Recognition with Correlation Network (CVPR 2023). [paper]
This repo is based on VAC (ICCV 2021). Many thanks for their great work!
(Update on 2024/04/17) We release CorrNet+, a unified model with superior performance on both continuous sign language recognition and sign language translation tasks using only RGB inputs.
- This project is implemented in PyTorch (preferably >=1.13 for compatibility with ctcdecode; otherwise errors may occur). Please install PyTorch first.
- ctcdecode==0.4 [parlance/ctcdecode], for beam search decoding. (ctcdecode is only supported on the Linux platform.)
- [Optional] sclite [kaldi-asr/kaldi]: install the Kaldi toolkit to get sclite for evaluation. After installation, create a soft link to sclite:
mkdir ./software
ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite
You may use the Python evaluation tool for convenience (by setting 'evaluate_tool' to 'python' in line 16 of ./configs/baseline.yaml), but sclite provides more detailed statistics.
- Install the other required modules by running:
pip install -r requirements.txt
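Before building ctcdecode, it can help to confirm that the environment matches the notes above. The snippet below is a minimal sketch and not part of this repo: `parse_version` and `check_environment` are hypothetical helpers, and the 1.13 threshold and the Linux requirement are simply taken from the prerequisites listed here.

```python
import platform
import re
from importlib import metadata


def parse_version(version):
    """Return (major, minor) from a version string such as '1.13.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment(min_torch=(1, 13)):
    """Collect human-readable warnings; an empty list means no issue was found."""
    warnings = []
    if platform.system() != "Linux":
        warnings.append("ctcdecode is only supported on Linux")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        warnings.append("PyTorch is not installed")
    else:
        if parse_version(torch_version) < min_torch:
            warnings.append(
                f"PyTorch {torch_version} is older than {min_torch[0]}.{min_torch[1]}"
            )
    return warnings


if __name__ == "__main__":
    for message in check_environment():
        print("warning:", message)
```

The check only reads installed package metadata, so it runs even when PyTorch itself is missing or fails to import.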
The implementation of CorrNet is given in ./modules/resnet.py (line 18).
It is then plugged into the ResNet BasicBlock at line 58 of ./modules/resnet.py.
We later found that the Identification Module with only spatial decomposition performs on par with the spatial-temporal decomposition reported in the paper and is slightly faster, so we implement it that way.
You can choose any one of the following datasets to verify the effectiveness of CorrNet.
- Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.
- After downloading the dataset, extract it. We suggest creating a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014
- The original image sequence is 210x260; we resize it to 256x256 for augmentation. Run the following commands to generate the gloss dict and resize the image sequence:
cd ./preprocess
python dataset_preprocess.py --process-image --multiprocessing
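To make the gloss dict step concrete: a gloss dictionary maps every gloss that appears in the training annotations to an integer label used for CTC training. The sketch below is an illustration only. `build_gloss_dict`, the sample annotations, the `[index, count]` layout, and the choice to start indices at 1 (leaving 0 free for the CTC blank) are assumptions, not the exact behavior of the preprocessing script.

```python
from collections import Counter


def build_gloss_dict(annotations):
    """Map each gloss to [index, count]; indices start at 1 so 0 can serve as the CTC blank."""
    counts = Counter()
    for sentence in annotations:
        counts.update(sentence.split())
    # Sort alphabetically so the mapping is reproducible across runs.
    return {
        gloss: [idx, counts[gloss]]
        for idx, gloss in enumerate(sorted(counts), start=1)
    }


# Made-up gloss annotations, for illustration only.
sample = ["MORGEN REGEN", "MORGEN SONNE WIND"]
gloss_dict = build_gloss_dict(sample)
print(gloss_dict)
```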
- Download the RWTH-PHOENIX-Weather 2014-T Dataset [download link]
- After downloading the dataset, extract it. We suggest creating a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET/PHOENIX-2014-T-release-v3/PHOENIX-2014-T ./dataset/phoenix2014-T
- The original image sequence is 210x260; we resize it to 256x256 for augmentation. Run the following commands to generate the gloss dict and resize the image sequence:
cd ./preprocess
python dataset_preprocess-T.py --process-image --multiprocessing
If you get an error like "IndexError: list index out of range" on the PHOENIX2014-T dataset, you may refer to this issue to resolve the problem.
- Request the CSL Dataset from this website [download link]
- After downloading the dataset, extract it. We suggest creating a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET ./dataset/CSL
- The original image sequence is 1280x720; we resize it to 256x256 for augmentation. Run the following commands to generate the gloss dict and resize the image sequence:
cd ./preprocess
python dataset_preprocess-CSL.py --process-image --multiprocessing
- Request the CSL-Daily Dataset from this website [download link]
- After downloading the dataset, extract it. We suggest creating a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET ./dataset/CSL-Daily
- The original image sequence is 1280x720; we resize it to 256x256 for augmentation. Run the following commands to generate the gloss dict and resize the image sequence:
cd ./preprocess
python dataset_preprocess-CSL-Daily.py --process-image --multiprocessing
Backbone | Dev WER | Test WER | Pretrained model |
---|---|---|---|
ResNet18 | 18.8% | 19.4% | [Baidu] (passwd: skd3) [Google Drive] |
We accidentally deleted the original checkpoint and retrained the model, which reaches similar accuracy (Dev: 18.9%, Test: 19.7%).
Backbone | Dev WER | Test WER | Pretrained model |
---|---|---|---|
ResNet18 | 18.9% | 20.5% | [Baidu] (passwd: deuq) [Google Drive] |
To evaluate on CSL-Daily with this checkpoint, remove the CorrNet block after layer2: comment out lines 102 and 145 in resnet.py, change num from 3 to 2 in line 105, and change self.alpha[1] and self.alpha[2] to self.alpha[0] and self.alpha[1] in lines 147 and 149, respectively.
Backbone | Dev WER | Test WER | Pretrained model |
---|---|---|---|
ResNet18 | 30.6% | 30.1% | [Baidu] (passwd: u2iv) [Google Drive] |
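The tables above report word error rate (WER), i.e. the number of substitutions, deletions, and insertions needed to turn the predicted gloss sequence into the reference, divided by the reference length. The function below is a minimal sketch of that textbook definition; `word_error_rate` is a hypothetical helper, and it does not reproduce sclite or the repo's Python evaluation tool, which may apply dataset-specific cleanup and different alignment costs.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

For example, a four-gloss reference recognized with one substitution and one deletion gives a WER of 2/4 = 50%.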
To evaluate the pretrained model, first choose the dataset (phoenix2014, phoenix2014-T, CSL, or CSL-Daily) in line 3 of ./config/baseline.yaml, then run the command below:
python main.py --config ./config/baseline.yaml --device your_device --load-weights path_to_weight.pt --phase test
Configuration values are resolved with the following priority: command line > config file > argparse default values.
To train the SLR model, run the command below:
python main.py --config ./config/baseline.yaml --device your_device
Note that you can choose the target dataset (phoenix2014, phoenix2014-T, CSL, or CSL-Daily) in line 3 of ./config/baseline.yaml.
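The priority order "command line > config file > argparse defaults" is commonly obtained by loading the config file into a dict and passing it to `set_defaults` before parsing. The sketch below illustrates that pattern under stated assumptions: `parse_with_priority` and the `--dataset` option are hypothetical, and a plain dict stands in for the parsed YAML file, so this is not necessarily how main.py is written.

```python
import argparse


def parse_with_priority(argv, file_config):
    """Resolve options so that command line > config file > argparse defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--device", default="0")
    parser.add_argument("--phase", default="train")
    parser.add_argument("--dataset", default="phoenix2014")  # hypothetical option
    # Config-file values replace the argparse defaults ...
    parser.set_defaults(**file_config)
    # ... and anything given explicitly on the command line replaces both.
    return parser.parse_args(argv)


# Stand-in for the dict that would be loaded from the YAML config file.
file_config = {"dataset": "CSL-Daily", "phase": "train"}
args = parse_with_priority(["--phase", "test"], file_config)
print(args.device, args.phase, args.dataset)  # prints "0 test CSL-Daily"
```

Here `device` falls back to the argparse default, `dataset` comes from the config dict, and `phase` is taken from the command line.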
For the CSL-Daily dataset, you may halve the lr from 0.0001 to 0.00005, change the lr decay rate (gamma in 'optimizer.py') from 0.2 to 0.5, and disable the temporal resampling strategy (comment out line 121 in dataloader_video.py).
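To see what the two settings imply, the sketch below computes a step-decay learning rate, where the rate is multiplied by gamma each time a milestone epoch is reached. `stepped_lr` is a hypothetical helper and the milestone epochs are placeholders, not values taken from this repo; only the lr and gamma pairs come from the text above.

```python
def stepped_lr(base_lr, gamma, milestones, epoch):
    """Learning rate at `epoch` when it is multiplied by `gamma` at each milestone."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


milestones = [20, 35]  # hypothetical decay epochs, not taken from the repo
for name, lr, gamma in [("default", 1e-4, 0.2), ("CSL-Daily", 5e-5, 0.5)]:
    schedule = [stepped_lr(lr, gamma, milestones, e) for e in (0, 20, 35)]
    print(name, schedule)
```

Under these assumptions the CSL-Daily variant starts at half the rate but decays more gently: after two decays it sits near 1.25e-5, versus roughly 4e-6 for the default setting.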
For Grad-CAM visualization, replace the resnet.py under "./modules" with the resnet.py under "./weight_map_generation", and then run python generate_cam.py with your own hyperparameters.
If you find this repo useful in your research, please consider citing:
@inproceedings{hu2023continuous,
title={Continuous Sign Language Recognition with Correlation Network},
author={Hu, Lianyu and Gao, Liqing and Liu, Zekang and Feng, Wei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2023},
}