This repository contains pipelines for video question answering (video QA) with deep-learning-based models. It supports image loading, feature extraction, feature caching, a training framework, TensorBoard logging, and more.
We use Python 3 (3.5.2); Python 2 is not supported. We use PyTorch (1.1.0), though tensorflow-gpu is necessary to launch TensorBoard.
Python packages: fire for the command-line API
data/
  AnotherMissOh/
    AnotherMissOh_images/
      $IMAGE_FOLDERS
    AnotherMissOh_QA/
      AnotherMissOhQA_train_set.json
      AnotherMissOhQA_val_set.json
      AnotherMissOhQA_test_set.json
      $QA_FILES
    AnotherMissOh_subtitles.json
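Before training, it can help to confirm that the data folder matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and the nesting of the paths is assumed from the tree above. The `$IMAGE_FOLDERS` and `$QA_FILES` placeholders are not checked.

```python
from pathlib import Path

# Expected entries relative to the data root, taken from the tree above.
# $IMAGE_FOLDERS and $QA_FILES are placeholders, so they are not checked.
EXPECTED = [
    "AnotherMissOh/AnotherMissOh_images",
    "AnotherMissOh/AnotherMissOh_QA/AnotherMissOhQA_train_set.json",
    "AnotherMissOh/AnotherMissOh_QA/AnotherMissOhQA_val_set.json",
    "AnotherMissOh/AnotherMissOh_QA/AnotherMissOhQA_test_set.json",
    "AnotherMissOh/AnotherMissOh_subtitles.json",
]


def missing_entries(data_root):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # "data" is assumed to be relative to the current working directory.
    missing = missing_entries("data")
    if missing:
        print("Missing from data/:")
        for rel in missing:
            print("  " + rel)
    else:
        print("data/ layout looks complete")
```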
git clone --recurse-submodules (this repo)
cd $REPO_NAME/code
(use python >= 3.5)
pip install -r requirements.txt
python -m nltk.downloader 'punkt'
Place the data folder at data.
cd code
python cli.py train
Access the prompted tensorboard port to view basic statistics.
At the end of every epoch, a checkpoint file will be saved to /data/ckpt/OPTION_NAMES
- Use the video_type config option to select 'shot' or 'scene' type data.
- If you want to run the code with lower memory requirements, use the following flags.
python cli.py train --extractor_batch_size=$BATCH --num_workers=$NUM_WORKERS
- You can use the use_inputs config option to change the set of inputs to use. The default value is ['images', 'subtitle']. It is forbidden to use the description input for the challenge.
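The rule on use_inputs can be checked before a run starts. The sketch below is illustrative only: `check_use_inputs` and its `challenge` parameter are hypothetical names that do not exist in the repository, and only the default value and the ban on 'description' come from the text above.

```python
# Hypothetical helper, not part of the repository.
DEFAULT_INPUTS = ["images", "subtitle"]     # default value stated above
FORBIDDEN_FOR_CHALLENGE = {"description"}   # not allowed for the challenge


def check_use_inputs(use_inputs=None, challenge=True):
    """Return the list of inputs to use, rejecting forbidden ones."""
    inputs = list(DEFAULT_INPUTS) if use_inputs is None else list(use_inputs)
    banned = sorted(set(inputs) & FORBIDDEN_FOR_CHALLENGE)
    if challenge and banned:
        raise ValueError(
            "inputs not allowed for the challenge: " + ", ".join(banned)
        )
    return inputs
```

Calling `check_use_inputs(['images', 'description'])` raises a `ValueError`, while `check_use_inputs()` returns the default list.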
For further configurations, take a look at startup/config.py and fire.
cd code
python cli.py evaluate --ckpt_name=$CKPT_NAME
Substitute CKPT_NAME with your preferred checkpoint file.
e.g. --ckpt_name='feature*/loss_1.34'
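The wildcard in the example suggests that the checkpoint name is matched as a glob pattern. How cli.py resolves it is not described here, so the following is only a sketch of that idea: `resolve_ckpt` is a hypothetical helper, and treating the name as a prefix (by appending `*`) is an assumption.

```python
from pathlib import Path


def resolve_ckpt(ckpt_root, ckpt_name):
    """Return the paths under ckpt_root that match a glob-style name.

    A trailing "*" is appended so that the name acts as a prefix;
    this is an assumption, not documented behaviour of cli.py.
    """
    return sorted(Path(ckpt_root).glob(ckpt_name + "*"))


if __name__ == "__main__":
    # The checkpoint root is taken from the training notes above.
    for path in resolve_ckpt("/data/ckpt", "feature*/loss_1.34"):
        print(path)
```

If the pattern matches several files, the caller still has to pick one, for example the first entry of the sorted list.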
python cli.py infer --model_name=$MODEL_NAME --ckpt_name=$CKPT_NAME
The above command will save the outcome at the prompted location.
cd code/scripts
python eval_submission.py -y $SUBMISSION_PATH -g $DATA_PATH
- images are resized to 224x224 for preprocessing (ResNet input size)
- using the last layer of ResNet-50 for feature extraction (default behaviour)
- using glove.6B.300d for pretrained word embeddings
- storing an image feature cache after feature extraction (for faster data loading)
- using nltk.word_tokenize for tokenization
- all images for a scene question are concatenated in temporal order
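The feature cache mentioned in the list above follows a common pattern: compute features once, store them on disk, and reload them on later runs. The sketch below illustrates that pattern with the standard library only. It is not the repository's implementation: `cached_features`, the `.pkl` file naming, and the `extract_fn` callback are all hypothetical.

```python
import pickle
from pathlib import Path


def cached_features(image_path, cache_dir, extract_fn):
    """Return features for image_path, computing them only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # The cache key is the file stem; images with the same name in
    # different folders would collide in this simplified version.
    cache_file = cache_dir / (Path(image_path).stem + ".pkl")
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)
    features = extract_fn(image_path)  # the expensive step
    with cache_file.open("wb") as f:
        pickle.dump(features, f)
    return features
```

In this setting, `extract_fn` would stand in for the step that resizes an image to 224x224 and runs it through ResNet-50.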
See the Troubleshooting page first. If you cannot find an answer there, submit a new issue or contact us.
To contact us, send an email to jiwanchung@vision.snu.ac.kr