
MCAN Model with BERT for OK-VQA


This work uses BERT to encode the question features in place of the LSTM encoder in the MCAN model, and evaluates the method on the OK-VQA dataset. In addition, we use the Visual Genome dataset to expand the OK-VQA training samples. Similar to existing strategies, we preprocess the Visual Genome samples with two rules (a small filtering sketch is shown after the list):

 1. Select the QA pairs whose corresponding images appear in the OK-VQA train split.

 2. Select the QA pairs whose answer appears in the processed answer list (answers that occur more than 2 times among all OK-VQA answers).
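
A minimal sketch of this filtering, assuming the Visual Genome QA pairs have already been converted to a flat list of (image_id, question, answer) records; the file names and the load_json helper below are illustrative, not the repo's actual code:

    # Illustrative sketch of the two filtering rules above (file names are hypothetical).
    import json

    def load_json(path):
        with open(path) as f:
            return json.load(f)

    vg_qa = load_json('vg_qa_pairs.json')                        # [{"image_id", "question", "answer"}, ...]
    okvqa_train = load_json('mscoco_train2014_annotations.json')
    answer_list = load_json('okvqa_answer_list.json')            # answers occurring > 2 times in OK-VQA

    train_image_ids = {ann['image_id'] for ann in okvqa_train['annotations']}
    answer_vocab = set(answer_list)

    expanded = [qa for qa in vg_qa
                if qa['image_id'] in train_image_ids             # rule 1: image in OK-VQA train split
                and qa['answer'] in answer_vocab]                # rule 2: answer in processed answer list
    print(len(expanded), 'Visual Genome QA pairs kept for training')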

Prerequisites


 1. Our environment uses CUDA 10.1 and cuDNN.

 2. First, create a new conda environment and activate it:

    $ conda create -n vqa python==3.6.5
    $ source activate vqa

 3. Install PyTorch and torchvision (see the PyTorch installation guide):

    $ pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

 4. Install spaCy and initialize the GloVe vectors as follows:

    $ wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
    $ pip install en_vectors_web_lg-2.1.0.tar.gz

 5. Install Transformers

    $ pip install transformers==2.11.0 
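
To sanity-check the setup, and to illustrate how BERT stands in for the LSTM question encoder, here is a small sketch using transformers 2.11.0. The bert-base-uncased checkpoint and the 14-token padding length are assumptions; the repo's config may differ (e.g. a larger model when --MODEL=large is used):

    # Hedged sketch: encode a question with BERT in place of an LSTM encoder.
    # The checkpoint name and max length are assumptions, not the repo's config.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    bert = BertModel.from_pretrained('bert-base-uncased')
    bert.eval()

    question = 'What sport can you use this for?'
    inputs = tokenizer.encode_plus(question, max_length=14,
                                   pad_to_max_length=True, return_tensors='pt')
    with torch.no_grad():
        token_feats, pooled = bert(inputs['input_ids'],
                                   attention_mask=inputs['attention_mask'])
    print(token_feats.shape)   # (1, 14, 768): per-token question features for MCAN's attention layers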

Dataset Download


a. Image Download

The image features are extracted with the bottom-up-attention strategy, with each image represented as a dynamic number (from 10 to 100) of 2048-D features. You can download them from the Bottom-up-Attention repository. The download contains three files: train2014.tar.gz, val2014.tar.gz, and test2015.tar.gz. The images of the OK-VQA dataset are a subset of the VQA 2.0 dataset.
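
A hedged sketch of how one such feature file can be inspected and padded to a fixed number of boxes. The .npz layout with an 'x' array follows the upstream mcan-vqa preprocessing and is an assumption here; the file name is illustrative:

    # Hedged sketch: inspect one extracted feature file and zero-pad it to a
    # fixed number of boxes. Assumes per-image .npz files containing an 'x'
    # array of shape (num_boxes, 2048), as in mcan-vqa.
    import numpy as np

    feat = np.load('train2014/COCO_train2014_000000000009.npz')['x']
    print(feat.shape)                       # (num_boxes, 2048), num_boxes in [10, 100]

    max_boxes = 100
    n = min(feat.shape[0], max_boxes)
    padded = np.zeros((max_boxes, 2048), dtype=np.float32)
    padded[:n] = feat[:n]                   # zero-padded feature matrix fed to the model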

b. Question Download

The question-answer pairs can be downloaded from the OK-VQA website. The downloads include the training and testing questions and annotations.

We expand the question samples of the training data, so the new training data is provided under /datasets/okvqa/, which contains: OpenEnded_mscoco_train2014_questions_add_VG.json and mscoco_train2014_annotations_add_VG.json.
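
A quick way to check the expanded files, assuming they follow the standard VQA JSON layout with top-level 'questions' and 'annotations' keys:

    # Sketch: load the expanded training data and count the samples.
    import json

    with open('datasets/okvqa/OpenEnded_mscoco_train2014_questions_add_VG.json') as f:
        questions = json.load(f)['questions']
    with open('datasets/okvqa/mscoco_train2014_annotations_add_VG.json') as f:
        annotations = json.load(f)['annotations']

    assert len(questions) == len(annotations)
    print(len(questions), 'training QA pairs (OK-VQA + Visual Genome)')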

Training


The following script will start training with the default hyperparameters:

$ python run.py --RUN=train --GPU=0 --MODEL=large

All checkpoint files will be saved to:

ckpts/ckpt_<VERSION>/epoch<EPOCH_NUMBER>.pkl

and the training log file will be placed at:

results/log/log_run_<VERSION>.txt

You can also set EVAL_EVERY_EPOCH=True in cfgs/base_cfgs.py to evaluate on the validation split after every training epoch.
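
For reference, this flag is an attribute of the config class; a minimal sketch of where it lives, assuming the layout follows the upstream mcan-vqa Cfgs class:

    # cfgs/base_cfgs.py (sketch; attribute name taken from the note above,
    # class layout assumed to follow the upstream mcan-vqa Cfgs class)
    class Cfgs:
        def __init__(self):
            # ... other hyperparameters ...

            # Run evaluation on the val split at the end of every training epoch
            self.EVAL_EVERY_EPOCH = True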

Val and Test


The following script will start validation with the default hyperparameters:

$ python run.py --RUN=val --GPU=0 --CKPT_PATH=YOUR_CKPT_PATH

The following script will start testing with the default hyperparameters:

$ python run.py --RUN=test --GPU=0 --CKPT_PATH=YOUR_CKPT_PATH

You can find the test result in:

results/result_test/result_run_<VERSION+EPOCH>.json
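
The result file is assumed to follow the standard VQA submission format (a list of {"answer", "question_id"} records, as in the upstream mcan-vqa); a quick inspection sketch, with the placeholder file name to be filled in:

    # Hedged sketch: inspect a test result file (fill in the placeholder name).
    import json

    with open('results/result_test/result_run_<VERSION+EPOCH>.json') as f:
        results = json.load(f)

    print(len(results), 'answered test questions')
    print(results[0])        # expected form: {"answer": "...", "question_id": ...}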

Result


In our experiments, the accuracy on the OK-VQA test set is as follows (the per-category columns correspond to the OK-VQA knowledge categories, e.g. VT = Vehicles and Transportation):

Overall  VT       BCP      OMC      SR       CF       GHLC     PEL      PA       ST       WC       Other
34.80    30.87    28.72    32.01    45.75    36.52    33.05    31.68    35.05    33.33    47.44    31.08

We are very grateful to these great works: mcan-vqa, huggingface, and OK-VQA.
