
Commit 9777f43: Update README.md
1 parent b222963

README.md (+4, -1)
@@ -57,13 +57,16 @@ For the training and validation sets, we used the official splits provided by th
### Data Preprocessing

According to the paper we are implementing, we will use a policy to choose the most suitable answer out of the 10 answers and train the model on that answer. The policy for building the answer vocabulary is as follows (a sketch of the selection logic is given after the list):

1- Choose the most frequent answer among the 10 answers

2- If there is a tie, we will choose the answer that is most frequent in the entire set of all answers

3- If there is still a tie, we will choose the answer with the minimum total Levenshtein distance to the other tied answers
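
A minimal sketch of this selection policy, assuming the `python-Levenshtein` package and illustrative names (`answers` holds the 10 answers of one question, `global_counts` is a `Counter` over every answer in the dataset); this is not the repository's actual implementation:

```python
from collections import Counter

import Levenshtein  # assumed dependency: pip install python-Levenshtein


def select_answer(answers, global_counts):
    """Pick one training answer from the 10 annotated answers."""
    local_counts = Counter(answers)
    top = max(local_counts.values())
    tied = [a for a, c in local_counts.items() if c == top]           # rule 1
    if len(tied) > 1:
        top_global = max(global_counts[a] for a in tied)
        tied = [a for a in tied if global_counts[a] == top_global]    # rule 2
    if len(tied) > 1:
        # rule 3: smallest total Levenshtein distance to the other tied answers
        tied.sort(key=lambda a: sum(Levenshtein.distance(a, b) for b in tied))
    return tied[0]
```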

We also need to one-hot encode the answers, so each answer becomes a vector of size 5410, where 5410 is the size of the vocabulary. We will use the [One Hot Encoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html) from the [scikit-learn](https://scikit-learn.org/stable/) library to one-hot encode the answers and the answer type.
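
A minimal sketch of the encoding step with scikit-learn's `OneHotEncoder`; the answer list below is a hypothetical stand-in for the answers chosen by the selection policy:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical selected answers; the real list comes from the selection policy above.
selected_answers = np.array(["yes", "no", "2", "yes"]).reshape(-1, 1)

encoder = OneHotEncoder(handle_unknown="ignore")
answer_vectors = encoder.fit_transform(selected_answers).toarray()
# Fitted on the full training vocabulary, each row has length 5410.
# The answer type can be encoded the same way with a second OneHotEncoder.
```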

Instead of using lazy processing and extracting question and answer embeddings on the fly, we will extract them beforehand and save them in a pickle file. We will use the [CLIP](https://openai.com/blog/clip/) model to extract the image and question embeddings.
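
A rough sketch of this precomputation step, assuming the openai/CLIP package (`pip install git+https://github.com/openai/CLIP.git`); the output file name and the `samples` mapping are illustrative, not taken from the repository:

```python
import pickle

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)


def embed(image_path, question):
    """Return the CLIP image and text embeddings for one sample."""
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    tokens = clip.tokenize([question]).to(device)
    with torch.no_grad():
        image_emb = model.encode_image(image)
        text_emb = model.encode_text(tokens)
    return image_emb.cpu(), text_emb.cpu()


# `samples` is a hypothetical mapping from question id to (image path, question text).
samples = {"q1": ("images/train/000001.jpg", "What color is the umbrella?")}
embeddings = {qid: embed(path, q) for qid, (path, q) in samples.items()}

with open("clip_embeddings.pkl", "wb") as f:
    pickle.dump(embeddings, f)
```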

## Model Architecture

We will follow the architecture mentioned in the [Less Is More](https://arxiv.org/abs/2206.05281) paper. The architecture goes as follows:
