Paper Title: "Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning"
- Paper Link
- Code is available in PyTorch. We don't want you to port the code from PyTorch to TensorFlow!
- Implement your code in tf.eager; use tf.keras.Model to build the components of your model, as shown in the RNN notebook.
- Deadline to Submit Project: 12 Oct 2018 (Friday), 01 PM
- We will announce submission instructions later...
Train your model on the following tasks:
- NLI
- Constituency Parsing
- NMT on En-De
- Train the model for 1 epoch on Task 1; save the checkpoint that gives the best results on the corresponding dev file
- Start from the saved checkpoint for Task 1. Train on Task 2 and save checkpoints
- Repeat for the next task (a sketch of this loop follows)
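A minimal sketch of this sequential, task-by-task loop in eager mode, assuming hypothetical helpers build_model, get_task_dataset, train_one_epoch, and evaluate_dev that you implement yourself:

import tensorflow as tf

tf.enable_eager_execution()

model = build_model()                                    # your tf.keras.Model
optimizer = tf.train.AdamOptimizer(learning_rate=0.002)
checkpoint = tf.train.Checkpoint(model=model, optimizer=optimizer)

for task in ['nli', 'parsing', 'nmt_en_de']:
    train_ds, dev_ds = get_task_dataset(task)            # tf.data pipelines
    train_one_epoch(model, optimizer, train_ds)          # 1 epoch per task
    dev_score = evaluate_dev(model, dev_ds)               # use this to pick the best checkpoint
    checkpoint.save(file_prefix='./ckpts/' + task)        # the next task resumes from here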
See the tips for speeding up the data pipeline here. Specifically, use multiple threads and prefetching!
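A minimal sketch of such a pipeline; preprocess_line and the padded shapes are illustrative assumptions for a source/target sentence pair:

import tensorflow as tf

def make_dataset(path, batch_size=32):
    ds = tf.data.TextLineDataset(path)
    # Run the per-example preprocessing on multiple threads.
    ds = ds.map(preprocess_line, num_parallel_calls=4)
    ds = ds.shuffle(10000)
    ds = ds.padded_batch(batch_size, padded_shapes=([None], [None]))
    # Prepare the next batch while the current one is being trained on.
    ds = ds.prefetch(1)
    return ds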
- RNN Cell: Use only unidirectional GRU
- Restrict the source (SRC) vocabulary to 30,000 words. Pick the most frequently occurring words across all datasets.
- Use a word embedding dimension of 256 and a GRU cell dimension of 512.
- Compare your models with and without dropout (rate 0.3).
- Use the Adam optimizer with a learning rate of 0.002.
- Use a batch size of 32.
- The rest of the parameters should be used as mentioned in 10.1, 10.2, 10.3, 10.4 (an encoder sketch with these settings follows).
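A minimal sketch of the encoder built as a tf.keras.Model with the settings above; the class and variable names are illustrative, not the reference implementation:

import tensorflow as tf

class Encoder(tf.keras.Model):
    def __init__(self, vocab_size=30000, embed_dim=256, hidden_dim=512, dropout_rate=0.3):
        super(Encoder, self).__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.gru = tf.keras.layers.GRU(hidden_dim, return_sequences=True, return_state=True)
        self.dropout = tf.keras.layers.Dropout(dropout_rate)

    def call(self, tokens, training=False):
        x = self.dropout(self.embedding(tokens), training=training)
        outputs, state = self.gru(x)    # `state` is the fixed-size sentence representation
        return outputs, state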
- Do not directly run your experiments on GPU. Verify that your model works by first trying to overfit it on a small portion of the training data on your local machine!
- Save model checkpoints every hour or so. Checkpoints will allow you to resume your work in case your training job gets killed!
- Use a fixed random seed for your experiments.
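One way to fix the seeds (1234 is an arbitrary choice):

import random
import numpy as np
import tensorflow as tf

random.seed(1234)
np.random.seed(1234)
tf.set_random_seed(1234)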
Prepare a single zip which contains all the following:
- Code for the data pipeline for your model (tf.data). Add comments to your code!
- Code for your model. Add comments to your code!
- Source vocabulary file
- Training logs for your 2 final models: with dropout, without dropout
- When running your training, don't use print statements; use tf.logging.
- It is set up as follows, at the beginning of your main program:

import time
import tensorflow as tf

logging = tf.logging
logging.set_verbosity(logging.INFO)

def log_msg(msg):
    logging.info(f'{time.ctime()}: {msg}')
- Now, you can log messages as follows:
log_msg(f'Epoch: {epoch_num} Step: {step_num} ppl improved: {ppl: 0.4f}')
- The 5 nearest neighbors for each word in your source vocabulary, computed using the embedding matrix. Note, we want words and not their integer indexes! The file should look as follows:
word1, neighbor1, neighbor2, neighbor3, neighbor4, neighbor5
word2, ....
....
word30000, ....
Hint: use sklearn's cosine similarity to find the nearest neighbors.
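A minimal sketch of how the nearest-neighbor file could be produced, assuming embeddings is the (30000, 256) embedding matrix as a NumPy array and vocab is the list of words in the same row order:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

with open('nearest_neighbors.txt', 'w') as f:
    for i, word in enumerate(vocab):
        # Cosine similarity of this word's embedding against the whole matrix.
        sims = cosine_similarity(embeddings[i:i + 1], embeddings)[0]
        top = np.argsort(-sims)[1:6]    # index 0 is the word itself, keep the next 5
        f.write(', '.join([word] + [vocab[j] for j in top]) + '\n')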
- Time taken to run 1 epoch for each task on CPU and on GPU. How did you make this faster?
- Scores for all the evaluation tasks. Compare your model on the evaluation tasks with the Universal Sentence Encoder.