Paper Title: "Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning"
- Paper Link
- Code is available in PyTorch. We don't want you to port the code from PyTorch to TensorFlow!
- Implement your code in tf.eager; use tf.keras.Model to build the components of your model, as shown in the RNN notebook.
- Deadline to Submit Project: 12 Oct 2018 (Friday), 01 PM
- We will announce submission instructions later...
Train your model on the following tasks:
- NLI
- Constituency Parsing
- NMT on En-De
- Train the model for 1 epoch on Task 1; save the checkpoint that gives the best results on the corresponding dev file
- Start from the saved checkpoint for Task 1. Train on Task 2 and save checkpoints
- Repeat for the next task (a sketch of this loop follows)
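A minimal sketch of this sequential, task-by-task loop in eager mode, assuming hypothetical helpers build_model, get_task_dataset, train_one_epoch, and evaluate_dev that you implement yourself:

import tensorflow as tf

tf.enable_eager_execution()

model = build_model()                                    # your tf.keras.Model
optimizer = tf.train.AdamOptimizer(learning_rate=0.002)
checkpoint = tf.train.Checkpoint(model=model, optimizer=optimizer)

for task in ['nli', 'parsing', 'nmt_en_de']:
    train_ds, dev_ds = get_task_dataset(task)            # tf.data pipelines
    train_one_epoch(model, optimizer, train_ds)          # 1 epoch per task
    dev_score = evaluate_dev(model, dev_ds)               # use this to pick the best checkpoint
    checkpoint.save(file_prefix='./ckpts/' + task)        # the next task resumes from here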
See the tips for speeding up the data pipeline here. Specifically, use multiple threads and prefetching!
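A minimal sketch of such a pipeline; preprocess_line and the padded shapes are illustrative assumptions for a source/target sentence pair:

import tensorflow as tf

def make_dataset(path, batch_size=32):
    ds = tf.data.TextLineDataset(path)
    # Run the per-example preprocessing on multiple threads.
    ds = ds.map(preprocess_line, num_parallel_calls=4)
    ds = ds.shuffle(10000)
    ds = ds.padded_batch(batch_size, padded_shapes=([None], [None]))
    # Prepare the next batch while the current one is being trained on.
    ds = ds.prefetch(1)
    return ds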
- RNN Cell: Use only unidirectional GRU
- Restrict the source (SRC) vocabulary to 30,000 words. Pick the most frequently occurring words across all datasets.
- Use a word embedding dimension of 256 and a GRU cell dimension of 512.
- Compare your models with and without dropout (rate 0.3).
- Use the Adam optimizer with a learning rate of 0.002.
- Use a batch size of 32.
- The rest of the parameters should be used as mentioned in 10.1, 10.2, 10.3, 10.4 (an encoder sketch with these settings follows).
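A minimal sketch of the encoder built as a tf.keras.Model with the settings above; the class and variable names are illustrative, not the reference implementation:

import tensorflow as tf

class Encoder(tf.keras.Model):
    def __init__(self, vocab_size=30000, embed_dim=256, hidden_dim=512, dropout_rate=0.3):
        super(Encoder, self).__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.gru = tf.keras.layers.GRU(hidden_dim, return_sequences=True, return_state=True)
        self.dropout = tf.keras.layers.Dropout(dropout_rate)

    def call(self, tokens, training=False):
        x = self.dropout(self.embedding(tokens), training=training)
        outputs, state = self.gru(x)    # `state` is the fixed-size sentence representation
        return outputs, state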
- Do not directly run your experiments on GPU. Verify that your model works by first trying to overfit it on a small portion of the training data on your local machine!
- Save model checkpoints every hour or so. Checkpoints will allow you to resume your work in case your training job gets killed!
- Use a fixed random seed for your experiments.
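One way to fix the seeds (1234 is an arbitrary choice):

import random
import numpy as np
import tensorflow as tf

random.seed(1234)
np.random.seed(1234)
tf.set_random_seed(1234)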
Prepare a single zip which contains all the following:
- Code for the data pipeline for your model (tf.data). Add comments to your code!
- Code for your model. Add comments to your code!
- Source vocabulary file
- Training logs for your 2 final models: with dropout, without dropout
- When running your training, don't use print statements; use tf.logging.
- It is set up as follows, at the beginning of your main program:

import time
import tensorflow as tf

logging = tf.logging
logging.set_verbosity(logging.INFO)

def log_msg(msg):
    logging.info(f'{time.ctime()}: {msg}')
- Now, you can log messages as follows:
log_msg(f'Epoch: {epoch_num} Step: {step_num} ppl improved: {ppl: 0.4f}')
- The 5 nearest neighbors for each word in your source vocabulary, computed using the embedding matrix. Note, we want words and not their integer indexes! The file should look as follows:
word1, neighbor1, neighbor2, neighbor3, neighbor4, neighbor5
word2, ....
....
word30000, ....
Hint: use sklearn's cosine similarity to find the nearest neighbors.
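A minimal sketch of how the nearest-neighbor file could be produced, assuming embeddings is the (30000, 256) embedding matrix as a NumPy array and vocab is the list of words in the same row order:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

with open('nearest_neighbors.txt', 'w') as f:
    for i, word in enumerate(vocab):
        # Cosine similarity of this word's embedding against the whole matrix.
        sims = cosine_similarity(embeddings[i:i + 1], embeddings)[0]
        top = np.argsort(-sims)[1:6]    # index 0 is the word itself, keep the next 5
        f.write(', '.join([word] + [vocab[j] for j in top]) + '\n')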
- Time taken to run 1 epoch for each task on CPU and on GPU. How did you make this faster?
- Scores for all the evaluation tasks. Compare your model on the evaluation tasks with the Universal Sentence Encoder.