Thank you for this great repo!
I was trying to use it for language modeling, but among the checkpoints you provide I couldn't find any model that performs well in terms of perplexity. I measure perplexity on your SNLI test set with code/examples/big_ae/run_lm_vae_training.py by setting the --do_eval option (and omitting the --do_train option), roughly as sketched below. This yielded a very high KL (~2000) for all the checkpoints you provide.
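For reference, my evaluation invocation looks roughly like this. The checkpoint/data paths and most argument names besides --do_eval / --do_train are placeholders from my setup, so the exact flags may differ from what the script actually expects:

```bash
# Rough sketch of my evaluation run. Paths and the encoder/decoder arguments
# are placeholders for my setup; only running with --do_eval and without
# --do_train is the part in question, and the exact flag names are my guess.
python code/examples/big_ae/run_lm_vae_training.py \
    --do_eval \
    --dataset Snli \
    --eval_data_file <path-to-snli-test-set> \
    --checkpoint_dir <path-to-provided-checkpoint> \
    --output_dir <output-dir> \
    --encoder_model_type bert \
    --encoder_model_name_or_path bert-base-cased \
    --decoder_model_type gpt2 \
    --decoder_model_name_or_path gpt2 \
    --latent_size 32
```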
I tried fine-tuning a Wikipedia checkpoint with your script on SNLI (rough command sketch below), but I only get the following results:
- with high beta (1.0) and low r0 (0.1): perplexity on the order of 30, with KL around 10 and mutual info ~0.2
- with low beta (0.5) and high r0 (0.5): perplexity on the order of 1000, with KL around 75 and mutual info ~1.5
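The fine-tuning command for those runs is roughly the following. I am assuming beta and r0 correspond to the --beta and --ratio_zero arguments; those flag names and all paths are placeholders from my setup, so please correct me if the paper's results were produced differently:

```bash
# Rough sketch of my SNLI fine-tuning run. Assumes beta and r0 map to --beta
# and --ratio_zero; these flag names and all paths are placeholders from my
# setup and may not match the script's actual arguments.
python code/examples/big_ae/run_lm_vae_training.py \
    --do_train --do_eval \
    --dataset Snli \
    --train_data_file <path-to-snli-train-set> \
    --eval_data_file <path-to-snli-test-set> \
    --checkpoint_dir <path-to-wikipedia-pretrained-checkpoint> \
    --output_dir <output-dir> \
    --beta 1.0 \
    --ratio_zero 0.1 \
    --latent_size 32
```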
I can't seem to get it to have low perplexity with high mutual information. Could you provide a language modeling checkpoint, or just specify the hyper-parameters and Wikipedia pretrained model used to produce the results in the paper?
Thank you very much for your help!