Thank you for this great repo!
I was trying to use it for language modeling, but among the checkpoints you provide I couldn't find any model that performs well in terms of perplexity. I measure perplexity on your SNLI test set with code/examples/big_ae/run_lm_vae_training.py by setting the --do_eval option (and omitting the --do_train option), roughly as sketched below. This yielded a very high KL (~2000) for all the checkpoints you provide.
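For reference, my evaluation invocation looks roughly like this. The checkpoint/data paths and most argument names besides --do_eval / --do_train are placeholders from my setup, so the exact flags may differ from what the script actually expects:

```bash
# Rough sketch of my evaluation run. Paths and the encoder/decoder arguments
# are placeholders for my setup; only running with --do_eval and without
# --do_train is the part in question, and the exact flag names are my guess.
python code/examples/big_ae/run_lm_vae_training.py \
    --do_eval \
    --dataset Snli \
    --eval_data_file <path-to-snli-test-set> \
    --checkpoint_dir <path-to-provided-checkpoint> \
    --output_dir <output-dir> \
    --encoder_model_type bert \
    --encoder_model_name_or_path bert-base-cased \
    --decoder_model_type gpt2 \
    --decoder_model_name_or_path gpt2 \
    --latent_size 32
```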
I tried fine-tuning a Wikipedia checkpoint with your script on SNLI (rough command sketch below), but I only get the following results:
- with high beta (1.0) and low r0 (0.1): perplexity on the order of 30, with KL around 10 and mutual info ~0.2
- with low beta (0.5) and high r0 (0.5): perplexity on the order of 1000, with KL around 75 and mutual info ~1.5
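The fine-tuning command for those runs is roughly the following. I am assuming beta and r0 correspond to the --beta and --ratio_zero arguments; those flag names and all paths are placeholders from my setup, so please correct me if the paper's results were produced differently:

```bash
# Rough sketch of my SNLI fine-tuning run. Assumes beta and r0 map to --beta
# and --ratio_zero; these flag names and all paths are placeholders from my
# setup and may not match the script's actual arguments.
python code/examples/big_ae/run_lm_vae_training.py \
    --do_train --do_eval \
    --dataset Snli \
    --train_data_file <path-to-snli-train-set> \
    --eval_data_file <path-to-snli-test-set> \
    --checkpoint_dir <path-to-wikipedia-pretrained-checkpoint> \
    --output_dir <output-dir> \
    --beta 1.0 \
    --ratio_zero 0.1 \
    --latent_size 32
```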
I can't seem to get it to have low perplexity with high mutual information. Could you provide a language modeling checkpoint, or just specify the hyper-parameters and Wikipedia pretrained model used to produce the results in the paper?
Thank you very much for your help!