This is the pytorch implementation of the paper "Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness"
Our code is mainly based on this public code. Very thanks for its authors.
- Python >= 3.6
- Pytorch >= 1.5.0
Datastes used in this paper can be downloaded in this link, with the specific license if that is not based on MIT License.
Example script to train DU-VAE on text data:
python text.py --dataset yelp \
--device cuda:0 \
--gamma 0.5 \
--p_drop 0.2 \
--delta_rate 1 \
--kl_start 0 \
--warm_up 10
Example script to train DU-VAE on image data:
python3.6 image.py --dataset omniglot \
--device cuda:3 \
--kl_start 0 \
--warm_up 10 \
--gamma 0.5 \
--p_drop 0.1 \
--delta_rate 1 \
--dataset omniglot
Example script to train DU-IAF, a variant of DU-VAE, on text data:
python3.6 text_IAF.py --device cuda:2 \
--dataset yelp \
--gamma 0.6 \
--p_drop 0.3 \
--delta_rate 1 \
--kl_start 0 \
--warm_up 10 \
--flow_depth 2 \
--flow_width 60
Example script to train DU-IAF on image data:
python3.6 image_IAF.py --dataset omniglot\
--device cuda:3 \
--kl_start 0 \
--warm_up 10 \
--gamma 0.5 \
--p_drop 0.15\
--delta_rate 1 \
--flow_depth 2\
--flow_width 60
Here,
-
--dataset
specifies the dataset name, currently it supportssynthetic
,yahoo
,yelp
fortext.py
andomniglot
forimage.py
. -
--kl_start
represents starting KL weight (set to 1.0 to disable KL annealing) -
--warm_up
represents number of annealing epochs (KL weight increases fromkl_start
to 1.0 linearly in the firstwarm_up
epochs) -
--gamma
represents the parameter$\gamma$ in our Batch-Normalization approach, which should be more than 0 to use our model. -
--p_drop
represents the parameter$1-p$ in our Dropout approach, which denotes the percent of data to be ignored and should be ranged in (0,1). -
--delta_rate
represents the hyper-parameter$\alpha$ to controls the min value of the variance$\delta^2$ -
--flow_depth
represents number of MADE layers used to implement DU-IAF. -
--flow_wdith
controls the hideen size in each IAF block, where we set the product between the value and the dimension of$z$ as the hidden size. For example, when we set--flow width 60
with the dimension of$z$ as 32, the hidden size of each IAF block is 1920.
If you find our methods or code helpful, please kindly cite the paper:
@inproceedings{shen2021regularizing,
title={Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness},
author={Shen, Dazhong and Qin, Chuan and Wang, Chao and Zhu, Hengshu and Chen, Enhong and Xiong, Hui},
booktitle={Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI-21)},
year={2021}
}