This is a PaddlePaddle implementation of *Generative Adversarial Text to Image Synthesis*.
English | 简体中文
- [Paddle_T2I]
This project reproduces T2I_GAN, the first conditional GAN for text-to-image synthesis, in the PaddlePaddle framework. Given a text description, the model understands the meaning of the text and synthesizes an image that matches its semantics.
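As a rough illustration of the core idea (not the exact code in `generator.py`), the generator conditions on the text by compressing the sentence embedding and concatenating it with the noise vector before decoding an image. The dimensions in the sketch below are assumptions:

```python
import paddle
import paddle.nn as nn

# Minimal sketch of the conditioning step described above; dimensions are
# illustrative assumptions, not the exact values used in this repo.
embed_dim, projected_dim, z_dim = 1024, 128, 100

# compress the pretrained sentence embedding to a small conditioning vector
project = nn.Sequential(nn.Linear(embed_dim, projected_dim), nn.LeakyReLU(0.2))

batch = 4
text_embedding = paddle.randn([batch, embed_dim])  # provided by the dataset
z = paddle.randn([batch, z_dim])                    # random noise

cond = project(text_embedding)             # [4, 128]
latent = paddle.concat([z, cond], axis=1)  # [4, 228], decoded into an image by the generator
print(latent.shape)
```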
Paper:
- [1] Reed S, Akata Z, Yan X, et al. Generative adversarial text to image synthesis[C]//International Conference on Machine Learning. PMLR, 2016: 1060-1069.
Reference project:
Online Project:
This project is accepted by visually inspecting the images generated on the Oxford-102 dataset, so there are no quantitative metrics; only synthesized samples are shown.
Dataset | Paddle_T2I | Text_to_Image_Synthesis |
---|---|---|
[Oxford-102] | | |
Oxford-102
This dataset is provided by text-to-image-synthesis. It has been converted to HDF5 format for faster reading, and is downloaded and saved in `Data\`.
If you want to convert the data format yourself, follow the steps below (only the storage format changes; the data itself is unchanged and no neural-network feature extraction is performed):
- Download the dataset: flowers
- Add the path to the dataset in `config.yaml`
- Run `convert_flowers_to_hd5_script.py` to convert the dataset storage format
The dataset has three subsets: "train", "valid", and "test". Each subset contains five fields per example (note: the text embedding vectors are provided by the authors of the paper and have already been converted from strings to vectors; they are included in the dataset downloaded above):
- File name: `name`
- Image: `img`
- Text embeddings: `embeddings`
- Image class: `class`
- Text description of the image: `txt`
- Train + validation: 8192 images
- Test: 800 images
- Texts per image: 5
- Data format: flower images with their corresponding text descriptions
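For reference, here is a minimal sketch of reading the converted file with h5py; the file name `Data/flowers.hdf5` and the exact group layout are assumptions based on the field list above:

```python
import h5py

# Open the converted dataset (file name and layout are assumptions).
with h5py.File("Data/flowers.hdf5", "r") as f:
    train = f["train"]                      # "train", "valid" or "test"
    first_key = sorted(train.keys())[0]     # one group per example
    example = train[first_key]

    name = example["name"][()]              # file name
    img = example["img"][()]                # image data
    embeddings = example["embeddings"][()]  # precomputed text embeddings
    label = example["class"][()]            # flower class
    txt = example["txt"][()]                # raw text descriptions

    print(name, label, embeddings.shape)
```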
- Hardware: GPU, CPU
- Framework:
  - PaddlePaddle >= 2.0.0
```bash
# clone the repo, enter it, and start training
git clone https://github.com/Caimthefool/Paddle_T2I.git
cd Paddle_T2I
python main.py --split=0
```
The trained model parameters are saved in the `model\` directory. To generate images with a saved model, change the value of `pretrain_model` accordingly and run the following command; the output images are saved in the `image\` directory:

```bash
python main.py --validation --split=2 --pretrain_model=model/netG.pdparams
```
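For reference, loading the pretrained generator in Paddle typically looks like the sketch below; the `Generator` class name is an assumption, see `generator.py` for the actual definition:

```python
import paddle
from generator import Generator  # assumed class name; see generator.py

netG = Generator()
state_dict = paddle.load("model/netG.pdparams")  # path passed via --pretrain_model
netG.set_state_dict(state_dict)
netG.eval()  # switch to inference mode before generating images
```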
To test, place the parameter file to be evaluated at the path given by `pretrain_model` and run the following command; the output images are saved in the `image\` directory:

```bash
python main.py --validation --split=2 --pretrain_model=model/netG.pdparams
```
Because this project is accepted by visually inspecting the generated images (a user study), evaluation is performed in the same way as prediction.
```
├─Data
├─Log
├─examples
├─image
├─model
├─sample
│  T2IDataset.py
│  config.yaml
│  convert_flowers_to_hd5_script.py
│  README.md
│  README_cn.md
│  discriminator.py
│  generator.py
│  trainer.py
│  main.py
│  requirement.txt
```
Training and evaluation parameters can be set in `main.py`, as follows.

Parameter | Default | Description | Other |
---|---|---|---|
config | None, mandatory | Configuration file path | |
--split | 0, mandatory | Dataset split to use | 0 = training set, 1 = validation set, 2 = test set |
--validation | false, optional | Run prediction and evaluation | |
--pretrain_model | None, optional | Pretrained model path | |
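For orientation, the table corresponds roughly to an argparse setup like the sketch below (whether `config` is a flag or a positional argument, and the exact defaults, are assumptions; `main.py` is authoritative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--config", type=str, default=None,
                    help="configuration file path")
parser.add_argument("--split", type=int, default=0,
                    help="0 = training set, 1 = validation set, 2 = test set")
parser.add_argument("--validation", action="store_true",
                    help="run prediction/evaluation instead of training")
parser.add_argument("--pretrain_model", type=str, default=None,
                    help="path to the pretrained generator parameters")
args = parser.parse_args()
```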
```bash
python main.py --split=0
```
When training is executed, the output looks like the following; each training batch prints the current epoch, step, and loss values:

```
Epoch: [1 | 600]
(1/78) Loss_D: 1.247 | Loss_G: 20.456 | D(X): 0.673 | D(G(X)): 0.415
```
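The printed quantities follow the standard conditional GAN objective. Below is a minimal Paddle sketch of how `Loss_D`, `Loss_G`, `D(X)`, and `D(G(X))` relate (not necessarily the exact code in `trainer.py`):

```python
import paddle
import paddle.nn.functional as F

def gan_losses(netD, netG, real_images, embeddings, z):
    """Sketch of the per-step quantities logged above.
    D(X): discriminator score on real (image, text) pairs.
    D(G(X)): discriminator score on generated images."""
    fake_images = netG(z, embeddings)

    d_real = netD(real_images, embeddings)           # D(X)
    d_fake = netD(fake_images.detach(), embeddings)  # D(G(X)) for the D update

    loss_d = F.binary_cross_entropy(d_real, paddle.ones_like(d_real)) \
           + F.binary_cross_entropy(d_fake, paddle.zeros_like(d_fake))

    # the generator wants its samples to be classified as real
    d_fake_for_g = netD(fake_images, embeddings)
    loss_g = F.binary_cross_entropy(d_fake_for_g, paddle.ones_like(d_fake_for_g))
    return loss_d, loss_g
```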
Our pretrained model is already included in this repo, in the `model` directory.
```bash
python main.py --validation --split=2 --pretrain_model=model/netG.pdparams
```
For other information about the model, please refer to the following table:
Information | Description |
---|---|
Author | Weiyuan Zeng |
Date | 2021.09 |
Framework version | Paddle 2.0.2 |
Application scenario | Text-to-Image Synthesis |
Supported hardware | GPU, CPU |
Training logs can be visualized with VisualDL:

```bash
visualdl --logdir Log --port 8080
```
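Logs in this format are typically written with VisualDL's `LogWriter`; a minimal sketch (tag names and values are illustrative):

```python
from visualdl import LogWriter

# Dummy values stand in for the real Loss_D / Loss_G computed during training.
dummy_losses = [(1.247, 20.456), (1.103, 18.902)]

with LogWriter(logdir="Log") as writer:
    for step, (loss_d, loss_g) in enumerate(dummy_losses):
        writer.add_scalar(tag="Loss_D", step=step, value=loss_d)
        writer.add_scalar(tag="Loss_G", step=step, value=loss_g)
```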