
Questions on evaluation protocol, datasets and image dimension of AGE #6

Open
ankanbhunia opened this issue Jun 16, 2022 · 0 comments

Thanks for sharing your code. I have a few questions and would appreciate a response.

  1. I want to understand the evaluation protocol used in the paper to obtain the FID and LPIPS scores (Table 1). For instance, the Animal Faces dataset has 119 classes for training and 30 classes for testing. I assume you train the model on the 119 training classes, then at test time generate N images conditioned on reference images from the 30 unseen classes, and finally compute FID between those N generated images and N real images sampled from the 30 test classes. Could you confirm whether this is correct? Also, what value of N is used to compute FID? It would be very helpful if you could share the evaluation scripts for FID and LPIPS so that the results reported in the paper can be replicated.

  2. Table 1 shows that AGE achieves a significant improvement over LoFGAN (ICCV 2021) and the other baselines, and the baseline numbers appear to be taken from the LoFGAN paper. However, the LoFGAN paper states that its authors used reduced versions of the original Flowers, Animal Faces, and VGGFace datasets, released as .npy files in their official GitHub repository. For example, LoFGAN uses 100 images per class for Animal Faces, i.e. only 14,900 images across all 149 classes. The LoFGAN authors confirmed this in a GitHub issue (Questions regarding Animal Faces dataset. edward3862/LoFGAN-pytorch#12). For comparison, I checked the animal_faces.zip file shared in this repository, and it seems that AGE uses the full dataset, which contains a significantly larger number of training images (117,574 images in total, roughly 750 per class). Have you run any experiment on the same reduced LoFGAN dataset to ensure a fair comparison?

  3. What is the resolution of the images generated by AGE? Is it the same as the input resolution, i.e. 256x256?
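For context on question 1: once Inception features are extracted for the real and generated sets, FID is just the Frechet distance between two Gaussians fitted to those features. A minimal sketch of that final step (the function name and the feature dimensionality are mine for illustration, not from this repository; real evaluations use 2048-d Inception-v3 features):

```python
import numpy as np
from scipy import linalg


def fid_from_features(feats_fake, feats_real):
    """Frechet distance between Gaussians fitted to two feature sets.

    FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * sqrtm(S1 @ S2))
    where mu/S are the mean and covariance of each feature set.
    """
    mu1, mu2 = feats_fake.mean(axis=0), feats_real.mean(axis=0)
    s1 = np.cov(feats_fake, rowvar=False)
    s2 = np.cov(feats_real, rowvar=False)
    covmean = linalg.sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):  # drop tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1 + s2 - 2.0 * covmean))


# Sanity check: identical feature sets give a score of (numerically) zero,
# while a shifted copy gives a clearly positive distance.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 16))
print(fid_from_features(feats, feats))        # ~0
print(fid_from_features(feats, feats + 5.0))  # large positive value
```

This is why the choice of N matters: the covariance estimate is noisy for small N, so FID scores computed with different sample sizes (or different datasets, as in question 2) are not directly comparable.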
