Thanks for releasing your code. I have some questions; I'd appreciate it if you could respond.
I want to understand the evaluation protocol used in the paper to obtain the FID and LPIPS scores (Table 1). For instance, the Animal Faces dataset has 119 classes for training and 30 classes for testing. I assume you train the model on the 119 classes and then, for testing, generate N images using the remaining 30 test classes' images as references. FID can then be computed between the N generated images and N real images sampled from the 30 test classes. Could you please confirm whether this is correct? Also, how many images are used to calculate FID, i.e., what is N? I'd appreciate it if you could share the evaluation scripts for the FID and LPIPS scores so that we can replicate the results reported in the paper.
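For concreteness, here is a minimal sketch of how I would compute the two scores under the protocol I assumed above. It relies on the third-party `pytorch-fid` and `lpips` packages rather than anything in this repository, and the pairwise-LPIPS convention in `compute_lpips_diversity` is just one common choice for few-shot generation papers, so please correct me if the paper does something different:

```python
# Hypothetical evaluation sketch, assuming real test images and generated
# images have each been saved to a flat image folder.
# pip install pytorch-fid lpips
import itertools
import lpips
from pytorch_fid.fid_score import calculate_fid_given_paths


def compute_fid(real_dir, fake_dir, device="cuda"):
    # FID between the N real test-class images and the N generated images.
    return calculate_fid_given_paths(
        [real_dir, fake_dir], batch_size=50, device=device, dims=2048
    )


def compute_lpips_diversity(images, device="cuda"):
    # Average pairwise LPIPS among generated images of one class.
    # `images` is a float tensor of shape (N, 3, H, W) scaled to [-1, 1].
    # (One common diversity convention; the paper may define LPIPS differently.)
    loss_fn = lpips.LPIPS(net="alex").to(device)
    dists = [
        loss_fn(images[i : i + 1].to(device), images[j : j + 1].to(device)).item()
        for i, j in itertools.combinations(range(len(images)), 2)
    ]
    return sum(dists) / max(len(dists), 1)
```

If the paper uses a different N, image preprocessing, or Inception settings, the numbers could shift noticeably, which is why the official scripts would be very helpful.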
Table 1 shows that AGE achieves a significant performance improvement over LoFGAN (ICCV 2021) and other baselines, and it seems the baseline numbers in Table 1 are taken from the LoFGAN paper. The LoFGAN paper states that its authors used reduced versions of the original Flowers, Animal Faces, and VGGFace datasets, which they have released as .npy files in their official GitHub repository. For example, LoFGAN uses 100 images per class for Animal Faces, i.e., only 14,900 images in total across all 149 classes. The LoFGAN authors confirmed this in a GitHub issue (Questions regarding Animal Faces dataset. edward3862/LoFGAN-pytorch#12). For comparison, I checked the animal_faces.zip file shared in this official repository, and it appears that AGE uses the full dataset, which contains a significantly larger number of training images (117,574 images in total, ~750 per class). Have you run any experiment computing the scores on the same reduced LoFGAN dataset to ensure a fair comparison?
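For reference, this is roughly how I counted the images in the unzipped `animal_faces.zip` (assuming one subdirectory per class; the totals I quoted above came from a check like this):

```python
# Hypothetical sanity check of the dataset size, assuming the unzipped
# archive contains one subdirectory per class holding that class's images.
from pathlib import Path

root = Path("animal_faces")  # path to the unzipped animal_faces.zip
counts = {d.name: sum(1 for _ in d.iterdir()) for d in root.iterdir() if d.is_dir()}
total = sum(counts.values())
print(f"{len(counts)} classes, {total} images total, ~{total // len(counts)} per class")
```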
I would also like to know the dimensions of the images generated by AGE. Are they the same as the input images' dimensions, i.e., 256x256?