Issues using the download script while trying to guarantee the same splits for CelebA-HQ #8

Open
djburnett opened this issue Aug 16, 2024 · 0 comments


While attempting to use scripts/download.sh to reproduce your dataset split for the CelebA-HQ 256x256 dataset, I ran into two issues in determining the splits you used.

The first issue is that temp_train_shuffled.flist is created with the shuf command, which produces a different ordering each time it is run. As a result, the lama-celeba/train_shuffled.flist and lama-celeba/val_shuffled.flist files generated when I ran the script cannot be guaranteed to match the ones used in the paper. Could you upload your generated versions of those two files, or your generated temp_train_shuffled.flist? That would let me be certain the dataset split in my experiments is consistent with yours.
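For future reproducibility, it may also be worth seeding the shuffle itself. Below is a minimal sketch (not your script, just an illustration) using the seeded-random-source recipe from the GNU coreutils manual; the input file name temp_train.flist and the seed value 42 are placeholders I made up:

```bash
#!/usr/bin/env bash
# Sketch of a deterministic replacement for the shuffle step.
# get_seeded_random follows the GNU coreutils manual's recipe for giving
# shuf a reproducible byte stream; temp_train.flist is a placeholder name.
get_seeded_random() {
  seed="$1"
  # AES-CTR over an all-zero stream yields an endless pseudorandom byte
  # source fully determined by the seed.
  openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null
}

# Same seed + same input ordering => identical shuffled list on every run.
shuf --random-source=<(get_seeded_random 42) temp_train.flist \
  > temp_train_shuffled.flist
```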

The second issue is that the Google Drive link for the dataset (the one that downloads data256x256.zip) no longer works. While searching for the dataset on sources such as Kaggle, I noticed that the images often get reordered across different uploads. I am currently using a 256x256 version of CelebAMask-HQ, which indexes images consistently with the CelebA-HQ ordering on TensorFlow.org (and with the full-size CelebAMask-HQ). Can you verify whether your image ordering is consistent with that, or otherwise provide a working URL for data256x256.zip from CelebA-HQ?
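In the meantime, one way to check whether two candidate copies share the same index-to-image mapping is to hash every file in index order and diff the resulting manifests. A minimal sketch, assuming both copies live in local directories (the directory names and the .jpg extension are assumptions on my part):

```bash
#!/usr/bin/env bash
# Sketch: compare two dataset copies image-by-image. Matching manifests mean
# both copies assign identical content to identical indices.
for d in celebahq_copy_a celebahq_copy_b; do
  # Hash files in sorted (zero-padded, hence numeric) name order; the
  # manifest is written to the current directory, outside the subshell's cd.
  (cd "$d" && ls *.jpg | sort | xargs md5sum) > "manifest_${d}.txt"
done

# Any differing line pinpoints an index whose image differs between copies.
diff manifest_celebahq_copy_a.txt manifest_celebahq_copy_b.txt \
  && echo "copies are identically ordered" \
  || echo "copies differ"
```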


The reason I am concerned is that, as I understand it, the training split was used to train the diffusion model, so if I reuse that model I want to be certain I am not accidentally validating or testing my work on data from the diffusion model's training set.
