CLIP feature #6

Open
rongtongxueya opened this issue Jul 8, 2023 · 6 comments

Comments

@rongtongxueya

Thank you for sharing such great code. I have a question: how did you extract the features of the Flickr dataset? I would like to swap in another dataset to see the effect, but I don't know how the features were extracted. Could you please give me some advice?

@xu-shitong
Owner

An example of extracting Flickr8k CLIP features is given in https://github.com/xu-shitong/flickr8k-CLIP-freature/blob/master/building/datatensor_creator.py . If you wish to try another dataset, you need an encoder, such as the CLIP model, to extract the samples' feature vectors as input to the diffusion model.
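
For readers adapting this to another dataset, the idea can be sketched roughly as follows. This is a minimal sketch, not the code in the linked repository: `extract_features` and `make_clip_encoder` are hypothetical helper names, the checkpoint `openai/clip-vit-base-patch32` is an assumption (the thread does not say which CLIP weights were used), and the CLIP part relies on the Hugging Face `transformers` package and on PIL images as input.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def extract_features(
    samples: Sequence[Tuple[object, str]],
    encode_batch: Callable[[Sequence[Tuple[object, str]]], List[Dict[str, object]]],
    batch_size: int = 32,
) -> List[Dict[str, object]]:
    """Run an encoder over (image, caption) pairs in fixed-size batches."""
    features: List[Dict[str, object]] = []
    for start in range(0, len(samples), batch_size):
        features.extend(encode_batch(samples[start:start + batch_size]))
    return features


def make_clip_encoder(model_name: str = "openai/clip-vit-base-patch32"):
    """Build an encode_batch callable backed by CLIP (assumed checkpoint)."""
    # Imported lazily so the batching helper above works without these packages.
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_name).eval()
    processor = CLIPProcessor.from_pretrained(model_name)

    def encode_batch(batch):
        images = [image for image, _ in batch]        # expected to be PIL images
        captions = [caption for _, caption in batch]
        inputs = processor(text=captions, images=images,
                           return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            outputs = model(**inputs)
        # Label each embedding explicitly so image and text cannot be mixed up.
        return [
            {"image": img, "text": txt}
            for img, txt in zip(outputs.image_embeds, outputs.text_embeds)
        ]

    return encode_batch
```

Something like `extract_features(pairs, make_clip_encoder())` would then produce one labelled image/text embedding pair per sample; any other encoder can be plugged in through the same `encode_batch` argument.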

@Orange1999

Hello, I have identified an issue in the code above. The code retrieves the CLIP features with the line image, text = train_dataset[i], while the __getitem__ function of Flickr8kCLIPDataset returns outputs.text_embeds and outputs.image_embeds, in that order. It therefore seems that the image and text features have been swapped in the stored feature files, so the text features are erroneously used at the inference stage.
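
The mismatch is easy to reproduce in isolation. The sketch below uses a hypothetical stand-in class (`ToyPairDataset`, with strings in place of real embeddings) rather than the repository's Flickr8kCLIPDataset. It only illustrates how unpacking in the wrong order silently swaps the two features, and how returning named fields (the hypothetical `Features` tuple) avoids depending on position.

```python
from typing import NamedTuple, Tuple


class ToyPairDataset:
    """Hypothetical stand-in: __getitem__ returns (text, image), as reported."""

    def __len__(self) -> int:
        return 2

    def __getitem__(self, i: int) -> Tuple[str, str]:
        return f"text_embed_{i}", f"image_embed_{i}"


class Features(NamedTuple):
    text: str
    image: str


class NamedPairDataset(ToyPairDataset):
    """Same data, but wrapped in named fields."""

    def __getitem__(self, i: int) -> Features:
        return Features(*super().__getitem__(i))


dataset = ToyPairDataset()

# Unpacking as (image, text) mirrors the line quoted above and swaps the features.
image, text = dataset[0]
swapped = image.startswith("text_embed")

# Unpacking in the order __getitem__ actually returns keeps them aligned.
text_ok, image_ok = dataset[0]

# With named fields the caller never depends on positional order.
named = NamedPairDataset()[0]
```

Here `swapped` ends up true because the variable called `image` actually holds the text embedding, whereas `named.image` and `named.text` always refer to the intended feature.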

@xu-shitong
Owner

Yes, you are right... The code is just an example of how to extract features. Different code was used to extract the Flickr30k dataset features.
But true, the hyperparameter tuning part might have been affected by this problem, as I don't remember whether different code was used to extract the data used in training.

@Orange1999

Thank you for your responses and contributions. Following your input, I used the provided code to extract feature files from the Flickr30k dataset. I then trained the model on the Flickr30+8k dataset, which gave a notably higher BLEU-4 score (30.7). This deviates significantly from the results reported in the paper. To resolve the discrepancy, I intend to consult your code once more, re-extract the features, and retrain the model.

@gWeiXP

gWeiXP commented Mar 6, 2024

Hello, in datatensor_creator.py, if I want to get image_all_final.pickle instead of image_all_40.pickle, do I just need to change the value of 'start' from 40000 to 0? The code you provided doesn't seem to be complete.

@xu-shitong
Owner

Yes, that's correct. The code is written this way only because my machine could not extract the features for the whole dataset in one go, so I had to manually restrict the program to a subset of the samples and combine the extracted features later.
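
The chunk-and-combine workflow described here might look like the sketch below. The helper names (`extract_chunk`, `merge_chunks`), the chunk file naming, and the assumption that each chunk is pickled as a plain list are all hypothetical; the actual layout of image_all_40.pickle and image_all_final.pickle is not shown in this thread.

```python
import pickle
from pathlib import Path
from typing import Callable, List, Sequence


def extract_chunk(samples: Sequence, encode: Callable, start: int, size: int,
                  out_dir: Path) -> Path:
    """Encode samples[start:start + size] and pickle the result to its own file."""
    features = [encode(sample) for sample in samples[start:start + size]]
    path = out_dir / f"features_{start:08d}.pickle"  # hypothetical naming scheme
    with path.open("wb") as f:
        pickle.dump(features, f)
    return path


def merge_chunks(chunk_paths: Sequence[Path], out_path: Path) -> List:
    """Concatenate chunk files in order of their start index into one file."""
    merged: List = []
    for path in sorted(chunk_paths):  # zero-padded names sort by start index
        with path.open("rb") as f:
            merged.extend(pickle.load(f))
    with out_path.open("wb") as f:
        pickle.dump(merged, f)
    return merged
```

Running `extract_chunk` once per `start` value (0, 40000, and so on), possibly in separate program runs to stay within memory, and then calling `merge_chunks` on the resulting files gives a single combined feature file.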
