ViT + CLIP #143

Closed
gboduljak opened this issue Dec 18, 2023 · 5 comments · Fixed by #315
Labels
enhancement New feature or request

Comments

@gboduljak
Contributor

Would it be worth implementing a ViT and CLIP example?

@awni
Member

awni commented Dec 18, 2023

Yeah, that one's on our list of examples to add! Are you interested in contributing it? If so, which model would you use?

@awni awni added the enhancement New feature or request label Dec 18, 2023
@gboduljak
Contributor Author

gboduljak commented Dec 18, 2023

> Yeah, that one's on our list of examples to add! Are you interested in contributing it? If so, which model would you use?

I would like to contribute :) However, I would like to complete the implementation of norm first (ml-explore/mlx#187). I would use models from the official CLIP repository: https://github.com/openai/CLIP. If you have an alternative idea, please let me know.

This was referenced Jan 11, 2024
@nkasmanoff
Contributor

@gboduljak I submitted a PR to your existing PR, which creates a local implementation of the CLIPImageProcessor. gboduljak#1

This should eliminate the dependency on transformers, aside from using it for downloading the model & tokenizer.
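
For context, a rough sketch of the kind of work a local CLIP image processor does is shown below. This is not the code from the PR: the function names `center_crop` and `preprocess` are hypothetical, only NumPy is assumed, and the resize step (shortest edge to the target size) is left out, so the input is assumed to be at least 224 pixels on each side already. The mean and standard deviation are the normalization constants published with OpenAI CLIP.

```python
import numpy as np

# Per-channel normalization constants published with OpenAI CLIP.
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)


def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Crop the central size x size window from an (H, W, C) array."""
    height, width = image.shape[:2]
    if height < size or width < size:
        raise ValueError("image is smaller than the crop size")
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]


def preprocess(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Hypothetical helper: uint8 (H, W, 3) image -> normalized float32 array.

    Assumes the image has already been resized; only cropping, rescaling
    to [0, 1], and per-channel normalization are performed here.
    """
    cropped = center_crop(image, size)
    scaled = cropped.astype(np.float32) / 255.0
    # Broadcasting applies the per-channel mean and std over the last axis.
    return (scaled - CLIP_MEAN) / CLIP_STD
```

Calling `preprocess` on a `(300, 260, 3)` `uint8` array returns a `(224, 224, 3)` `float32` array.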

@gboduljak
Contributor Author

@nkasmanoff Thanks for the help. I will take a look at your work now.

@gboduljak
Contributor Author

@nkasmanoff I merged your PR, fixed the nits, and refactored your implementation so that everything is in the preprocessing folder. Many thanks for the help. In the future, we might drop this 'copy-paste' implementation from HuggingFace. Ideally, we should use mlx-data. If you have time, it would be awesome to have an mlx-data implementation of CLIPImageProcessor.

@awni awni closed this as completed in #315 Jan 31, 2024