
[Model request] Meta's SegmentAnything Model (SAM) #22592

Closed
xenova opened this issue Apr 5, 2023 · 11 comments · Fixed by #22654
Comments

@xenova (Contributor) commented Apr 5, 2023

Model description

Meta Research recently open-sourced their "SegmentAnything Model" (SAM) for image segmentation. It would be great to have it working with this library's ImageSegmentationPipeline.
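
For context, here is a minimal sketch of how SAM might be called through the existing pipeline API once supported; the checkpoint id "facebook/sam-vit-huge" is a placeholder, not a confirmed name:

from transformers import pipeline

# Hypothetical checkpoint id; no SAM weights exist on the Hub yet.
segmenter = pipeline("image-segmentation", model="facebook/sam-vit-huge")
outputs = segmenter("street_scene.jpg")  # local path or URL to an image
for result in outputs:
    print(result["label"], result["score"])  # one entry per predicted mask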

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

GitHub repo: https://github.com/facebookresearch/segment-anything
Paper: https://ai.facebook.com/research/publications/segment-anything/
Website: https://segment-anything.com/
Demo: https://segment-anything.com/demo
Weights:

@inderpreetsingh01

Hi @xenova @alaradirik, I would like to work on adding this model.

@inderpreetsingh01

@xenova I just checked the model website and I don't have the hardware resources to perform the model inference.

@xenova (Contributor, Author) commented Apr 5, 2023

@xenova I just checked the model website and I don't have the hardware resources to perform the model inference.

Have you tried running it locally w/ python?

@Xrenya (Contributor) commented Apr 6, 2023

I think I can run it locally, I can work on it

@inderpreetsingh01

Have you tried running it locally w/ python?

No, but I don't have a GPU. Also, I recently worked on adding the SeaFormer model, which has 14M params; running it locally on CPU took a few seconds, so this one with 632M params will take significant time and RAM.

@xenova (Contributor, Author) commented Apr 10, 2023

I think I can run it locally, I can work on it

Great! How is it going? Let me know if you need any help.

@Xrenya (Contributor) commented Apr 17, 2023

@xenova I think I will finish it this week

@xvr-hlt commented Apr 17, 2023

Hey @Xrenya, I'm quite familiar with these models and would love to get them into transformers, so please reach out if I can help you in any way.

@alaradirik (Contributor)

Hi folks, please ignore this if you're already familiar with transformers; otherwise, you can refer to the guidelines to get started with adding a model. I'd recommend first checking that you can run the original repo without any issues, though.

Here are some summarized points that might help with model addition:

  • Each model, including different checkpoints of the same model, has its own repo on the Hub (see the DETR-ResNet-50 repo as an example). This is basically a git repo that stores the checkpoint-specific configuration, the preprocessing configuration, and the model weights.
  • The code (PR) added to transformers acts as boilerplate to load different checkpoints: the target model trained on different datasets, at different resolutions, or with a larger / smaller architecture.
  • configuration_sam.py should contain all the hyperparameters, the input image size and architectural details (e.g. number of hidden layers) to initialize the model.
  • image_processing_sam.py should contain the ImageProcessor class that takes in the raw input image and preprocesses it to the format expected as input to the model (resizing to a fixed input size, normalization, cropping, etc.)
  • processing_sam.py wraps the CLIPTokenizer used by SAM for prompt encoding and SAMImageProcessor to a single processor class. You can refer to the OWL-ViT model to see how that works.
  • modeling_sam.py should contain the model definition.
  • The conversion script (see the sketch after this list):
    • Loads the pretrained original model and randomly initializes the HF implementation with the corresponding configuration
    • Copies the pretrained parameters (weights and biases) of the original model to the corresponding parameters of the randomly initialized HF model (the conversion step)
    • Forward propagates an arbitrary input through both the original model and converted HF model and checks if the outputs match
    • Uploads the converted HF model to the hub
  • Each model and image processor class is tested with scripts under tests/models/<MODEL_NAME>/; you can refer to other test files to see which tests to add.
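
Below is a minimal, self-contained sketch of the conversion flow described in the conversion-script bullet above, using toy modules in place of the real SAM classes (which don't exist in transformers yet); all names and shapes are illustrative:

import torch
from torch import nn

class OriginalBlock(nn.Module):  # stands in for the original facebookresearch model
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

    def forward(self, x):
        return self.proj(x)

class HFBlock(nn.Module):  # stands in for the randomly initialized HF implementation
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(4, 4)  # same shape, different parameter name

    def forward(self, x):
        return self.dense(x)

original, hf = OriginalBlock(), HFBlock()

# Conversion step: copy the pretrained parameters to the HF model, renaming
# keys from the original naming scheme to the HF one.
rename = {"proj.weight": "dense.weight", "proj.bias": "dense.bias"}
hf.load_state_dict({rename[k]: v for k, v in original.state_dict().items()})

# Verification step: forward an arbitrary input through both models and
# check that the outputs match.
x = torch.randn(2, 4)
assert torch.allclose(original(x), hf(x), atol=1e-5)

# A real script would finish by uploading the verified model, e.g. with
# push_to_hub(...) (available on transformers models, not plain nn.Modules).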

Once you are done, you would need to run the following commands to check the PR passes all CI tests:

make style
make quality
make repo-consistency

RUN_SLOW=TRUE pytest tests/models/sam/test_modeling_sam.py
RUN_SLOW=TRUE pytest tests/models/sam/test_image_processor_sam.py
RUN_SLOW=TRUE pytest tests/models/sam/test_processor_sam.py

We can do an in-depth review once the PR passes most tests, or once the configuration, preprocessing, and modeling are mostly complete.

Hope this helps!

@ArthurZucker (Collaborator)

The PR for this model is available here; sorry for not catching this issue: #22654

ArthurZucker linked a pull request Apr 17, 2023 that will close this issue
@Xrenya (Contributor) commented Apr 17, 2023

@ArthurZucker I see, okay, next time I should open a [WIP] PR first
