
Which dataset was used to train EVF-SAM2? #45

Open
iseunghoon opened this issue Feb 2, 2025 · 4 comments

Comments

@iseunghoon

Which dataset was used to train EVF-SAM2?

@CoderZhangYx
Collaborator

The same as for EVF-SAM1: we used RefCOCO/+/g, ADE20K, Objects365 (filtered and machine-annotated), PartImageNet, Humanparsing, and Pascal-part.

@yunjeongch

Hi,

I’m a bit confused about the training scheme of EVF-SAM2.

From what I see in the code, the EvfSam2Model class is implemented in both evf_sam2.py and evf_sam2_video.py. As far as I understand, the difference between them lies in the visual model: SAM2Base in evf_sam2.py and SAM2VideoPredictor in evf_sam2_video.py.

Did you train the evf_sam2.py version with SAM2Base using the aforementioned image dataset and then perform inference with evf_sam2_video.py using the trained parameters?

Thanks in advance!

@CoderZhangYx
Copy link
Collaborator

Yes, you are right. Both SAM2Base and SAM2VideoPredictor are wrappers around the same SAM2 components (image encoder + prompt encoder + mask decoder). We train EVF-SAM2 on image datasets while keeping all SAM2 parameters frozen; the model can then perform zero-shot video prediction.
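To make the scheme concrete, here is a minimal sketch of the idea described above: train with all SAM2 parameters frozen, then reuse the identical weights in a video-oriented wrapper for zero-shot inference. All class and attribute names here (Sam2ComponentsStub, text_projector, etc.) are illustrative stand-ins, not the actual EVF-SAM repository API.

```python
import torch
import torch.nn as nn

class Sam2ComponentsStub(nn.Module):
    """Stand-in for the shared SAM2 components (image encoder +
    prompt encoder + mask decoder) that both wrappers reuse."""
    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Linear(8, 8)
        self.mask_decoder = nn.Linear(8, 1)

class EvfSam2ImageSketch(nn.Module):
    """Image-level wrapper (the role evf_sam2.py plays): this is the
    version that gets trained on the image datasets."""
    def __init__(self):
        super().__init__()
        self.sam2 = Sam2ComponentsStub()
        self.text_projector = nn.Linear(8, 8)  # the trainable part
        # Freeze every SAM2 parameter: only the projector learns.
        for p in self.sam2.parameters():
            p.requires_grad = False

class EvfSam2VideoSketch(EvfSam2ImageSketch):
    """Video-level wrapper (the role evf_sam2_video.py plays): same
    parameters, different inference path (memory/propagation omitted)."""
    pass

image_model = EvfSam2ImageSketch()
# ... image training would update only image_model.text_projector ...

# Zero-shot video use: load the image-trained weights unchanged.
video_model = EvfSam2VideoSketch()
video_model.load_state_dict(image_model.state_dict())
```

Because `load_state_dict` copies tensor values without touching `requires_grad`, the SAM2 stub stays frozen in both wrappers, matching the description of training only the non-SAM2 parts on images.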

@yunjeongch

Thanks for the clarification!
