Hi, I don't understand why we need to extract the language embeddings first. Doesn't the model just take box prompts and images and then output segmentation masks and labels? Why do we need language embeddings, and how do they work with the model?
The language embeddings serve as a "dictionary": one text embedding is pre-extracted per candidate label. During inference, a visual embedding is extracted for each predicted region and matched against this dictionary to assign a label. You can refer to the code and paper for details. Please let me know if you have any other questions.
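To make the matching step concrete, here is a minimal sketch of how this kind of open-vocabulary label assignment typically works. This is not this repo's actual API; `text_encoder` and the function names are hypothetical stand-ins for whatever CLIP-style text encoder and region feature extractor the model uses:

```python
import torch
import torch.nn.functional as F


def build_label_dictionary(text_encoder, label_names):
    """Pre-extract one text embedding per candidate label (the 'dictionary').

    `text_encoder` is a hypothetical CLIP-style encoder mapping a list of
    label strings to a (num_labels, dim) tensor.
    """
    with torch.no_grad():
        text_embeds = text_encoder(label_names)        # (num_labels, dim)
    return F.normalize(text_embeds, dim=-1)


def classify_region(visual_embed, label_dictionary, label_names):
    """Match one region's visual embedding against the label dictionary.

    The label whose text embedding has the highest cosine similarity
    with the region's visual embedding wins.
    """
    visual_embed = F.normalize(visual_embed, dim=-1)   # (dim,)
    similarity = label_dictionary @ visual_embed       # (num_labels,)
    return label_names[similarity.argmax().item()]
```

This is also why the text side is extracted first: the vocabulary is fixed, so its embeddings can be computed once and reused, and inference reduces to a nearest-neighbor lookup in the shared embedding space. Swapping in a different label list lets the model name new categories without retraining.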