
Text encoder #6

Open
renderless opened this issue May 27, 2022 · 1 comment

Comments

@renderless

Thank you for your awesome work. Do you have plans to release the pretrained text encoder?

@yinglinzheng
Member

Hi, thanks for your attention.
The pretrained backbones we released already contain the weights of the text encoder. You can load the FaRL weights into exactly the same network structure as CLIP ViT-B/16 and use the model exactly like CLIP. Here is an example, modified from the CLIP README:

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Build the standard CLIP ViT-B/16 architecture, then move it to the target device.
model, preprocess = clip.load("ViT-B/16", device="cpu")
model = model.to(device)

# Download the checkpoint from https://github.com/FacePerceiver/FaRL#pre-trained-backbones
farl_state = torch.load("FaRL-Base-Patch16-LAIONFace20M-ep16.pth")
# strict=False tolerates any keys in the checkpoint that are not part of the CLIP backbone.
model.load_state_dict(farl_state["state_dict"], strict=False)

image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)
