CLIP Text Encoder #1969
base: main
Conversation
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1969.
✅ No failures as of commit 5aa7c9f with merge base 9bafd16. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
self.sot_token = self.encoder["<|startoftext|>"]
self.eot_token = self.encoder["<|endoftext|>"]
self.pad_token = self.eot_token
should these be configurable in the constructor like we do for other tokenizers? I don't really think it's needed imo
They might not need to be, but I think it's more readable when all of the special tokens are listed at the top. I'm going to ask @RdoubleA to review this file specifically as he has the most experience with how tokenizers are used in torchtune.
I think keeping these here is fine since this is operating as a base tokenizer and not a model tokenizer with tokenize_messages
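For illustration, a minimal sketch of the two options discussed in this thread (hard-coded ids at the top of __init__ vs. constructor arguments); the class and parameter names here are hypothetical, not the PR's actual signature.

```python
class CLIPTokenizerSketch:
    def __init__(
        self,
        encoder: dict,
        sot_token: str = "<|startoftext|>",  # option B: overridable via constructor
        eot_token: str = "<|endoftext|>",
    ):
        self.encoder = encoder
        # Option A (as in the PR): resolve the special token ids once, up front,
        # so they read as a block at the top of __init__.
        self.sot_token = self.encoder[sot_token]
        self.eot_token = self.encoder[eot_token]
        self.pad_token = self.eot_token  # CLIP pads with the end-of-text id
```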
Codecov Report
@@            Coverage Diff             @@
##             main    #1969      +/-   ##
==========================================
+ Coverage   67.51%   67.82%   +0.30%
==========================================
  Files         318      323       +5
  Lines       17684    17897     +213
==========================================
+ Hits        11940    12139     +199
- Misses       5744     5758      +14

View full report in Codecov by Sentry.
This is a great PR! I left some comments around some standard patterns we try to follow but aside from that, this looks very solid and clean.
torchtune/modules/activations.py
Outdated
""" | ||
|
||
def forward(self, x: torch.Tensor) -> torch.Tensor: | ||
return x * torch.sigmoid(1.702 * x) |
How does this affect parity values and speed (as opposed to classic GeLU)? I'd prefer not to add this extra module if it's only ever going to be used by CLIP.
the speedup of QuickGELU is negligible, but it's needed because using the default GELU results in significantly worse parity (output MSE: 0.16 vs 0.00003)
Let's move this to the clip components builder file for now then. Just a function "def quick_gelu"
moved it to the clip components file but left it as a module because the activation arg of clip_mlp/FeedForward expects a module (I could move it to the clip/_text_encoder.py file instead if that's better)
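For reference, a minimal sketch of such a module and how it might be passed to an MLP builder; the clip_mlp call in the trailing comment is an assumption about the wiring, not code from the PR.

```python
import torch
from torch import nn


class QuickGELU(nn.Module):
    """Sigmoid approximation of GELU used by the original CLIP weights."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(1.702 * x)


# Because the MLP builder expects an activation *module*, it would be passed as
# an instance, e.g. clip_mlp(..., activation=QuickGELU())  # hypothetical call
```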
@@ -488,6 +489,10 @@ def load_checkpoint(self) -> Dict[str, Any]:
            "supported_aspect_ratios", None
        ),
    )
elif self._model_type == ModelType.CLIP_TEXT:
I'd probably prefer the model type to just be CLIP and the convert function to be clip_hf_to_tune. We don't support the vision model yet but we plan to, and I think it'd be simpler to have just one CLIP version. The clip_hf_to_tune function can raise an error for now if someone attempts to load in the vision model.
how would this work? the clip_hf_to_tune function needs to know whether it's creating torchtune parameters for the text or the vision model, and the only relevant information that the checkpointer has access to is the model_type, right? so the model type needs to specify text or vision?
We'd just have a single convert mapping dictionary and map what happens to be there. I don't think the function actually needs to know which fraction of the weights it's converting.
Every param will be there since HF combines the vision and text params. So the function needs to know what to ignore.
I'd still like the model type to be unified to just CLIP to keep the list shorter. But I'll approve when @RdoubleA signs off on the tokenizer.
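A hedged sketch of the unified-conversion pattern being discussed: map whatever text keys are present and skip the vision half. The HF key prefixes are standard for CLIP checkpoints, but the mapping entries and function name are illustrative, not the PR's actual conversion table.

```python
from typing import Dict

import torch

# Illustrative only: a tiny subset of an HF -> torchtune key mapping.
_CLIP_TEXT_FROM_HF = {
    "text_model.embeddings.token_embedding.weight": "token_embedding.weight",
    "text_model.embeddings.position_embedding.weight": "position_embedding",
}


def clip_hf_to_tune(state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    converted = {}
    for key, value in state_dict.items():
        if key.startswith("vision_model."):
            # HF ships vision + text weights together; skip (or raise on) the
            # vision half until torchtune supports it.
            continue
        if key in _CLIP_TEXT_FROM_HF:
            converted[_CLIP_TEXT_FROM_HF[key]] = value
    return converted
```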
    truncate (bool): whether to truncate the text when longer than max_seq_len
"""

def __init__(self, path: PathLike, max_seq_len: int = 77, truncate: bool = True):
nit: does PathLike include both str and pathlib? leaning towards keeping it just str, which is consistent with our other tokenizers
self.byte_decoder = {v: k for k, v in self.byte_encoder.items()}

merges = []
with open(path, encoding="utf-8") as f:
nit: let's move this logic to a method
    .replace("</w>", " ")
)

def __call__(self, texts: List[str]) -> torch.Tensor:
we should return token ids as List[int] instead of a tensor; the collater usually converts to tensor. Also, the __call__ method would usually be for a single sample (see example: https://github.com/pytorch/torchtune/blob/main/torchtune/models/llama3/_tokenizer.py#L330) instead of a batch of texts. The tokenizer will live in the dataset, which will automatically call it on each sample individually. In this case, maybe the __call__ method would only call encode, or you don't need the __call__ method at all. It depends on whether you'll be using the same SFTDataset or a different text + image dataset abstraction that won't rely on the Message structure.
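Roughly what the suggested single-sample shape could look like, assuming an existing encode method; the method bodies and field names here are placeholders, not the PR's implementation.

```python
from typing import Any, List, Mapping


class CLIPTokenizerCallSketch:
    def encode(self, text: str) -> List[int]:
        # Placeholder body; the real implementation runs BPE and adds the
        # start-of-text / end-of-text ids.
        return [ord(c) for c in text]

    def __call__(self, sample: Mapping[str, Any]) -> Mapping[str, Any]:
        # Operates on one sample; the dataset calls this per example and the
        # collater converts the List[int] to a tensor later.
        tokens = self.encode(sample["text"])
        return {**sample, "tokens": tokens}
```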
if token in self.cache:
    return self.cache[token]

word = tuple(token[:-1]) + (token[-1] + "</w>",)
maybe </w> should be a class attribute or constant since you are using it everywhere
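For example, something like the following; the constant name is hypothetical.

```python
class CLIPTokenizerConstantSketch:
    WORD_BOUNDARY = "</w>"  # hypothetical name for the end-of-word marker

    def _append_boundary(self, token: str) -> tuple:
        # e.g. "cat" -> ("c", "a", "t</w>")
        return tuple(token[:-1]) + (token[-1] + self.WORD_BOUNDARY,)
```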
    return result

def _bpe(self, token):
    if token in self.cache:
will keeping a cache lead to exploding memory usage as we tokenize more and more in the same script? an alternative could be to use the @lru_cache decorator, but not sure if that will degrade performance too much
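A sketch of that lru_cache alternative, with a stand-in method body; the maxsize value is arbitrary and the instance-method caveat is noted in the comment.

```python
from functools import lru_cache


class BPECacheSketch:
    # Note: decorating an instance method caches on (self, token) and keeps
    # `self` alive for the cache's lifetime; maxsize bounds memory, unlike an
    # unbounded dict cache.
    @lru_cache(maxsize=10_000)
    def _bpe(self, token: str) -> str:
        # Stand-in body; the real method iteratively merges BPE bigrams.
        return token + "</w>"
```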
first, second = bigram
new_word = []
i = 0
while i < len(word):
I know this is straight from the official implementation, but could you add some high level comments to explain what's happening in these loops?
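For example, a commented version of that merge pass, runnable standalone with dummy inputs (the dummy word and bigram are illustrative, not taken from the PR's tests).

```python
# Standalone illustration of one merge pass; in the tokenizer, `word` and
# `bigram` come from the enclosing _bpe loop.
word = ("l", "o", "w", "e", "r</w>")   # current symbols of the token
bigram = ("l", "o")                    # highest-priority pair to merge this pass
first, second = bigram

new_word = []
i = 0
while i < len(word):
    # Copy symbols over unchanged until the next occurrence of `first`.
    try:
        j = word.index(first, i)
    except ValueError:
        new_word.extend(word[i:])
        break
    new_word.extend(word[i:j])
    i = j
    # Merge `first` + `second` into one symbol when they are adjacent;
    # otherwise keep `first` on its own and continue scanning.
    if word[i] == first and i < len(word) - 1 and word[i + 1] == second:
        new_word.append(first + second)
        i += 2
    else:
        new_word.append(word[i])
        i += 1

word = tuple(new_word)  # -> ("lo", "w", "e", "r</w>")
```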
    return dict(zip(bs, cs))


def _get_pairs(word):
type annotations
    return word


def _bytes_to_unicode():
return type annotations
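For the two module-level helpers, the annotated signatures might look like this; the container types are inferred from how the helpers are used, so treat them as a guess.

```python
from typing import Dict, Set, Tuple


def _bytes_to_unicode() -> Dict[int, str]:
    """Map every byte value to a printable unicode character."""
    ...


def _get_pairs(word: Tuple[str, ...]) -> Set[Tuple[str, str]]:
    """Return the set of adjacent symbol pairs in `word`."""
    ...
```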
result[i, : len(tokens)] = torch.tensor(tokens)
return result

def _bpe(self, token):
type annotations
Context
What is the purpose of this PR?
Please link to any issues this PR addresses.
Changelog
What are the changes made in this PR?
Test plan
Minimal code to run the CLIP text encoder e2e (first download the CLIP weights with: tune download openai/clip-vit-large-patch14 --output-dir /tmp/clip-vit-large-patch14 --ignore-patterns None).
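A rough, hypothetical sketch of what such an e2e run might look like; the builder names, vocab path, and checkpoint handling below are assumptions rather than the PR's actual API.

```python
import torch

# Hypothetical entry points; the PR's actual builder names may differ.
from torchtune.models.clip import clip_tokenizer, clip_text_encoder_large

tokenizer = clip_tokenizer("/tmp/clip-vit-large-patch14/merges.txt")  # assumed vocab path
encoder = clip_text_encoder_large()
# (Load the converted CLIP weights into `encoder` via the torchtune
# checkpointer here; omitted for brevity.)

tokens = torch.tensor([tokenizer.encode("a photo of a cat")])
with torch.no_grad():
    out = encoder(tokens)
print(out.shape)
```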
Checked parity with the HF CLIP tokenizer and text encoder as implemented here: MSE between the encoder outputs on a batch of 32 test strings = 3.55e-5.
Tokenization speed for 32 test strings
Encoding speed for a single batch of 32 test strings:
Encoding speed for 1000 batches of 32 test strings:
Checklist
pre-commit install
pytest tests
pytest tests -m integration_test
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example and a tutorial example.