LoRA Builders for MM #1661
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1661
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 72b0139 with merge base 34d70b4. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -197,21 +527,34 @@ def flamingo_decoder(
     for idx in range(1, num_layers + 1):

         # Self attention layers for text decoder
-        rope = RotaryPositionalEmbeddings(dim=head_dim, max_seq_len=max_seq_len, base=rope_base)
+        rope = Llama3ScaledRoPE(dim=head_dim, max_seq_len=max_seq_len, base=rope_base)
         self_attn = MultiHeadAttention(
rope should be instantiated only once, outside of the for loop, to avoid copies and extra memory.
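For illustration, a minimal sketch of the suggestion (reusing the names from the diff above; the MultiHeadAttention arguments are abbreviated and this is not the exact builder code):

```python
# Sketch: build RoPE once, outside the layer loop, and share the module across
# all self-attention layers instead of constructing a copy per layer.
rope = Llama3ScaledRoPE(dim=head_dim, max_seq_len=max_seq_len, base=rope_base)

layers = []
for idx in range(1, num_layers + 1):
    # Self attention layers for text decoder
    self_attn = MultiHeadAttention(
        embed_dim=embed_dim,
        num_heads=num_heads,
        num_kv_heads=num_kv_heads,
        head_dim=head_dim,
        q_proj=nn.Linear(embed_dim, num_heads * head_dim, bias=False),
        k_proj=nn.Linear(embed_dim, num_kv_heads * head_dim, bias=False),
        v_proj=nn.Linear(embed_dim, num_kv_heads * head_dim, bias=False),
        output_proj=nn.Linear(embed_dim, embed_dim, bias=False),
        pos_embeddings=rope,  # shared instance, no per-layer copy
        max_seq_len=max_seq_len,
    )
    # ... rest of the layer construction unchanged
```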
@@ -282,10 +284,12 @@ def setup(self, cfg: DictConfig) -> None:

        # Dataloader depends on the tokenizer and loss_fn and should be
        # setup after all of these are setup
        collate_name = cfg.get("collate_fn", "torchtune.data.padded_collate_sft")
Why is padded_collate_sft the default? I thought that didn't work
That's the standard collate for finetuning text models.
@@ -545,15 +555,20 @@ def _setup_data(

        if isinstance(cfg_dataset, ListConfig):
            datasets = [
-               config.instantiate(single_cfg_dataset, tokenizer=self._tokenizer)
+               config.instantiate(single_cfg_dataset, self._tokenizer)
This doesn't need to be keyword? Edit: oh I guess not for the builder versions, just if you're actually passing SFTDataset directly (which I guess we won't support?)
Some datasets call it tokenizer and others call it transforms
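For illustration, a minimal sketch of why the positional form covers both naming conventions (the dataset builders below are hypothetical, and this assumes config.instantiate forwards extra positional arguments to the underlying builder):

```python
# Hypothetical builders: one names the first parameter `tokenizer`,
# the other `model_transform`, but both take it as the first positional argument.
def my_text_dataset(tokenizer, *, source: str): ...
def my_multimodal_dataset(model_transform, *, source: str): ...

# Passing the transform positionally works for either builder, whereas
# tokenizer=... would only work for the first one.
ds = config.instantiate(single_cfg_dataset, self._tokenizer)
```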
    activation: Callable = nn.SiLU,
    cls_output_dim: int = 512,
    attn_bias: bool = True,
    out_indices: Optional[List[int]] = None,
    output_cls_projection: bool = False,
    max_num_tiles: int = 4,
    in_channels: int = 3,
    intermediate_act: torch.nn.Module = torch.nn.SiLU(),
Looking at L171 and L178 makes me sad
def lora_clip_vision_encoder(
    lora_modules: List[LORA_ATTN_MODULES],
    apply_lora_to_mlp: bool = False,
    apply_lora_to_output: bool = False,
Do we just need to pass this for consistency even though it's a no-op? Would maybe raise an error if it's set to true or something
I like consistency; agree with raising an error saying that it is a no-op/not supported. We do it for all tied-embedding models that don't support apply_lora_to_output.
Why can't they be true?
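For illustration, a minimal sketch of the suggested guard (the error message is illustrative, and it assumes apply_lora_to_output really is a no-op for this builder, per the discussion above):

```python
def lora_clip_vision_encoder(
    lora_modules: List[LORA_ATTN_MODULES],
    apply_lora_to_mlp: bool = False,
    apply_lora_to_output: bool = False,
    # ... remaining parameters elided
):
    # Keep the flag for API consistency with the other LoRA builders, but fail
    # loudly instead of silently ignoring it.
    if apply_lora_to_output:
        raise ValueError(
            "apply_lora_to_output is not supported for the CLIP vision encoder; "
            "please set it to False."
        )
    ...
```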
        output: nn.Module,
        num_hidden_inputs: int = 0,
    ) -> None:
        super().__init__()
-       self.layers = _get_clones(layer, num_layers)
+       self.layers = nn.ModuleList(layers)
Shouldn't layers be List[nn.Module] type then?
Please look at the transformer, which supports all cases (List[nn.Module], nn.ModuleList, nn.Module):
torchtune/torchtune/modules/transformer.py, line 353 in 30b8519:
if isinstance(layers, nn.ModuleList):
I am fine if flamingo only supports nn.ModuleList, though I prefer the consistency.
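For illustration, a hypothetical helper mirroring that TransformerDecoder handling (this is not the actual transformer.py code, just a sketch of the three accepted forms):

```python
import copy
from typing import List, Optional, Union

from torch import nn


def _stack_layers(
    layers: Union[nn.Module, List[nn.Module], nn.ModuleList],
    num_layers: Optional[int] = None,
) -> nn.ModuleList:
    # Already a ModuleList: use it as-is.
    if isinstance(layers, nn.ModuleList):
        return layers
    # A plain Python list of modules: wrap it.
    if isinstance(layers, list):
        return nn.ModuleList(layers)
    # A single layer module: clone it num_layers times.
    if num_layers is None:
        raise ValueError("num_layers must be set when a single layer is passed")
    return nn.ModuleList([copy.deepcopy(layers) for _ in range(num_layers)])
```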
# ------------------ LoRA Flamingo ------------------


class LoRATrainable(Enum):
Where is this used?
"embed_dim": clip_embed_dim, | ||
"num_layers": clip_num_layers, | ||
"num_heads": num_heads, | ||
"activation": nn.GELU, |
Maybe it's deliberate but seems like this doesn't match the default in the CLIP builder, could be a potential source of confusion
Deliberate
def lora_flamingo_decoder(
    decoder_lora: bool,
    fusion_lora: bool,
not used?
will be used in builders
    sa_norm=RMSNorm(dim=embed_dim, eps=1e-5),
    mlp_norm=RMSNorm(dim=embed_dim, eps=1e-5),
super nit: use the same format for eps (below it's 1e-05)
    apply_lora_to_mlp: bool,
    apply_lora_to_output: bool,
Any particular reason these are now keyword-only args as opposed to how we have them elsewhere?
        if idx % fusion_interval == 0:
            attn = lora_llama3_attention(
Could maybe use partials to reduce the duplicative code here? But nbd either way
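For illustration, a sketch of the partial idea (the argument list for lora_llama3_attention is assumed here, and which arguments actually differ between the fusion and self-attention branches may not match the real builder):

```python
from functools import partial

# Bind the arguments shared by every layer once, then only vary what differs
# between the fusion (cross-attention) and regular self-attention branches.
build_attn = partial(
    lora_llama3_attention,
    lora_modules=lora_modules,
    embed_dim=embed_dim,
    num_heads=num_heads,
    num_kv_heads=num_kv_heads,
    max_seq_len=max_seq_len,
    lora_rank=lora_rank,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
)

for idx in range(1, num_layers + 1):
    if idx % fusion_interval == 0:
        attn = build_attn(pos_embeddings=None)  # fusion / cross-attention layer
    else:
        attn = build_attn(pos_embeddings=rope)  # standard self-attention layer
```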
    ca_norm=RMSNorm(dim=embed_dim),
    mlp_norm=RMSNorm(dim=embed_dim),
Is it deliberate to have different eps here vs in self-attention layers?
Context
What is the purpose of this PR? Please link to any issues this PR addresses.
Changelog
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.
pre-commit install
pytest tests
pytest tests -m integration_test
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example and a tutorial example.