Beginning of generation utils and necessary refactors of T5 Model #2011
Conversation
@@ -0,0 +1,83 @@
{
I know we verify completeness often w/ internal notebooks - I thought for those that show parity with HuggingFace or external libraries, we could put those notebooks in the actual repo. Seems like a better way to keep track rather than some Bento notebooks w/ scattered ownership.
Can you upload the notebook to a GitHub gist and provide a link in the PR so it's easier to review the contents?
Sure, but for a quick fix you can right-click on the expand dots at the top right of this file and select "View file" and it'll give you a notebook view.
-@dataclass
+@dataclass(frozen=True)
Freezing this as we probably don't want people to be able to overwrite configs and still try to use the model - much more likely to run into bugs that way.
What if someone wants to experiment with a smaller model or modified architecture? Are there distilled or smaller T5 models out there? We don't freeze other configs, so I am not sure I agree with this.
Freezing the config won't make it impossible to try a smaller model or modified architecture. It just means that once they instantiate the config and pass the config to the model, they won't be able to modify it.
Example:

```python
config = T5Config(encoder_only=True)
t5_model = T5Model(config=config)
t5_model.config.encoder_only = False  # Currently allowed; with a frozen config, this would throw an error
```
Just to follow up here, in the example you just showed, would it affect the model behavior if users did end up changing the config after instantiating the model? IIUC the config is only used during model instantiation anyways. That being said I don't see any issues with freezing the config.
It wouldn't affect the model behavior, but it would throw an error saying "Config cannot be modified", which I think is what we want. It would be considered undefined behavior if someone e.g. instantiated a model without a decoder and then went back and changed the config to say that it did have a decoder.
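For reference, a minimal sketch of how a frozen dataclass behaves; the `MiniConfig` name is hypothetical, not the PR's `T5Config`:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class MiniConfig:
    encoder_only: bool = False

cfg = MiniConfig(encoder_only=True)
try:
    cfg.encoder_only = False  # attribute assignment is blocked on frozen dataclasses
except FrozenInstanceError as err:
    print(err)  # cannot assign to field 'encoder_only'
```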
 def forward(
     self,
-    encoder_tokens: Tensor,
+    encoder_tokens: Optional[Tensor] = None,
Don't have to include `encoder_tokens` if `encoder_outputs` are already provided.
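A minimal sketch of that either/or contract as a standalone helper; the `resolve_encoder_outputs` name and the exact validation are assumptions, not necessarily how the PR implements it:

```python
from typing import Optional

import torch
from torch import Tensor

def resolve_encoder_outputs(
    encoder: torch.nn.Module,
    encoder_tokens: Optional[Tensor] = None,
    encoder_outputs: Optional[Tensor] = None,
) -> Tensor:
    # If precomputed encoder outputs were passed, the raw tokens are unnecessary;
    # otherwise run the encoder on the tokens.
    if encoder_outputs is not None:
        return encoder_outputs
    if encoder_tokens is None:
        raise ValueError("Either encoder_tokens or encoder_outputs must be provided.")
    return encoder(encoder_tokens)
```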
 def forward(
     self,
-    tgt: Tensor,
+    tgt: Optional[Tensor] = None,
Similar to the forward of the entire model: if the target is already embedded, there's no need to include the raw tokenized tgt.
if self.is_encoder_decoder:
    encoder = self.model.get_encoder()
    model_kwargs["encoder_outputs"] = encoder(inputs)
This should be the necessary args for the forward method of whatever model is being used in decoding.
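A hedged sketch of how `model_kwargs` could flow through the decoding loop; the `greedy_decode` name, the `decoder_tokens` keyword, and the `"decoder_output"` key are assumptions for illustration:

```python
import torch

def greedy_decode(model, input_ids: torch.Tensor, max_len: int, **model_kwargs) -> torch.Tensor:
    # model_kwargs carries whatever the wrapped model's forward needs at each step,
    # e.g. the encoder_outputs precomputed once for an encoder/decoder model.
    for _ in range(max_len):
        outputs = model(decoder_tokens=input_ids, **model_kwargs)
        next_tokens = outputs["decoder_output"][:, -1, :].argmax(dim=-1)
        input_ids = torch.cat([input_ids, next_tokens.unsqueeze(-1)], dim=-1)
    return input_ids
```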
from torch import nn


class GenerationUtil:
Does this whole class have to be torchscriptable, as well??
That would make it extremely difficult to incorporate other models.
> Does this whole class have to be torchscriptable, as well??

If we expect that this util will be used in Predictor during inference time then yes it does. Can you explain what makes it difficult to make this torchscriptable?
As a first step, we can always implement this without torchscriptability support for customers to experiment with. And if there's enough demand to make it torchscriptable then we can come back and add this support.
    self, batch_size: int, device: Optional[torch.device] = None, **model_kwargs
):
    if model_kwargs is not None and "decoder_input_ids" in model_kwargs:
        return model_kwargs.pop("decoder_input_ids")
Why pass around a model_kwargs dict instead of just having an optional decoder_input_ids param?
+1
    device: Optional[torch.device] = None,
    dtype=None,
) -> None:
    super().__init__()

    self.token_embeddings = token_embeddings
Can you add a description of this input argument to the docstring above?
unfinished_sequences = unfinished_sequences.mul((next_tokens != eos_idx).long())
Can we also add an explanation for this line? Having a hard time following the logic. Alternatively let's add a couple of lines to the docstring of this method explaining the approach.
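For context, a self-contained sketch of what this line is doing; the `pad_idx` handling at the end is an assumption about the surrounding loop, not copied from the PR:

```python
import torch

eos_idx, pad_idx = 1, 0
next_tokens = torch.tensor([5, eos_idx, 7])              # tokens chosen at this decoding step
unfinished_sequences = torch.ones(3, dtype=torch.long)   # 1 = still generating, 0 = finished

# A sequence that just emitted EOS flips its flag to 0; multiplying keeps sequences
# that finished on earlier steps at 0 permanently.
unfinished_sequences = unfinished_sequences.mul((next_tokens != eos_idx).long())
print(unfinished_sequences)  # tensor([1, 0, 1])

# Finished sequences are then typically forced to emit padding on later steps.
next_tokens = next_tokens * unfinished_sequences + pad_idx * (1 - unfinished_sequences)
```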
self.norm = T5LayerNorm(d_model)
self.dropout1 = nn.Dropout(dropout)
self.dropout2 = nn.Dropout(dropout)
Why did we decide to move this from the model to the encoder/decoder?
Keeps the entire encoder forward method self-contained.
@Nayef211 Can I get some 👀 on this again when you have a chance?
@atalman @osalpekar Is this failing integration test related to the Nova migration? The process seems to be killed with no helpful error, and the integration tests pass on my local machine.
@joecummings looks like the integration tests are running out of memory, code 137: 3796110184
Silly follow-up question, but how would I go about allocating more memory for these integration tests?
Great to have sampling as part of the library!
    return input_ids

def beam_search(self, input_ids: torch.Tensor, num_beams: int, max_len: Optional[int]) -> torch.Tensor:
If we put all the sampling methods in this class (beam_search, greedy), is that very extensible for the user? Or should these be separate classes that inherit from GenerationUtil or a general Sampler class?
Idk about inheriting from `GenerationUtil`, but as a standalone class, this makes sense.
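A rough sketch of the standalone-class alternative being discussed; all class names and the `generate` signature here are hypothetical:

```python
from abc import ABC, abstractmethod
from typing import Optional

import torch
from torch import nn

class Sampler(ABC):
    """Base class for decoding strategies; each strategy is its own class."""

    @abstractmethod
    def generate(self, model: nn.Module, input_ids: torch.Tensor, max_len: Optional[int]) -> torch.Tensor:
        ...

class GreedySearch(Sampler):
    def generate(self, model, input_ids, max_len):
        raise NotImplementedError  # argmax decoding loop would live here

class BeamSearch(Sampler):
    def __init__(self, num_beams: int):
        self.num_beams = num_beams

    def generate(self, model, input_ids, max_len):
        raise NotImplementedError  # beam expansion and pruning would live here
```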
@@ -176,7 +176,8 @@ def build_model_from_huggingface_ckpt(

    t5_model_state_dict = {
        "token_embeddings.weight": hf_weights["shared.weight"],
        "norm1.weight": hf_weights["encoder.final_layer_norm.weight"],
        "encoder.token_embeddings.weight": hf_weights["shared.weight"],
Is there any way to bundle generation parameters with a model so the user doesn't have to know the correct sampling defaults for a given model?
@@ -47,6 +48,8 @@ def __init__(
        strict (bool): Passed to :func:`torch.nn.Module.load_state_dict` method. (Default: `False`)
        dl_kwargs (dictionary of keyword arguments): Passed to :func:`torch.hub.load_state_dict_from_url`. (Default: `None`)
    """
    warnings.warn("`T5Wrapper` is being deprecated. Please use new `GenerationUtils`.", category=DeprecationWarning)
GenerationUtils, if made into an nn.Module, could be treated as a generic wrapper for any LLM. This might be easier for the user but would break from the Huggingface design. It would allow for generation parameters to be saved with the model.
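A hedged sketch of that idea; the `GenerationWrapper` class and its attributes are hypothetical and not part of this PR:

```python
import torch
from torch import nn

class GenerationWrapper(nn.Module):
    """Wraps any language model and keeps its generation defaults alongside it."""

    def __init__(self, model: nn.Module, eos_idx: int, pad_idx: int, max_len: int = 128):
        super().__init__()
        self.model = model
        # Generation defaults travel with the wrapper object, so users don't have to
        # know the right sampling settings for each model at call time.
        self.eos_idx = eos_idx
        self.pad_idx = pad_idx
        self.max_len = max_len

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError  # would dispatch to greedy_search / beam_search here
```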
Context
We aim to add generation utils that support a number of encoder/decoder and decoder-based models. To do so, we also have to rework our current encoder/decoder model, T5.
Changes
1. Separated logic for encoder and decoder into self-contained `nn.Module`s.
   1a. Move dropout layers and norms to `T5Encoder` and `T5Decoder`.
   1b. Pass `token_embeddings` to the encoder if constructed through the `T5Model`. Now the encoder can take in tokenized text or embedded text.
   1c. Add `get_encoder` and `get_decoder` getter functions (not torchscriptable ATM).
   1d. Update type annotations to allow for padding_masks and encoder_outputs.
   1e. Change `T5Encoder` and `T5Decoder` return types to dictionaries.
2. Added `GenerationUtils` class and `greedy_search` generation technique.
   2a. Added deprecation warning to `T5Wrapper` until `beam_search` is added.
3. Froze configs to avoid mutating the model unnecessarily.
Testing
GenerationUtil
Notes
`T5Wrapper` is no longer torchscriptable.
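A hedged usage sketch stitched together from the names above; the import paths, the dummy input, and the `generate` call are assumptions rather than the final API:

```python
import torch
from torchtext.prototype.models import T5Config, T5Model   # import path assumed
from torchtext.prototype.generate import GenerationUtils    # import path assumed

config = T5Config(encoder_only=False)     # frozen config; attributes can't be reassigned later
model = T5Model(config=config)

generator = GenerationUtils(model)        # model-agnostic wrapper; greedy_search today, beam_search later
input_ids = torch.tensor([[13959, 1566, 12, 2968, 10, 1]])  # stand-in for tokenized input
output_ids = generator.generate(input_ids, max_len=64)      # signature assumed for illustration
```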