
Resizing HF token embeddings with PipelineModule #1010


Description

@g-karthik

With an HF model class, one can resize the token embeddings to account for any number of added special tokens. In the usual (non-pipeline) scenario, that looks roughly like this:

```python
from transformers import GPT2Config, GPT2LMHeadModel, GPT2Tokenizer

config_class = GPT2Config
model_class = GPT2LMHeadModel
tokenizer_class = GPT2Tokenizer

config = config_class.from_pretrained("gpt2-xl")        # use the XL config for now, which has its own vocab size
tokenizer = tokenizer_class.from_pretrained("gpt2-xl")  # default XL vocab

# add_special_tokens expects a dict; extra tokens go under "additional_special_tokens"
tokenizer.add_special_tokens({"additional_special_tokens": ["<speaker1>", "<speaker2>"]})

model = model_class(config)
model.resize_token_embeddings(len(tokenizer))
```

The last line essentially allocates two new rows in the input embedding matrix for the newly added special tokens and initializes them with random weights.
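
For reference, the resize is roughly equivalent to the standalone sketch below (illustrative only; the real HF implementation also handles the tied LM head and uses the model's own weight-init scheme, and `resize_embedding`/`init_std` are made-up names here):

```python
import torch
import torch.nn as nn

def resize_embedding(old_emb: nn.Embedding, new_num_tokens: int, init_std: float = 0.02) -> nn.Embedding:
    """Return a larger embedding whose leading rows are copied from old_emb."""
    old_num_tokens, dim = old_emb.weight.shape
    new_emb = nn.Embedding(new_num_tokens, dim)
    nn.init.normal_(new_emb.weight, mean=0.0, std=init_std)  # random init for the new rows
    num_to_copy = min(old_num_tokens, new_num_tokens)
    with torch.no_grad():
        new_emb.weight[:num_to_copy] = old_emb.weight[:num_to_copy]  # preserve pre-trained rows
    return new_emb
```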

Now in the pipeline regime, one cannot simply resize the token embeddings after the PipelineModule has been initialized, since the module will already have split the model across pipeline stages. Is it possible to provide a callback/mechanism with PipelineModule that lets downstream users resize and freshly initialize the newly added special-token embeddings? A sketch of the kind of workaround I have in mind follows.
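
For concreteness, one possible workaround is to do the resize before partitioning, i.e., inside the embedding layer's constructor, so the weight already has its final shape by the time PipelineModule splits the stages. A minimal sketch, where `EmbeddingPipe` and its loading step are hypothetical; only `PipelineModule`/`TiedLayerSpec` are real DeepSpeed classes, and `config`/`tokenizer` come from the snippet above:

```python
import torch
import torch.nn as nn
from deepspeed.pipe import PipelineModule, TiedLayerSpec

class EmbeddingPipe(nn.Module):
    """Hypothetical embedding stage that resizes its vocab at construction time."""
    def __init__(self, config, new_vocab_size):
        super().__init__()
        self.wte = nn.Embedding(config.vocab_size, config.n_embd)
        # ... in practice, load the pre-trained wte weights here ...
        if new_vocab_size != config.vocab_size:
            resized = nn.Embedding(new_vocab_size, config.n_embd)
            nn.init.normal_(resized.weight, std=config.initializer_range)  # random init for new rows
            with torch.no_grad():
                resized.weight[:config.vocab_size] = self.wte.weight       # keep pre-trained rows
            self.wte = resized

    def forward(self, input_ids):
        return self.wte(input_ids)

specs = [
    TiedLayerSpec("embed", EmbeddingPipe, config, len(tokenizer), tied_weight_attr="wte"),
    # ... LayerSpecs for the transformer blocks, then a tied output/LM-head spec ...
]
model = PipelineModule(layers=specs, num_stages=4)
```

The idea is just to make the vocab size correct at construction time, so no post-partition surgery on the embedding weight is needed; but a first-class hook in PipelineModule would be much cleaner.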

Also, shouldn't this be a problem for the pipeline (and more generally 3D) parallelism implementations in the DeepSpeedExamples repo too? A user of a model pre-trained with pipeline parallelism will certainly have basic downstream needs such as adding special tokens for fine-tuning.

@ShadenSmith @stas00
