Description
Is your feature request related to a problem? Please describe.
Currently you can add textual inversion embeddings with `.load_textual_inversion()`. But once the tokenizer and text encoder have the new embeddings, it is complex to remove them and get both back to their original state.
Describe the solution you'd like.
An `unload_textual_inversion()` method to remove the added (foreign) tokens and restore the text encoder to its original state (maybe there could also be a way to pass specific tokens to be removed).
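A minimal sketch of how this could look from the user's side, assuming the proposed name `unload_textual_inversion()` and an optional `tokens` argument (neither exists in diffusers today):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# ... generate images that use the "<cat-toy>" token ...

# proposed: remove all added tokens and restore the original tokenizer/text encoder
pipe.unload_textual_inversion()

# proposed: or remove only specific tokens
pipe.unload_textual_inversion(tokens=["<cat-toy>"])
```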
Describe alternatives you've considered.
As @patrickvonplaten described internally, it is currently possible to remove tokens by doing something like:
```python
# remove the added token from the tokenizer's vocabulary
token_id = pipe.tokenizer.convert_tokens_to_ids("<token>")
del pipe.tokenizer._added_tokens_decoder[token_id]
pipe.tokenizer._update_trie()
```
and
```python
import torch

# truncate the embedding matrix back to the original vocabulary size
embeddings = pipe.text_encoder.get_input_embeddings().weight
embeddings = embeddings[: len(pipe.tokenizer)].detach().clone()
pipe.text_encoder.set_input_embeddings(torch.nn.Embedding.from_pretrained(embeddings))
```
and it would probably be easier with a dedicated `remove_token` method in Transformers (huggingface/transformers#15032, huggingface/transformers#4827).
However, IMO that does not eliminate the usefulness of a diffusers-specific method that wraps whatever token-removal mechanism is available behind a simple API for removing tokens.
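To make that concrete, such a method could boil down to roughly the following, which just combines the two snippets above into one function (the function name is illustrative, and `_added_tokens_decoder` / `_update_trie` are private Transformers internals):

```python
import torch

def unload_textual_inversion(pipe, tokens):
    """Illustrative sketch: drop `tokens` and shrink the embedding matrix."""
    # 1. remove the tokens from the tokenizer
    for token in tokens:
        token_id = pipe.tokenizer.convert_tokens_to_ids(token)
        del pipe.tokenizer._added_tokens_decoder[token_id]
    pipe.tokenizer._update_trie()

    # 2. truncate the text encoder's embedding matrix to the new vocab size
    embeddings = pipe.text_encoder.get_input_embeddings().weight
    embeddings = embeddings[: len(pipe.tokenizer)].detach().clone()
    pipe.text_encoder.set_input_embeddings(torch.nn.Embedding.from_pretrained(embeddings))
```

Like the snippets above, this only truncates correctly when the removed tokens sit at the end of the vocabulary (the usual case for textual inversion); handling the general case is exactly the bookkeeping a dedicated diffusers method could hide.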
Additional context.
Hot-swapping (keeping a base model warm and swapping LoRAs on top) requires fusing/unfusing LoRAs, but with pivotal tuning it additionally requires resetting the text encoder and tokenizer, which is more complex.
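For illustration, a hot-swap loop with pivotal tuning could then look roughly like this (repository names are placeholders, and `unload_textual_inversion()` is the proposed API, not something that exists today):

```python
# keep the base pipeline warm and swap fine-tunes on top of it
pipe.load_lora_weights("some-user/style-lora")                              # placeholder repo
pipe.load_textual_inversion("some-user/style-embedding", token="<style>")   # pivotal-tuning embedding

image = pipe("a photo of a dog in <style> style").images[0]

# reset to the base model before loading the next fine-tune
pipe.unload_lora_weights()       # exists today for LoRA
pipe.unload_textual_inversion()  # proposed here for the tokenizer / text encoder
```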