How to train a custom seq2seq model with BertModel #4517
Hi @chenjunweii - thanks for your issue! I will take a deeper look at the EncoderDecoder framework at the end of this week and should add a Google Colab on how to fine-tune it.
Using a Bert-Bert model for a seq2seq task should work with the simpletransformers library; there is working code for it.
Hi @flozi00,
Of course, it should be reproducible using this code:
import logging
import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel
logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)
train_data = [
["one", "1"],
["two", "2"],
]
train_df = pd.DataFrame(train_data, columns=["input_text", "target_text"])
eval_data = [
["three", "3"],
["four", "4"],
]
eval_df = pd.DataFrame(eval_data, columns=["input_text", "target_text"])
model_args = {
"reprocess_input_data": True,
"overwrite_output_dir": True,
"max_seq_length": 10,
"train_batch_size": 2,
"num_train_epochs": 10,
"save_eval_checkpoints": False,
"save_model_every_epoch": False,
"evaluate_generated_text": True,
"evaluate_during_training_verbose": True,
"use_multiprocessing": False,
"max_length": 15,
"manual_seed": 4,
}
encoder_type = "roberta"
model = Seq2SeqModel(
encoder_type,
"roberta-base",
"bert-base-cased",
args=model_args,
use_cuda=True,
)
model.train_model(train_df)
results = model.eval_model(eval_df)
print(model.predict(["five"]))
model1 = Seq2SeqModel(
encoder_type,
encoder_decoder_name="outputs",
args=model_args,
use_cuda=True,
)
print(model1.predict(["five"]))
It's the sample code from the documentation of the simpletransformers library.
Hey @flozi00, I think #4680 fixes the error. @chenjunweii - a Bert2Bert model using the EncoderDecoder framework should work for this. You can load the model via:
from transformers import EncoderDecoderModel
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')  # initialize Bert2Bert and train it on conditional text generation, providing the
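To make that snippet concrete, here is a minimal, hedged sketch of a full forward/backward pass with the current EncoderDecoderModel API; the checkpoint names and example texts are placeholders, not taken from the thread:
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"  # encoder checkpoint, decoder checkpoint
)

# The decoder needs to know which token starts generation and which token is padding.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("a long source document", return_tensors="pt")
labels = tokenizer("a short target summary", return_tensors="pt").input_ids

# When labels are passed, the model computes the cross-entropy loss for
# conditional text generation (teacher forcing, with a causal mask in the decoder).
outputs = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                labels=labels)
outputs.loss.backward()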
Thank you for working on this problem and thank you for 🤗! But I still have some questions and concerns about the EncoderDecoder framework.
Documentation says that "Causal mask will also be used by default", but I did not find how to change it. E.g. what if I am training the model without teacher forcing (just generating words one by one during training), or if I am doing inference? I would suggest adding one more argument to the forward pass that would make it both clearer when causal masking is used and easier to enable/disable it. What do you think?
It just feels weird to use BERT as a decoder. BERT is a model that is a) non-autoregressive and b) pre-trained without cross-attention modules. It is also unclear at which point the cross-attention modules are created. It would be great, if it is possible, to add something like
Hey @Guitaricet :-) , First, at the moment only Bert2Bert works with the encoder-decoder framework. Also, if you use Bert as a decoder you will always use a causal mask. At the moment I cannot think of an encoder-decoder in which the decoder does not use a causal mask, so I don't see a reason why one would want to disable it. Can you give me an example where the decoder should not have a causal mask?
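For readers unfamiliar with the term, a causal mask is simply a lower-triangular attention mask: position i may attend to positions 0..i and nothing to its right. A tiny illustrative sketch in PyTorch (not transformers internals):
import torch

seq_len = 5
# 1 = may attend, 0 = masked out; row i can only see columns 0..i.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.long))
print(causal_mask)
# tensor([[1, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0],
#         [1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 0],
#         [1, 1, 1, 1, 1]])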
I'm trying to build a Bert2Bert model using EncoderDecoder, but I have a couple of quick questions regarding the format of inputs and targets for the BERT decoder. What exactly is a good way to format the conditional mask for the decoder? For example, if I want to feed the decoder [I, am] and make it output [I, am, happy], how exactly do I mask the input? Do I give the decoder [CLS, I, am, MASK, ...., MASK, SEP], where the number of MASKs is such that the total number of tokens is a fixed length (like 512)? Or do I just input [CLS, I, am, MASK, SEP, PAD, ..., PAD]? Similarly, what should the decoder's output be? Should the first token (the "output" at the CLS position) be the token "I"? Lastly, is there a website or resource that explains the input and output representations of text given to the decoder in Bert2Bert? I don't think the authors of the paper have released their code yet. Thanks!
I will soon release a bert2bert notebook that will show how to do this. You can also take a look at this: Maybe it helps.
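Until that notebook is out, here is a hedged sketch of the usual teacher-forcing setup (no [MASK] tokens are involved): the target is padded to a fixed length, and padded positions are set to -100 in the labels so the loss ignores them. The texts and lengths below are placeholders.
import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

target = tokenizer("I am happy", padding="max_length", max_length=16,
                   return_tensors="pt")

labels = target.input_ids.clone()
labels[target.attention_mask == 0] = -100  # ignore padding positions in the loss

# In recent transformers versions, passing `labels` to EncoderDecoderModel is
# enough: the decoder inputs ([CLS] I am ...) are built internally by shifting
# the labels right; in older versions you pass decoder_input_ids explicitly.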
Thank you @patrickvonplaten for the clarification.
It is very possible that both of these cases are rare, so the library may not need
I also noticed that
Again, thank you for your work; 🤗 is what the NLP community has needed for quite some time!
UPD: more reasons to use a different attention mask (not for seq2seq though): XLNet-like or ULM-like pre-training.
Hi @patrickvonplaten, thanks for the clarification on this topic and for the great work you've been doing on those seq2seq models. Thanks.
Yeah, the code is ready in this PR: https://github.com/huggingface/transformers/tree/more_general_trainer_metric. And in order for the script to work, you need to use this Trainer class: I'm currently training the model myself. When the results are decent, I will publish a little notebook.
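In current versions of the library, seq2seq training is supported by the built-in Seq2SeqTrainer, so a custom Trainer class should no longer be necessary. A hedged sketch, assuming `model`, `train_dataset`, and `eval_dataset` already exist (all names and paths are placeholders):
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# `model` is a seq2seq model (e.g. a Bert2Bert EncoderDecoderModel); the
# datasets are tokenized with input_ids, attention_mask, and labels columns.
training_args = Seq2SeqTrainingArguments(
    output_dir="./bert2bert-output",  # placeholder path
    per_device_train_batch_size=4,
    num_train_epochs=3,
    predict_with_generate=True,
    logging_steps=100,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()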
Hi @patrickvonplaten, thanks for sharing the scripts. However, the second link for training an encoder-decoder model is not found. Could you please upload this script? Thanks.
Sorry, I deleted the second link. You can see all the necessary code on this model page:
Thanks for sharing this, Patrick.
I am trying to implement an encoder-decoder with BART, but I have no idea how to do so, and I need to fine-tune the decoder model, so eventually I need to train my decoder model. I am trying to use the
And here's how I am using it.
I am calculating perplexity from the loss, and I am getting a perplexity score of 1000+, which is bad. I would like to know what my model is lacking and whether it is possible that I could use
@AmbiTyga from what I know, BART is already an encoder-decoder model, with BERT as the encoder and GPT as the decoder. So you are encoding-decoding in the encoder and encoding-decoding in the decoder, which I don't think is a good idea. For the moment EncoderDecoderModel supports only BERT.
@iliemihai So can you show me how to use BART in cases like the one I have coded above?
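For what it's worth, BartForConditionalGeneration already wires the encoder and decoder together, so it can be fine-tuned directly without EncoderDecoderModel. A hedged sketch, with perplexity computed as exp of the average loss; the checkpoint name and texts are placeholders:
import math
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

inputs = tokenizer("source text goes here", return_tensors="pt")
labels = tokenizer("target text goes here", return_tensors="pt").input_ids

# Passing labels makes the model shift them internally for teacher forcing
# and return the token-level cross-entropy loss.
outputs = model(**inputs, labels=labels)
perplexity = math.exp(outputs.loss.item())  # perplexity = exp(average loss)
print(outputs.loss.item(), perplexity)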
@patrickvonplaten is Bert the only model that is supported as a decoder? I was hoping to train a universal model, so I wanted to use xlm-roberta (xlmr) as both encoder and decoder; is this possible given the current EncoderDecoder framework? I know BERT has a multilingual checkpoint, but performance-wise an xlm-roberta model should be better. I noticed the notebook https://github.com/huggingface/transformers/blob/16e38940bd7d2345afc82df11706ee9b16aa9d28/model_cards/patrickvonplaten/roberta2roberta-share-cnn_dailymail-fp16/README.md does roberta2roberta; is the same code applicable to xlm-roberta?
Hey @spookypineapple - good question! Here is the PR that adds XLM-Roberta to the EncoderDecoder models: #6878. It will not make it into 3.1.0 but should be available on master in ~1-2 days.
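Once #6878 is on master, initializing an XLM-R based encoder-decoder should look essentially like the BERT case. A hedged sketch (the checkpoint name is assumed, not taken from the thread):
from transformers import EncoderDecoderModel, XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "xlm-roberta-base", "xlm-roberta-base"
)

# XLM-R uses <s>/<pad> rather than BERT's [CLS]/[PAD]; set them on the config
# so generation and loss computation know where sequences start and pad.
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id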
I'm pulling from master, so I should get at least the necessary code artifacts to get bert2bert to work. However, I'm seeing (for a bert2bert setup using bert-base-multilingual-cased) that the output of the decoder remains unchanged regardless of the input to the encoder; this behavior seems to persist with training... The code I'm using to initialize the EncoderDecoder model is as follows:
Hey @spookypineapple, a couple of things regarding your code:
=> You might want to check these bert2bert model cards, which explain how to fine-tune such an encoder-decoder model: https://huggingface.co/patrickvonplaten/bert2bert-cnn_dailymail-fp16
Hope this helps!
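The gist of those model cards, as a hedged sketch: the shared config needs the decoder-side special tokens and generation settings filled in before training, otherwise the decoder can produce the same output regardless of the encoder input. Here `model` and `tokenizer` refer to a Bert2Bert EncoderDecoderModel and its BERT tokenizer as in the earlier snippets, and the numeric values are illustrative, not canonical:
# Special tokens the decoder needs for training and generation.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.vocab_size = model.config.encoder.vocab_size

# Generation hyper-parameters (illustrative values only).
model.config.max_length = 142
model.config.min_length = 56
model.config.no_repeat_ngram_size = 3
model.config.early_stopping = True
model.config.length_penalty = 2.0
model.config.num_beams = 4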
It does help indeed! Thank you @patrickvonplaten
@patrickvonplaten can you please share a tutorial/notebook on training the encoder-decoder model for machine translation? |
@patrickvonplaten can you create a notebook on how to use a custom dataset to fine-tune bert2bert models?
I would like to disable causal masking to use it in DETR, which uses parallel decoding... But this does not seem possible at the moment. In my opinion, an option to disable causal masking in the decoder would be useful.
@patrickvonplaten, none of the links are working. Is it possible to fix them?
For BERT2BERT you can just use the EncoderDecoderModel class. This example shows how to instantiate a Bert2Bert model, which you can then train on any seq2seq task you want, e.g. summarization: https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization (you just need to slightly adapt the example, or pre-create a BERT2BERT and use it as a checkpoint)
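A hedged sketch of the "pre-create a BERT2BERT and use it as a checkpoint" route; the local directory name is a placeholder:
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Save the untrained Bert2Bert locally, then point the summarization example
# at this directory, e.g. --model_name_or_path ./bert2bert-init
model.save_pretrained("./bert2bert-init")
tokenizer.save_pretrained("./bert2bert-init")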
Thanks! |
How to train a custom seq2seq model with BertModel? I would like to use a Chinese pretrained model based on BertModel, so I've tried using the Encoder-Decoder Model, but it seems the Encoder-Decoder Model is not used for conditional text generation. I saw that BartModel seems to be the model I need, but I cannot load pretrained BertModel weights into BartModel.
By the way, could I fine-tune a BartModel for seq2seq with custom data?
Any suggestion, thanks.