BERT model for Machine Translation #31

Closed
KeremTurgutlu opened this issue Nov 18, 2018 · 13 comments

Comments

@KeremTurgutlu

Is there a way to use any of the pre-trained models provided in this repository for a machine translation task?

Thanks

@thomwolf
Member

Hi Kerem, I don't think so. Have a look at the fairsep repo maybe.

@JasonVann

@thomwolf hi there, I couldn't find out anything about the fairsep repo. Could you post a link? Thanks!

@thomwolf
Member

thomwolf commented Nov 26, 2018

Hi, I am talking about this repo: https://github.com/pytorch/fairseq.
Have a look at their Transformer models for machine translation.

@alphadl

alphadl commented Feb 20, 2019

I have run several MT experiments that use fixed BERT embeddings; unfortunately, I found that this makes performance worse. @JasonVann @thomwolf
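For reference, a minimal sketch of that kind of setup with the current transformers API (the checkpoint name and example sentence are illustrative, not alphadl's actual code): BERT is frozen and its last hidden states are used as fixed source-side embeddings for whatever NMT model is trained on top.

```python
import torch
from transformers import BertModel, BertTokenizer

# Load multilingual BERT and freeze it so the embeddings stay fixed.
bert = BertModel.from_pretrained("bert-base-multilingual-cased")
bert.eval()
for p in bert.parameters():
    p.requires_grad = False  # only the NMT model on top is trained

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
batch = tokenizer(["Das ist ein Test."], return_tensors="pt", padding=True)

with torch.no_grad():
    # (batch, src_len, 768) contextual embeddings, consumed as fixed
    # features by the downstream translation model.
    src_features = bert(**batch).last_hidden_state
```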

@SinghJasdeep
Contributor

Hey!

FAIR has demonstrated that BERT-style pre-training greatly improves BLEU for unsupervised translation.

Paper: https://arxiv.org/abs/1901.07291

Repo: https://github.com/facebookresearch/XLM

An older paper showing that pre-training with an LM (not MLM) objective helps Seq2Seq: https://arxiv.org/abs/1611.02683

Hope this helps!

@gtesei

gtesei commented Mar 1, 2019

These links are useful.

Does anyone know if BERT improves things also for supervised translation?

Thanks.

@echan00
Contributor

echan00 commented Apr 13, 2019

> Does anyone know if BERT improves things also for supervised translation?

Also interested

@nyck33

nyck33 commented May 5, 2019

Because BERT is an encoder, I guess we need a decoder. I looked at https://jalammar.github.io/, and it seems the OpenAI Transformer is a decoder, but I cannot find a repo for it.
There is also the TensorFlow Transformer tutorial: https://www.tensorflow.org/alpha/tutorials/text/transformer
I think BERT outputs a hidden state of size 768 per token. Can we just reshape that and feed it to the decoder in that Transformer notebook? In general, can I just reshape and try out a bunch of decoders?
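A minimal sketch of that idea, assuming recent PyTorch and transformers versions (the decoder depth, target vocabulary size, and token ids below are made up for illustration): no reshape is needed, because the decoder's d_model can simply be set to BERT's hidden size of 768, and its cross-attention then attends over BERT's per-token outputs.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

bert = BertModel.from_pretrained("bert-base-multilingual-cased")  # encoder
decoder_layer = nn.TransformerDecoderLayer(d_model=768, nhead=8)  # match BERT's hidden size
decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)      # randomly initialized
tgt_embed = nn.Embedding(32000, 768)  # assumed target vocabulary size
generator = nn.Linear(768, 32000)     # projects decoder states back to the vocabulary

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
src = tokenizer(["Hello world"], return_tensors="pt")
memory = bert(**src).last_hidden_state.transpose(0, 1)  # (src_len, batch, 768)

tgt = torch.tensor([[101, 7592, 2088]]).transpose(0, 1)  # dummy target ids, (tgt_len, batch)
out = decoder(tgt_embed(tgt), memory)                     # (tgt_len, batch, 768)
logits = generator(out)                                   # (tgt_len, batch, vocab)
# For real training you would also pass a causal tgt_mask and add
# positional encodings on top of tgt_embed's output.
```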

@tacchinotacchi

tacchinotacchi commented Jun 3, 2019

> These links are useful.
>
> Does anyone know if BERT improves things also for supervised translation?
>
> Thanks.

https://arxiv.org/pdf/1901.07291.pdf seems to suggest that it does improve results for supervised translation as well. However, that paper is not about using BERT embeddings; it is about pre-training the encoder and decoder on a masked language modelling objective. The biggest benefit comes from initializing the encoder with BERT's weights. Surprisingly, using them to initialize the decoder also brings a small benefit, even though (if I understand correctly) you still have to randomly initialize the weights of the encoder-decoder (cross-)attention modules, since they are not present in the pre-trained network.

EDIT: of course the pre-trained network needs to have been trained on multi-lingual data, as stated in the paper
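For what it's worth, the EncoderDecoderModel wrapper in this library makes that initialization pattern easy to try today; a minimal sketch (the multilingual checkpoint is just an example): both encoder and decoder are warm-started from BERT, while the decoder's cross-attention weights do not exist in the checkpoint and are therefore randomly initialized, exactly as discussed above.

```python
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

# Warm-start both sides from multilingual BERT; the decoder's cross-attention
# layers are newly initialized and must be learned during fine-tuning.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-cased", "bert-base-multilingual-cased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```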

@torshie

torshie commented Jul 5, 2019

I have managed to replace the Transformer's encoder with a pretrained BERT encoder; however, the experimental results were very poor: it dropped the BLEU score by about 4 points.

The source code is available here: https://github.com/torshie/bert-nmt, implemented as a fairseq user model. It may not work out of the box; some minor tweaks may be needed.

@sailordiary

Could be relevant:

- Towards Making the Most of BERT in Neural Machine Translation
- On the use of BERT for Neural Machine Translation

@Bachstelze

Also have a look at MASS and XLM.

@lileicc

lileicc commented Nov 9, 2021

Yes, it is possible to use BERT as the encoder and GPT as the decoder and glue them together.
There is a recent paper on this: Multilingual Translation via Grafting Pre-trained Language Models
https://aclanthology.org/2021.findings-emnlp.233.pdf
https://github.com/sunzewei2715/Graformer
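A hedged sketch of that "glue BERT to GPT" idea with this library's EncoderDecoderModel wrapper (a generic illustration with arbitrary checkpoints, not the Graformer method from the paper): the new cross-attention weights are random, so the model only produces useful translations after fine-tuning on parallel data.

```python
from transformers import AutoTokenizer, EncoderDecoderModel

# BERT as the encoder, GPT-2 as the decoder; GPT-2 gains randomly
# initialized cross-attention layers so it can attend to the encoder.
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-cased", "gpt2")

src_tok = AutoTokenizer.from_pretrained("bert-base-cased")
tgt_tok = AutoTokenizer.from_pretrained("gpt2")
model.config.decoder_start_token_id = tgt_tok.bos_token_id
model.config.pad_token_id = tgt_tok.eos_token_id  # GPT-2 has no dedicated pad token

inputs = src_tok("Hello world", return_tensors="pt")
generated = model.generate(inputs.input_ids, max_length=20)
print(tgt_tok.decode(generated[0], skip_special_tokens=True))  # gibberish until fine-tuned
```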
