diff --git a/README.md b/README.md
index 047e1b7686..af5fe8b01e 100644
--- a/README.md
+++ b/README.md
@@ -1,241 +1,163 @@
+# Self-Supervised Neural Machine Translation
--------------------------------------------------------------------------------
-
-Fairseq(-py) is a sequence modeling toolkit that allows researchers and
-developers to train custom models for translation, summarization, language
-modeling and other text generation tasks.
-
-We provide reference implementations of various sequence modeling papers:
-
-* **Convolutional Neural Networks (CNN)**
-  + [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/conv_lm/README.md)
-  + [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md)
-  + [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
-  + [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md)
-  + [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)
-* **LightConv and DynamicConv models**
-  + [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md)
-* **Long Short-Term Memory (LSTM) networks**
-  + Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
-* **Transformer (self-attention) networks**
-  + Attention Is All You Need (Vaswani et al., 2017)
-  + [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md)
-  + [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)
-  + [Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)](examples/language_model/README.adaptive_inputs.md)
-  + [Lexically constrained decoding with dynamic beam allocation (Post & Vilar, 2018)](examples/constrained_decoding/README.md)
-  + [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Dai et al., 2019)](examples/truncated_bptt/README.md)
-  + [Adaptive Attention Span in Transformers (Sukhbaatar et al., 2019)](examples/adaptive_span/README.md)
-  + [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)
-  + [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)
-  + [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)
-  + [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md)
-  + [Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)](examples/mbart/README.md)
-  + [Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)](examples/byte_level_bpe/README.md)
-  + [Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)](examples/unsupervised_quality_estimation/README.md)
-  + [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)](examples/wav2vec/README.md)
-  + [Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models (Enarvi et al., 2020)](examples/pointer_generator/README.md)
-  + [Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)](examples/linformer/README.md)
-  + [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples/criss/README.md)
-  + [Deep Transformers with Latent Depth (Li et al., 2020)](examples/latent_depth/README.md)
-  + [Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al., 2020)](https://arxiv.org/abs/2006.13979)
-  + [Self-training and Pre-training are Complementary for Speech Recognition (Xu et al., 2020)](https://arxiv.org/abs/2010.11430)
-  + [Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training (Hsu et al., 2021)](https://arxiv.org/abs/2104.01027)
-  + [Unsupervised Speech Recognition (Baevski et al., 2021)](https://arxiv.org/abs/2105.11084)
-  + [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition (Xu et al., 2021)](https://arxiv.org/abs/2109.11680)
-  + [VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding (Xu et al., 2021)](https://arxiv.org/pdf/2109.14084.pdf)
-  + [VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding (Xu et al., 2021)](https://aclanthology.org/2021.findings-acl.370.pdf)
-  + [NormFormer: Improved Transformer Pretraining with Extra Normalization (Shleifer et al., 2021)](examples/normformer/README.md)
-* **Non-autoregressive Transformers**
-  + Non-Autoregressive Neural Machine Translation (Gu et al., 2017)
-  + Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al., 2018)
-  + Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al., 2019)
-  + Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)
-  + [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md)
-* **Finetuning**
-  + [Better Fine-Tuning by Reducing Representational Collapse (Aghajanyan et al., 2020)](examples/rxf/README.md)
-
-
-* September 2020: [Added Linformer code](examples/linformer/README.md)
-* September 2020: [Added pointer-generator networks](examples/pointer_generator/README.md)
-* August 2020: [Added lexically constrained decoding](examples/constrained_decoding/README.md)
-* August 2020: [wav2vec2 models and code released](examples/wav2vec/README.md)
-* July 2020: [Unsupervised Quality Estimation code released](examples/unsupervised_quality_estimation/README.md)
-* May 2020: [Follow fairseq on Twitter](https://twitter.com/fairseq)
-* April 2020: [Monotonic Multihead Attention code released](examples/simultaneous_translation/README.md)
-* April 2020: [Quant-Noise code released](examples/quant_noise/README.md)
-* April 2020: [Initial model parallel support and 11B parameters unidirectional LM released](examples/megatron_11b/README.md)
-* March 2020: [Byte-level BPE code released](examples/byte_level_bpe/README.md)
-* February 2020: [mBART model and code released](examples/mbart/README.md)
-* February 2020: [Added tutorial for back-translation](https://github.com/pytorch/fairseq/tree/main/examples/backtranslation#training-your-own-model-wmt18-english-german)
-* December 2019: [fairseq 0.9.0 released](https://github.com/pytorch/fairseq/releases/tag/v0.9.0)
-* November 2019: [VizSeq released (a visual analysis toolkit for evaluating fairseq models)](https://facebookresearch.github.io/vizseq/docs/getting_started/fairseq_example)
-* November 2019: [CamemBERT model and code released](examples/camembert/README.md)
-* November 2019: [BART model and code released](examples/bart/README.md)
-* November 2019: [XLM-R models and code released](examples/xlmr/README.md)
-* September 2019: [Nonautoregressive translation code released](examples/nonautoregressive_translation/README.md)
-* August 2019: [WMT'19 models released](examples/wmt19/README.md)
-* July 2019: fairseq relicensed under MIT license
-* July 2019: [RoBERTa models and code released](examples/roberta/README.md)
-* June 2019: [wav2vec models and code released](examples/wav2vec/README.md)
-