update README (#1652)
parmeet authored Mar 11, 2022
1 parent 02d0fc6 commit 1ce1fab
Showing 1 changed file with 22 additions and 32 deletions: README.rst

torchtext
=========

This repository consists of:

* `torchtext.datasets <https://github.com/pytorch/text/tree/main/torchtext/datasets>`_: The raw text iterators for common NLP datasets
* `torchtext.data <https://github.com/pytorch/text/tree/main/torchtext/data>`_: Some basic NLP building blocks
* `torchtext.transforms <https://github.com/pytorch/text/tree/main/torchtext/transforms>`_: Basic text-processing transformations
* `torchtext.models <https://github.com/pytorch/text/tree/main/torchtext/models>`_: Pre-trained models
* `torchtext.vocab <https://github.com/pytorch/text/tree/main/torchtext/vocab>`_: Vocab and Vectors related classes and factory functions (see the example after this list)
* `examples <https://github.com/pytorch/text/tree/main/examples>`_: Example NLP workflows with PyTorch and torchtext library.
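
As a small illustration of the vocab factory functions, a vocabulary can be built directly from an iterator over token lists (a minimal sketch; the toy tokens are illustrative only):

.. code-block:: python

    >>> from torchtext.vocab import build_vocab_from_iterator
    >>> # Build a vocab from an iterator over lists of tokens
    >>> vocab = build_vocab_from_iterator([["hello", "world"], ["hello", "torchtext"]],
    ...                                   specials=["<unk>"])
    >>> # Out-of-vocabulary tokens fall back to the <unk> index
    >>> vocab.set_default_index(vocab["<unk>"])
    >>> vocab(["hello", "world"])  # -> a list of integer token ids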


Installation
============
We recommend Anaconda as a Python package management system. Please refer to `pytorch.org <https://pytorch.org/>`_ for the details of PyTorch installation. The following table shows the corresponding torchtext and supported Python versions.

.. csv-table::
   :header: "PyTorch version", "torchtext version", "Supported Python version"
   :widths: 10, 10, 10

   nightly build, main, ">=3.7, <=3.9"
   1.11.0, 0.12.0, ">=3.6, <=3.9"
   1.10.0, 0.11.0, ">=3.6, <=3.9"
   1.9.1, 0.10.1, ">=3.6, <=3.9"
   1.9, 0.10, ">=3.6, <=3.9"

Datasets
========

The datasets module currently contains:

* Machine translation: IWSLT2016, IWSLT2017, Multi30k
* Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking
* Question answering: SQuAD1, SQuAD2
* Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB
* Model pre-training: CC-100
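
For example, to access the raw text from the AG_NEWS dataset (a minimal sketch; each dataset is an iterator that also composes with ``torch.utils.data.DataLoader``):

.. code-block:: python

    >>> from torchtext.datasets import AG_NEWS
    >>> train_iter = AG_NEWS(split='train')
    >>> # Iterate with for loop
    >>> for (label, line) in train_iter:
    ...     print(label, line)
    >>> # Or send to DataLoader for batching
    >>> from torch.utils.data import DataLoader
    >>> train_iter = AG_NEWS(split='train')
    >>> dataloader = DataLoader(train_iter, batch_size=8, shuffle=False)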

Models
======

The library currently consists of the following pre-trained models:

* RoBERTa: `Base and Large Architecture <https://github.com/pytorch/fairseq/tree/main/examples/roberta#pre-trained-models>`_
* XLM-RoBERTa: `Base and Large Architecture <https://github.com/pytorch/fairseq/tree/main/examples/xlmr#pre-trained-models>`_
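
Each bundle pairs a pre-trained encoder with its matching text transform. A minimal usage sketch (assuming the v0.12 bundle API, where ``XLMR_BASE_ENCODER`` and ``torchtext.functional.to_tensor`` are the published names):

.. code-block:: python

    >>> import torchtext
    >>> from torchtext.functional import to_tensor
    >>> xlmr_base = torchtext.models.XLMR_BASE_ENCODER
    >>> model = xlmr_base.get_model()
    >>> transform = xlmr_base.transform()
    >>> input_batch = ["Hello world", "How are you!"]
    >>> # Tokenize, numericalize and pad the batch, then encode it
    >>> model_input = to_tensor(transform(input_batch), padding_value=1)
    >>> output = model(model_input)  # contextual embeddings: (batch, seq_len, embed_dim)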

Tokenizers
==========

The transforms module currently supports the following scriptable tokenizers:

* `SentencePiece <https://github.com/google/sentencepiece>`_
* `GPT-2 BPE <https://github.com/openai/gpt-2/blob/master/src/encoder.py>`_
* `CLIP <https://github.com/openai/CLIP/blob/main/clip/simple_tokenizer.py>`_
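
Because these tokenizers are scriptable, they can be serialized with TorchScript and shipped inside a model. A sketch using the GPT-2 BPE tokenizer (assuming the vocabulary assets published on download.pytorch.org):

.. code-block:: python

    >>> import torch
    >>> from torchtext.transforms import GPT2BPETokenizer
    >>> encoder_json = "https://download.pytorch.org/models/text/gpt2_bpe_encoder.json"
    >>> vocab_bpe = "https://download.pytorch.org/models/text/gpt2_bpe_vocab.bpe"
    >>> tokenizer = GPT2BPETokenizer(encoder_json, vocab_bpe)
    >>> tokens = tokenizer("the quick brown fox")  # BPE tokens for the input string
    >>> scripted = torch.jit.script(tokenizer)     # serializable via TorchScript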

Tutorials
=========

To get started with torchtext, users may refer to the following tutorials available on the PyTorch website.

* `SST-2 binary text classification using XLM-R pre-trained model <https://pytorch.org/text/stable/tutorials/sst2_classification_non_distributed.html>`_
* `Text classification with AG_NEWS dataset <https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html>`_
* `Translation trained with Multi30k dataset using transformers and torchtext <https://pytorch.org/tutorials/beginner/translation_transformer.html>`_
* `Language modeling using transforms and torchtext <https://pytorch.org/tutorials/beginner/transformer_tutorial.html>`_



Disclaimer on Datasets
======================

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use them; it is your responsibility to determine whether you have permission to use each dataset under its license.
