diff --git a/README.rst b/README.rst
index a71b452057..7f93b38eb5 100644
--- a/README.rst
+++ b/README.rst
@@ -13,12 +13,12 @@ torchtext
 This repository consists of:

 * `torchtext.datasets `_: The raw text iterators for common NLP datasets
-* `torchtext.data `_: Some basic NLP building blocks (tokenizers, metrics, functionals etc.)
-* `torchtext.nn `_: NLP related modules
+* `torchtext.data `_: Some basic NLP building blocks
+* `torchtext.transforms `_: Basic text-processing transformations
+* `torchtext.models `_: Pre-trained models
 * `torchtext.vocab `_: Vocab and Vectors related classes and factory functions
 * `examples `_: Example NLP workflows with PyTorch and torchtext library.

-Note: The legacy code discussed in `torchtext v0.7.0 release note `_ has been retired to `torchtext.legacy `_ folder. Those legacy code will not be maintained by the development team, and we plan to fully remove them in the future release. See `torchtext.legacy `_ folder for more details.

 Installation
 ============
@@ -30,6 +30,7 @@ We recommend Anaconda as a Python package management system. Please refer to `py
    :widths: 10, 10, 10

    nightly build, main, ">=3.7, <=3.9"
+   1.11.0, 0.12.0, ">=3.6, <=3.9"
    1.10.0, 0.11.0, ">=3.6, <=3.9"
    1.9.1, 0.10.1, ">=3.6, <=3.9"
    1.9, 0.10, ">=3.6, <=3.9"
@@ -103,48 +104,37 @@ The datasets module currently contains:
 * Machine translation: IWSLT2016, IWSLT2017, Multi30k
 * Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking
 * Question answering: SQuAD1, SQuAD2
-* Text classification: AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB
+* Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB
+* Model pre-training: CC-100

-For example, to access the raw text from the AG_NEWS dataset:
+Models
+======

-  .. code-block:: python
+The library currently consists of the following pre-trained models:

-      >>> from torchtext.datasets import AG_NEWS
-      >>> train_iter = AG_NEWS(split='train')
-      >>> # Iterate with for loop
-      >>> for (label, line) in train_iter:
-      >>>     print(label, line)
-      >>> # Or send to DataLoader
-      >>> from torch.utils.data import DataLoader
-      >>> train_iter = AG_NEWS(split='train')
-      >>> dataloader = DataLoader(train_iter, batch_size=8, shuffle=False)
+* RoBERTa: `Base and Large Architecture `_
+* XLM-RoBERTa: `Base and Large Architecture `_
+
+Tokenizers
+==========
+
+The transforms module currently supports the following scriptable tokenizers:
+
+* `SentencePiece `_
+* `GPT-2 BPE `_
+* `CLIP `_

 Tutorials
 =========

-To get started with torchtext, users may refer to the following tutorials available on PyTorch website.
+To get started with torchtext, users may refer to the following tutorials available on the PyTorch website.

+* `SST-2 binary text classification using XLM-R pre-trained model `_
 * `Text classification with AG_NEWS dataset `_
 * `Translation trained with Multi30k dataset using transformers and torchtext `_
 * `Language modeling using transforms and torchtext `_

-[BC Breaking] Legacy
-====================
-
-In the v0.9.0 release, we moved the following legacy code to `torchtext.legacy `_. This is part of the work to revamp the torchtext library and the motivation has been discussed in `Issue #664 `_:
-
-* ``torchtext.legacy.data.field``
-* ``torchtext.legacy.data.batch``
-* ``torchtext.legacy.data.example``
-* ``torchtext.legacy.data.iterator``
-* ``torchtext.legacy.data.pipeline``
-* ``torchtext.legacy.datasets``
-
-We have a `migration tutorial `_ to help users switch to the torchtext datasets in ``v0.9.0`` release. For the users who still want the legacy components, they can add ``legacy`` to the import path.
-
-In the v0.10.0 release, we retire the Vocab class to `torchtext.legacy `_. Users can still access the legacy Vocab via ``torchtext.legacy.vocab``. This class has been replaced by a Vocab module that is backed by efficient C++ implementation and provides common functional APIs for NLP workflows.
-
 Disclaimer on Datasets
 ======================
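The patch above removes the README's AG_NEWS usage example. For reference, the pattern that example illustrated — a dataset yielding ``(label, line)`` tuples, consumed either one sample at a time or in fixed-size batches via ``DataLoader`` — can be sketched with plain-Python stand-ins. The ``fake_ag_news`` generator and ``batched`` helper below are illustrative substitutes, not part of the torchtext API:

```python
from itertools import islice


def fake_ag_news(split="train"):
    """Stand-in for torchtext.datasets.AG_NEWS: yields (label, line) tuples.

    Integer labels 1-4 mirror AG_NEWS's four news categories.
    """
    samples = [
        (3, "Wall St. Bears Claw Back Into the Black."),
        (4, "New smartphone model announced at developer event."),
        (1, "Election results expected later tonight."),
        (2, "Home team clinches the series in game five."),
    ]
    for label, line in samples:
        yield label, line


def batched(iterable, batch_size):
    """Group an iterator into lists of at most batch_size items.

    This is roughly what DataLoader(batch_size=..., shuffle=False) does
    for an iterable-style dataset, minus tensor collation.
    """
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Iterate sample by sample, as in the removed README example.
for label, line in fake_ag_news(split="train"):
    print(label, line)

# Or consume in fixed-size batches.
for batch in batched(fake_ag_news(split="train"), batch_size=2):
    print([lbl for lbl, _ in batch])
```

With the real library, ``fake_ag_news(split="train")`` would be replaced by ``AG_NEWS(split='train')`` and ``batched`` by ``torch.utils.data.DataLoader``, as the removed example showed.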