update README (#1652)
parmeet authored Mar 11, 2022
1 parent 02d0fc6 commit 1ce1fab
Showing 1 changed file with 22 additions and 32 deletions: README.rst

torchtext
=========

This repository consists of:

* `torchtext.datasets <https://github.com/pytorch/text/tree/main/torchtext/datasets>`_: The raw text iterators for common NLP datasets
* `torchtext.data <https://github.com/pytorch/text/tree/main/torchtext/data>`_: Some basic NLP building blocks
* `torchtext.transforms <https://github.com/pytorch/text/tree/main/torchtext/transforms>`_: Basic text-processing transformations
* `torchtext.models <https://github.com/pytorch/text/tree/main/torchtext/models>`_: Pre-trained models
* `torchtext.vocab <https://github.com/pytorch/text/tree/main/torchtext/vocab>`_: Vocab and Vectors related classes and factory functions (see the example after this list)
* `examples <https://github.com/pytorch/text/tree/main/examples>`_: Example NLP workflows with PyTorch and torchtext library.
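
As a small illustration of the vocab factory functions, a vocabulary can be built directly from an iterator over token lists (a minimal sketch; the toy tokens are illustrative only):

.. code-block:: python

    >>> from torchtext.vocab import build_vocab_from_iterator
    >>> # Build a vocab from an iterator over lists of tokens
    >>> vocab = build_vocab_from_iterator([["hello", "world"], ["hello", "torchtext"]],
    ...                                   specials=["<unk>"])
    >>> # Out-of-vocabulary tokens fall back to the <unk> index
    >>> vocab.set_default_index(vocab["<unk>"])
    >>> vocab(["hello", "world"])  # -> a list of integer token ids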


Installation
============
We recommend Anaconda as a Python package management system. Please refer to `pytorch.org <https://pytorch.org/>`_ for the details of PyTorch installation. The following table shows the corresponding torchtext and supported Python versions.

.. csv-table::
   :header: "PyTorch version", "torchtext version", "Supported Python version"
   :widths: 10, 10, 10

   nightly build, main, ">=3.7, <=3.9"
   1.11.0, 0.12.0, ">=3.6, <=3.9"
   1.10.0, 0.11.0, ">=3.6, <=3.9"
   1.9.1, 0.10.1, ">=3.6, <=3.9"
   1.9, 0.10, ">=3.6, <=3.9"

Datasets
========

The datasets module currently contains:

* Machine translation: IWSLT2016, IWSLT2017, Multi30k
* Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking
* Question answering: SQuAD1, SQuAD2
* Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB
* Model pre-training: CC-100
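
For example, to access the raw text from the AG_NEWS dataset (a minimal sketch; each dataset is an iterator that also composes with ``torch.utils.data.DataLoader``):

.. code-block:: python

    >>> from torchtext.datasets import AG_NEWS
    >>> train_iter = AG_NEWS(split='train')
    >>> # Iterate with for loop
    >>> for (label, line) in train_iter:
    ...     print(label, line)
    >>> # Or send to DataLoader for batching
    >>> from torch.utils.data import DataLoader
    >>> train_iter = AG_NEWS(split='train')
    >>> dataloader = DataLoader(train_iter, batch_size=8, shuffle=False)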

Models
======

The library currently consists of the following pre-trained models:

* RoBERTa: `Base and Large Architecture <https://github.com/pytorch/fairseq/tree/main/examples/roberta#pre-trained-models>`_
* XLM-RoBERTa: `Base and Large Architecture <https://github.com/pytorch/fairseq/tree/main/examples/xlmr#pre-trained-models>`_
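
Each bundle pairs a pre-trained encoder with its matching text transform. A minimal usage sketch (assuming the v0.12 bundle API, where ``XLMR_BASE_ENCODER`` and ``torchtext.functional.to_tensor`` are the published names):

.. code-block:: python

    >>> import torchtext
    >>> from torchtext.functional import to_tensor
    >>> xlmr_base = torchtext.models.XLMR_BASE_ENCODER
    >>> model = xlmr_base.get_model()
    >>> transform = xlmr_base.transform()
    >>> input_batch = ["Hello world", "How are you!"]
    >>> # Tokenize, numericalize and pad the batch, then encode it
    >>> model_input = to_tensor(transform(input_batch), padding_value=1)
    >>> output = model(model_input)  # contextual embeddings: (batch, seq_len, embed_dim)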

Tokenizers
==========

The transforms module currently supports the following scriptable tokenizers:

* `SentencePiece <https://github.com/google/sentencepiece>`_
* `GPT-2 BPE <https://github.com/openai/gpt-2/blob/master/src/encoder.py>`_
* `CLIP <https://github.com/openai/CLIP/blob/main/clip/simple_tokenizer.py>`_
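
Because these tokenizers are scriptable, they can be serialized with TorchScript and shipped inside a model. A sketch using the GPT-2 BPE tokenizer (assuming the vocabulary assets published on download.pytorch.org):

.. code-block:: python

    >>> import torch
    >>> from torchtext.transforms import GPT2BPETokenizer
    >>> encoder_json = "https://download.pytorch.org/models/text/gpt2_bpe_encoder.json"
    >>> vocab_bpe = "https://download.pytorch.org/models/text/gpt2_bpe_vocab.bpe"
    >>> tokenizer = GPT2BPETokenizer(encoder_json, vocab_bpe)
    >>> tokens = tokenizer("the quick brown fox")  # BPE tokens for the input string
    >>> scripted = torch.jit.script(tokenizer)     # serializable via TorchScript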

Tutorials
=========

To get started with torchtext, users may refer to the following tutorials available on the PyTorch website.

* `SST-2 binary text classification using XLM-R pre-trained model <https://pytorch.org/text/stable/tutorials/sst2_classification_non_distributed.html>`_
* `Text classification with AG_NEWS dataset <https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html>`_
* `Translation trained with Multi30k dataset using transformers and torchtext <https://pytorch.org/tutorials/beginner/translation_transformer.html>`_
* `Language modeling using transforms and torchtext <https://pytorch.org/tutorials/beginner/transformer_tutorial.html>`_



Disclaimer on Datasets
======================

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use them; it is your responsibility to determine whether you have permission to use each dataset under its license.
