Import torchtext #1410 0930843
Summary: Import latest from github

Reviewed By: Nayef211

Differential Revision: D31745899

fbshipit-source-id: e4ac5c337bcbd1a8809544add7679dd3da242999
parmeet authored and facebook-github-bot committed Oct 19, 2021
1 parent ae13bc6 commit 579c519
Showing 27 changed files with 202 additions and 1,329 deletions.
6 changes: 3 additions & 3 deletions .circleci/config.yml
@@ -83,7 +83,7 @@ smoke_test_common: &smoke_test_common
jobs:
circleci_consistency:
docker:
- image: circleci/python:3.8
- image: cimg/python:3.8
steps:
- checkout
- run:
@@ -234,7 +234,7 @@ jobs:
# Requires org-member context
binary_wheel_upload:
docker:
- image: circleci/python:3.8
- image: cimg/python:3.8
steps:
- attach_workspace:
at: ~/workspace
@@ -497,7 +497,7 @@ jobs:
- v1-windows-dataset-vector-{{ checksum ".cachekey" }}
- v1-windows-dataset-{{ checksum ".cachekey" }}


- run:
name: Run tests
# Downloading embedding vector takes long time.
6 changes: 3 additions & 3 deletions .circleci/config.yml.in
@@ -83,7 +83,7 @@ smoke_test_common: &smoke_test_common
jobs:
circleci_consistency:
docker:
- image: circleci/python:3.8
- image: cimg/python:3.8
steps:
- checkout
- run:
@@ -234,7 +234,7 @@ jobs:
# Requires org-member context
binary_wheel_upload:
docker:
- image: circleci/python:3.8
- image: cimg/python:3.8
steps:
- attach_workspace:
at: ~/workspace
@@ -497,7 +497,7 @@ jobs:
- v1-windows-dataset-vector-{{ checksum ".cachekey" }}
- v1-windows-dataset-{{ checksum ".cachekey" }}
{% endraw %}

- run:
name: Run tests
# Downloading embedding vector takes long time.
3 changes: 3 additions & 0 deletions .circleci/unittest/linux/scripts/install.sh
@@ -13,6 +13,9 @@ conda activate ./env
printf "* Installing PyTorch\n"
conda install -y -c "pytorch-${UPLOAD_CHANNEL}" ${CONDA_CHANNEL_FLAGS} pytorch cpuonly

printf "Installing torchdata from source\n"
pip install git+https://github.com/pytorch/data.git

printf "* Installing torchtext\n"
git submodule update --init --recursive
python setup.py develop
3 changes: 3 additions & 0 deletions .circleci/unittest/windows/scripts/install.sh
@@ -18,6 +18,9 @@ conda activate ./env
printf "* Installing PyTorch\n"
conda install -y -c "pytorch-${UPLOAD_CHANNEL}" ${CONDA_CHANNEL_FLAGS} pytorch cpuonly

printf "Installing torchdata from source\n"
pip install git+https://github.com/pytorch/data.git

printf "* Installing torchtext\n"
git submodule update --init --recursive
"$root_dir/packaging/vc_env_helper.bat" python setup.py develop
43 changes: 22 additions & 21 deletions README.rst
@@ -1,7 +1,7 @@
.. image:: https://circleci.com/gh/pytorch/text.svg?style=svg
:target: https://circleci.com/gh/pytorch/text

.. image:: https://codecov.io/gh/pytorch/text/branch/master/graph/badge.svg
.. image:: https://codecov.io/gh/pytorch/text/branch/main/graph/badge.svg
:target: https://codecov.io/gh/pytorch/text

.. image:: https://img.shields.io/badge/dynamic/json.svg?label=docs&url=https%3A%2F%2Fpypi.org%2Fpypi%2Ftorchtext%2Fjson&query=%24.info.version&colorB=brightgreen&prefix=v
@@ -12,13 +12,13 @@ torchtext

This repository consists of:

* `torchtext.datasets <https://github.com/pytorch/text/tree/master/torchtext/datasets>`_: The raw text iterators for common NLP datasets
* `torchtext.data <https://github.com/pytorch/text/tree/master/torchtext/data>`_: Some basic NLP building blocks (tokenizers, metrics, functionals etc.)
* `torchtext.nn <https://github.com/pytorch/text/tree/master/torchtext/nn>`_: NLP related modules
* `torchtext.vocab <https://github.com/pytorch/text/tree/master/torchtext/vocab.py>`_: Vocab and Vectors related classes and factory functions
* `examples <https://github.com/pytorch/text/tree/master/examples>`_: Example NLP workflows with PyTorch and torchtext library.
* `torchtext.datasets <https://github.com/pytorch/text/tree/main/torchtext/datasets>`_: The raw text iterators for common NLP datasets
* `torchtext.data <https://github.com/pytorch/text/tree/main/torchtext/data>`_: Some basic NLP building blocks (tokenizers, metrics, functionals etc.)
* `torchtext.nn <https://github.com/pytorch/text/tree/main/torchtext/nn>`_: NLP related modules
* `torchtext.vocab <https://github.com/pytorch/text/tree/main/torchtext/vocab.py>`_: Vocab and Vectors related classes and factory functions
* `examples <https://github.com/pytorch/text/tree/main/examples>`_: Example NLP workflows with PyTorch and torchtext library.

Note: The legacy code discussed in `torchtext v0.7.0 release note <https://github.com/pytorch/text/releases/tag/v0.7.0-rc3>`_ has been retired to `torchtext.legacy <https://github.com/pytorch/text/tree/master/torchtext/legacy>`_ folder. Those legacy code will not be maintained by the development team, and we plan to fully remove them in the future release. See `torchtext.legacy <https://github.com/pytorch/text/tree/master/torchtext/legacy>`_ folder for more details.
Note: The legacy code discussed in `torchtext v0.7.0 release note <https://github.com/pytorch/text/releases/tag/v0.7.0-rc3>`_ has been retired to `torchtext.legacy <https://github.com/pytorch/text/tree/main/torchtext/legacy>`_ folder. Those legacy code will not be maintained by the development team, and we plan to fully remove them in the future release. See `torchtext.legacy <https://github.com/pytorch/text/tree/main/torchtext/legacy>`_ folder for more details.

Installation
============
@@ -29,14 +29,15 @@ We recommend Anaconda as a Python package management system. Please refer to `py
:header: "PyTorch version", "torchtext version", "Supported Python version"
:widths: 10, 10, 10

nightly build, master, 3.6+
1.9, 0.10, 3.6+
1.8, 0.9, 3.6+
1.7, 0.8, 3.6+
1.6, 0.7, 3.6+
1.5, 0.6, 3.5+
1.4, 0.5, "2.7, 3.5+"
0.4 and below, 0.2.3, "2.7, 3.5+"
nightly build, main, ">=3.6, <=3.9"
1.9, 0.10, ">=3.6, <=3.9"
1.8, 0.9, ">=3.6, <=3.9"
1.7.1, 0.8.1, ">=3.6, <=3.9"
1.7, 0.8, ">=3.6, <=3.8"
1.6, 0.7, ">=3.6, <=3.8"
1.5, 0.6, ">=3.5, <=3.8"
1.4, 0.5, "2.7, >=3.5, <=3.8"
0.4 and below, 0.2.3, "2.7, >=3.5, <=3.8"
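
As a reader's aid, the updated compatibility rows above can be encoded as a small lookup. This is a minimal sketch, not part of this commit or of torchtext: `SUPPORTED_PYTHON` and `python_supported` are hypothetical names, and the ranges are copied by hand from the new table rows.

```python
# Hypothetical helper (not part of torchtext): encodes the updated
# "Supported Python version" column as inclusive (low, high) ranges.
SUPPORTED_PYTHON = {
    "main": [((3, 6), (3, 9))],
    "0.10": [((3, 6), (3, 9))],
    "0.9": [((3, 6), (3, 9))],
    "0.8.1": [((3, 6), (3, 9))],
    "0.8": [((3, 6), (3, 8))],
    "0.7": [((3, 6), (3, 8))],
    "0.6": [((3, 5), (3, 8))],
    "0.5": [((2, 7), (2, 7)), ((3, 5), (3, 8))],
    "0.2.3": [((2, 7), (2, 7)), ((3, 5), (3, 8))],
}


def python_supported(torchtext_version, python_version):
    """Return True if (major, minor) falls in a listed range for the release."""
    major_minor = tuple(python_version[:2])
    ranges = SUPPORTED_PYTHON[torchtext_version]
    return any(low <= major_minor <= high for low, high in ranges)
```

A caller could pass `sys.version_info[:2]` as `python_version`; a release key that is not in the table raises `KeyError` rather than guessing.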

Using conda::

@@ -82,7 +83,7 @@ To build torchtext from source, you need ``git``, ``CMake`` and C++11 compiler s
**Note**

When building from source, make sure that you have the same C++ compiler as the one used to build PyTorch. A simple way is to build PyTorch from source and use the same environment to build torchtext.
If you are using the nightly build of PyTorch, checkout the environment it was built with `conda (here) <https://github.com/pytorch/builder/tree/master/conda>`_ and `pip (here) <https://github.com/pytorch/builder/tree/master/manywheel>`_.
If you are using the nightly build of PyTorch, checkout the environment it was built with `conda (here) <https://github.com/pytorch/builder/tree/main/conda>`_ and `pip (here) <https://github.com/pytorch/builder/tree/main/manywheel>`_.

Documentation
=============
@@ -130,8 +131,8 @@ To get started with torchtext, users may refer to the following tutorials availa

We have re-written several building blocks under ``torchtext.experimental``:

* `Transforms <https://github.com/pytorch/text/blob/master/torchtext/experimental/transforms.py>`_: some basic data processing building blocks
* `Vectors <https://github.com/pytorch/text/blob/master/torchtext/experimental/vectors.py>`_: the vectors to convert tokens into tensors.
* `Transforms <https://github.com/pytorch/text/blob/main/torchtext/experimental/transforms.py>`_: some basic data processing building blocks
* `Vectors <https://github.com/pytorch/text/blob/main/torchtext/experimental/vectors.py>`_: the vectors to convert tokens into tensors.

These prototype building blocks in the experimental folder are available in the nightly release only. The nightly packages are accessible via Pip and Conda for Windows, Mac, and Linux. For example, Linux users can install the nightly wheels with the following command::

@@ -142,7 +143,7 @@ For more detailed instructions, please refer to `Install PyTorch <https://pytorc
[BC Breaking] Legacy
====================

In the v0.9.0 release, we moved the following legacy code to `torchtext.legacy <https://github.com/pytorch/text/tree/master/torchtext/legacy>`_. This is part of the work to revamp the torchtext library and the motivation has been discussed in `Issue #664 <https://github.com/pytorch/text/issues/664>`_:
In the v0.9.0 release, we moved the following legacy code to `torchtext.legacy <https://github.com/pytorch/text/tree/main/torchtext/legacy>`_. This is part of the work to revamp the torchtext library and the motivation has been discussed in `Issue #664 <https://github.com/pytorch/text/issues/664>`_:

* ``torchtext.legacy.data.field``
* ``torchtext.legacy.data.batch``
@@ -151,9 +152,9 @@ In the v0.9.0 release, we moved the following legacy code to `torchtext.legacy <
* ``torchtext.legacy.data.pipeline``
* ``torchtext.legacy.datasets``

We have a `migration tutorial <https://colab.research.google.com/github/pytorch/text/blob/master/examples/legacy_tutorial/migration_tutorial.ipynb>`_ to help users switch to the torchtext datasets in ``v0.9.0`` release. For the users who still want the legacy components, they can add ``legacy`` to the import path.
We have a `migration tutorial <https://colab.research.google.com/github/pytorch/text/blob/main/examples/legacy_tutorial/migration_tutorial.ipynb>`_ to help users switch to the torchtext datasets in ``v0.9.0`` release. For the users who still want the legacy components, they can add ``legacy`` to the import path.

In the v0.10.0 release, we retire the Vocab class to `torchtext.legacy <https://github.com/pytorch/text/tree/master/torchtext/legacy>`_. Users can still access the legacy Vocab via ``torchtext.legacy.vocab``. This class has been replaced by a Vocab module that is backed by efficient C++ implementation and provides common functional APIs for NLP workflows.
In the v0.10.0 release, we retire the Vocab class to `torchtext.legacy <https://github.com/pytorch/text/tree/main/torchtext/legacy>`_. Users can still access the legacy Vocab via ``torchtext.legacy.vocab``. This class has been replaced by a Vocab module that is backed by efficient C++ implementation and provides common functional APIs for NLP workflows.

Disclaimer on Datasets
======================
143 changes: 0 additions & 143 deletions examples/BERT/README.md

This file was deleted.
