Steps to retire legacy code and release new building blocks in torchtext #985

zhangguanheng66 · 2020-09-16T20:50:48Z

A new abstraction was described in the 0.5.0 release notes. We are now working on retiring some legacy code in torchtext over the next few releases. This issue tracks the progress of that work. Users can expect the following steps:

Step 1: Retire legacy code in `torchtext.data` and `torchtext.datasets`

The following components will soon be retired from the source code. We added deprecation warnings in the 0.7.0 release (link). Users will still be able to find these components in torchtext.legacy, and the original constructors will raise an error when called.

torchtext.data.field - RawField, Field, ReversibleField, SubwordField, NestedField, LabelField
torchtext.data.iterator - BucketIterator, Iterator, BPTTIterator
torchtext.data.dataset - Dataset, TabularDataset
torchtext.data.example - Example
torchtext.data.pipeline - Pipeline
torchtext.data.batch - Batch
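The two stages described above (a deprecation warning first, then a constructor that raises) can be sketched in plain Python. This is only an illustration of the pattern: `LegacyField`, its `retired` flag, the warning category, and the message text are all hypothetical and are not taken from the torchtext source.

```python
import warnings


class LegacyField:
    """Hypothetical stand-in for a retired class; not torchtext code."""

    # Flip to True to model the final stage, where the constructor raises.
    retired = False

    def __init__(self, lower=False):
        if self.retired:
            raise RuntimeError(
                "LegacyField has been retired; use the copy in the legacy package."
            )
        # Assumed warning category; the real library may use a different one.
        warnings.warn(
            "LegacyField will be retired in a future release.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.lower = lower


# Stage 1: construction still works but emits a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    field = LegacyField(lower=True)
stage_one_warned = any(issubclass(w.category, DeprecationWarning) for w in caught)

# Stage 2: construction fails outright.
LegacyField.retired = True
try:
    LegacyField()
    stage_two_raised = False
except RuntimeError:
    stage_two_raised = True
```

During the warning stage, users who want to find remaining legacy usage in their own code could turn such warnings into errors with `warnings.simplefilter("error")` while running their tests.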

At the same time, the datasets in torchtext.datasets are built on the legacy code above, so they will be moved to the legacy folder:

language_modeling - LanguageModelingDataset, WikiText2, WikiText103, PennTreebank
nli - SNLI, MultiNLI, XNLI
sst - SST
translation - TranslationDataset, Multi30k, IWSLT, WMT14
sequence_tagging - SequenceTaggingDataset, UDPOS, CoNLL2000Chunking
trec - TREC
imdb - IMDB
babi - BABI20
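For user code that has to run both before and after the move, one possible approach is to look for a class in the legacy package first and fall back to the original location. The sketch below assumes the legacy copies live at `torchtext.legacy.data` and `torchtext.legacy.datasets`, mirroring the original module names; `resolve_legacy` is a hypothetical helper, not part of torchtext.

```python
import importlib


def resolve_legacy(class_name, submodule="data"):
    """Return (module_path, class) for the first location providing class_name.

    Returns (None, None) if torchtext is not installed or neither
    location exposes the requested name.
    """
    # Assumed layout: the legacy package mirrors the original module names.
    candidates = (f"torchtext.legacy.{submodule}", f"torchtext.{submodule}")
    for module_path in candidates:
        try:
            module = importlib.import_module(module_path)
        except Exception:
            # torchtext is missing, or this build has no such module.
            continue
        cls = getattr(module, class_name, None)
        if cls is not None:
            return module_path, cls
    return None, None


# Look up the legacy Field class wherever the installed build keeps it.
module_path, Field = resolve_legacy("Field")
# Dataset classes can be resolved the same way.
dataset_path, IMDB = resolve_legacy("IMDB", submodule="datasets")
```

One caveat with this design: once a rewritten dataset of the same name is released in the main folder, the fallback branch may return a class with a different interface, so checking `module_path` before use is advisable.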

Step 2: Release the new datasets

Some of the legacy datasets above have been rewritten and are currently available in torchtext.experimental.datasets. They will be released to the core library:

language_modeling - LanguageModelingDataset, WikiText2, WikiText103, PennTreebank, WMTNewsCrawl
text_classification - AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB
sequence_tagging - UDPOS, CoNLL2000Chunking
translation - Multi30k, IWSLT, WMT14
question_answer - SQuAD1, SQuAD2

Step 3: Retire legacy vocab/vector and release the new data processing building blocks

We have also rewritten the vocabulary and word vectors as high-performance building blocks with JIT support. We will retire the following components:

torchtext.vocab.Vocab
torchtext.vocab.Vectors along with GloVe, FastText, CharNGram.

After this, the new vocabulary and vector building blocks in the experimental folder will be moved to the core library.

torchtext.experimental.vectors
torchtext.experimental.vocab

We also have some transforms that will be released to the core library.

torchtext.experimental.transforms
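Because the experimental modules ship only in some builds and will later move, it can help to check which of them the installed torchtext actually provides before relying on them. The following is a minimal sketch using only the standard library; the module paths are the ones named in this issue, and `available_modules` is a hypothetical helper.

```python
import importlib.util

# Module paths named in this issue; whether each one exists depends on
# the installed torchtext build.
EXPERIMENTAL_MODULES = (
    "torchtext.experimental.datasets",
    "torchtext.experimental.vectors",
    "torchtext.experimental.vocab",
    "torchtext.experimental.transforms",
)


def available_modules(names=EXPERIMENTAL_MODULES):
    """Map each dotted module name to True if it can be located."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except Exception:
            # Raised when a parent package is missing or fails to import.
            status[name] = False
    return status


# Print a short availability report for the installed environment.
for name, found in available_modules().items():
    print(f"{name}: {'available' if found else 'not found'}")
```

Note that `find_spec` imports the parent packages of a dotted name, so running the probe has the side effect of importing torchtext when it is installed.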

In general, we understand that this is a special time for the torchtext library, because we have to handle the legacy code and the new building blocks at the same time. We really appreciate the efforts of the OSS community. Users should approach the code in the following three categories with these expectations:

legacy folder - we will accept bug fixes but not new features.
torchtext main folder - officially supported via the stable release, with BC-breaking changes handled carefully.
experimental folder - experimental components available via the nightly release channel. Users might experience BC-breaking changes without warning messages.


zhangguanheng66 self-assigned this Sep 16, 2020

This was referenced Sep 17, 2020

Retire legacy code in torchtext Lightning-AI/pytorch-lightning#3529

Closed

arr = [[self.vocab.stoi[x] for x in ex] for ex in arr] KeyError: None #618

Open

This was referenced Nov 2, 2020

TypeError: '<' not supported between instances of 'Example' and 'Example' #474

Closed

Official tutorials use deprecated Classes #936

Closed

imkzh mentioned this issue Mar 14, 2022

TranslationDataset is now deprecated in torchtext jadore801120/attention-is-all-you-need-pytorch#194

Open
