
Migration tutorial #1203

Merged 7 commits into pytorch:master on Feb 24, 2021

Conversation

zhangguanheng66 (Contributor)

Add the migration tutorial in the examples/legacy_tutorial folder.

codecov bot commented Feb 23, 2021

Codecov Report

Merging #1203 (906e9cc) into master (db8da95) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #1203   +/-   ##
=======================================
  Coverage   73.23%   73.23%           
=======================================
  Files          67       67           
  Lines        3718     3718           
=======================================
  Hits         2723     2723           
  Misses        995      995           

cpuhrsch (Contributor)

Also, can you update the tutorial so it builds against the RC?

cpuhrsch (Contributor) commented Feb 23, 2021

As a side note:

from torch.utils.data import DataLoader
from torchtext.datasets import IMDB

def bucket_iter_func(pool, batch_size=64):
    # Each pool is a shuffled chunk of examples; sort it by text length
    # so each mini-batch contains examples of similar length.
    for rand_item in pool:
        sorted_item = sorted(rand_item,
                             key=lambda x: len(tokenizer(x[1])))  # x is a tuple of (label, text)
        sorted_dataloader = DataLoader(sorted_item, batch_size=batch_size,
                                       shuffle=False,  # shuffle is set to False to keep the sorted order
                                       collate_fn=collate_batch)
        for item in sorted_dataloader:
            yield item

train_iter = IMDB(split='train')
train_list = list(train_iter)
batch_size = 8
rand_pools = DataLoader(train_list, batch_size=batch_size * 100,
                        shuffle=True, collate_fn=lambda x: x)
sorted_train_dataloader = bucket_iter_func(rand_pools, batch_size=batch_size)
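
For reference, the snippet assumes tokenizer and collate_batch are defined elsewhere in the tutorial. A minimal, hypothetical stand-in (the tutorial's own definitions may differ) could be:

from torchtext.data.utils import get_tokenizer

tokenizer = get_tokenizer('basic_english')

def collate_batch(batch):
    # batch is a list of (label, text) pairs; tokenize each text and
    # return the labels plus token lists (numericalization and padding omitted)
    labels = [label for label, _ in batch]
    texts = [tokenizer(text) for _, text in batch]
    return labels, texts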

If you just want sublists of size batch_size*100, you can also use Python builtins:

from random import shuffle

shuffle(train_list)
pool_size = batch_size * 100
train_lists = [train_list[i:i + pool_size] for i in range(0, len(train_list), pool_size)]
train_lists = [sorted(pool, key=lambda x: len(tokenizer(x[1]))) for pool in train_lists]
train_lists = sum(train_lists, [])  # Very slow (quadratic) way of flattening
dataloader = DataLoader(train_lists, batch_size=batch_size,
                        shuffle=False,  # shuffle is set to False to keep the sorted order
                        collate_fn=collate_batch)
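
Since sum(lists, []) re-copies the accumulated result on every step, itertools gives a linear-time flatten:

import itertools
train_lists = list(itertools.chain.from_iterable(train_lists))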

zhangguanheng66 (Contributor, Author) commented Feb 23, 2021

> As a side note

Yep, will add it to the tutorial.

zhangguanheng66 merged commit 7d2dbe9 into pytorch:master on Feb 24, 2021