Torcharrow based training using RoBERTa model and SST2 classification dataset #1808

parmeet · 2022-06-24T15:11:15Z

Example end-to-end training for the RoBERTa model with TorchArrow-based text pre-processing.

examples/torcharrow/roberta_sst2_training_with_torcharrow.py

Nayef211

LGTM

Nayef211 · 2022-07-21T17:17:24Z

examples/torcharrow/README.md

+
+#### TorchData
+
+To install TorchData follow instructions athttps://github.com/pytorch/data#installation


nit: add a space after "at"

wenleix · 2022-07-21T19:07:28Z

examples/torcharrow/roberta_sst2_training_with_torcharrow.py

+        # Add EOS token to the end of sentence
+        self.add_eos = T.AddToken(token=2, begin=False)
+
+    def forward(self, input: ta.DataFrame) -> ta.DataFrame:


We can probably just have a standalone preproc function, similar to https://github.com/pytorch/torchrec/blob/20f543ee3700f2ebb27be6e80d636bd5dd0d7f3c/examples/torcharrow/dataloader.py#L59-L94

SGTM, thanks for the suggestion @wenleix. Let me do a follow-up PR to do that.
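
For readers following the thread, here is a rough sketch of what a standalone preprocessing function (as opposed to a transform module with a `forward` method) might look like. It is not the code from this PR or from the follow-up; it works on plain Python lists rather than TorchArrow DataFrames so that it runs without extra dependencies. The names `add_eos` and `preproc` and the batch layout are hypothetical. Only the EOS id of 2 comes from the diff above.

```python
from typing import Dict, List

# EOS token id, taken from the diff above: T.AddToken(token=2, begin=False)
EOS_ID = 2


def add_eos(token_ids: List[int], eos_id: int = EOS_ID) -> List[int]:
    """Return a new list with the EOS id appended (the begin=False case)."""
    return token_ids + [eos_id]


def preproc(batch: Dict[str, list]) -> Dict[str, list]:
    """Hypothetical standalone preprocessing function.

    Takes a batch with a "tokens" entry (a list of token-id lists) and
    returns a shallow copy whose token lists end with the EOS id.
    No module state is needed, so a plain function is enough.
    """
    out = dict(batch)
    out["tokens"] = [add_eos(ids) for ids in batch["tokens"]]
    return out


# Made-up token ids, purely for illustration.
example = {"tokens": [[713, 16], [42]], "labels": [1, 0]}
processed = preproc(example)
```

Since the function holds no state, it can be passed directly to a map-style step in a data pipeline, which is presumably the appeal of the suggestion.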

parmeet added 2 commits June 22, 2022 17:29

Add initial code for TA based training

59a4b99

Merge branch 'main' of github.com:pytorch/text into torcharrow_training

4408fcc

facebook-github-bot added the cla signed label Jun 24, 2022

Nayef211 reviewed Jun 24, 2022

parmeet added 2 commits July 20, 2022 09:30

Merge branch 'main' of github.com:pytorch/text into torcharrow_training

6d2f1d6

use native ops for ading tokens

5e41b42

parmeet marked this pull request as ready for review July 20, 2022 13:42

parmeet added 7 commits July 20, 2022 10:11

remove print

b750378

minor changes in code

2abfe00

add readme

2dfa793

fix lint

d31dfc8

edit readme

ebb9f46

minor edits

a42a761

minor edit

3b420a2

parmeet requested review from Nayef211 and wenleix July 21, 2022 14:41

Nayef211 approved these changes Jul 21, 2022

minor fix

998b7b9

parmeet merged commit 4fb43aa into pytorch:main Jul 21, 2022

parmeet deleted the torcharrow_training branch July 21, 2022 18:31

wenleix reviewed Jul 21, 2022

parmeet mentioned this pull request Jul 22, 2022

Convert TA transform module to prepoc function #1854

Merged
