Enabling SST2 dataset usage in fbcode #1426

Merged
merged 91 commits into from
Oct 27, 2021
91 commits
599bf07
include pytorch 1.5.0-rc1 for CI test
Mar 25, 2020
442fcf0
bump up the version
Mar 25, 2020
b56cb04
Merge branch 'master' into 0.6.0
Apr 9, 2020
3a54c7f
Merge remote-tracking branch 'upstream/master' into 0.6.0
Apr 10, 2020
db4d4fd
Set up ShipIt
Apr 30, 2020
db354a3
Re-sync with internal repository (#749)
cpuhrsch May 1, 2020
ab3c93b
20200429 pytorch/text import
cpuhrsch May 1, 2020
efdf20d
20200430 torchtext import script to include additional meta files
cpuhrsch May 1, 2020
bfbdaec
torchtext flake8, github, travis metafiles
cpuhrsch May 1, 2020
5aeca8d
Import torchtext 20200520 and update build
mthrok May 27, 2020
b79875b
Import torchtext 20200528
mthrok May 28, 2020
be0f749
20200604 torchtext github import
cpuhrsch Jun 5, 2020
4fd0fe2
Import torchtext 20200605
Nayef211 Jun 8, 2020
2fe0031
Back out "Import torchtext 20200605"
Nayef211 Jun 10, 2020
3bdf414
Import torchtext 2020/06/22
Nayef211 Jun 24, 2020
3088591
Fix torch.testing._internal module not found
zhangguanheng66 Jul 1, 2020
8233815
Import torchtext 2020/07/07
Nayef211 Jul 8, 2020
8003a02
remediation of S205607
StanislavGlebik Jul 18, 2020
4e653d1
remediation of S205607
StanislavGlebik Jul 18, 2020
5521477
Import torchtext 2020/07/21
Nayef211 Jul 21, 2020
9d0c4e1
Remove .python3 markers
zertosh Aug 6, 2020
1a786ca
Import torchtext 2020/08/06
Nayef211 Aug 10, 2020
d7762a0
Import torchtext 2020/08/18
Nayef211 Aug 19, 2020
4a7bc10
Import torchtext from 8aecbb9
zhangguanheng66 Sep 1, 2020
f668e3c
Import torchtext 9/4/2020
zhangguanheng66 Sep 4, 2020
d2e1af4
Import github torchtext on 9/9/2020
zhangguanheng66 Sep 13, 2020
8c374dd
Add property support for ScriptModules (#42390)
Sep 15, 2020
2ac6d79
sync with OSS torchtext 9/15/20
zhangguanheng66 Sep 25, 2020
84174db
Import Github torchtext on 9/28/2020
zhangguanheng66 Sep 28, 2020
24b304e
Enable @unused syntax for ignoring properties (#45261)
Sep 29, 2020
5cd2bb2
Import Github torchtext on 10/11/2020
zhangguanheng66 Oct 12, 2020
b074e00
make duplicate def() calls an error in the dispatcher. Updating all f…
bdhirsh Nov 16, 2020
854f715
Revert D24714803: make duplicate def() calls an error in the dispatch…
bdhirsh Nov 17, 2020
522e59b
Import torchtext on Nov 20, 2020
zhangguanheng66 Nov 30, 2020
d1686a9
Updating all call-sites of the legacy dispatcher registration API in …
bdhirsh Dec 2, 2020
e22375e
Import torchtext from github into fbcode on 1/11/2021
zhangguanheng66 Jan 11, 2021
83e53ba
Import torchtext from github #1121 d56fffe
mthrok Jan 20, 2021
6f8a2ce
Import the hidden files in torchtext github repo
zhangguanheng66 Jan 25, 2021
310eac1
add a newline mark to config.yml file (#1128)
datumbox Feb 15, 2021
35a8720
Replace model with full name when spacy load is used (#1140)
datumbox Feb 15, 2021
ab53f2f
Fix the num_lines argument of the setup_iter func in RawTextIterableD…
datumbox Feb 15, 2021
8168aba
Fix broken CI tests due to spacy 3.0 release (#1138)
datumbox Feb 15, 2021
86cc913
Switch data_select in dataset signature to split (#1143)
datumbox Feb 15, 2021
ce0cb15
Add offset arg in the raw text dataset (#1145)
datumbox Feb 15, 2021
125684c
switch to_ivalue to __prepare_scriptable__ (#1080)
datumbox Feb 15, 2021
c34c150
Pass an embedding layer to the constructor of the BertModel class (#1…
datumbox Feb 15, 2021
6a69d55
add __next__ method to RawTextIterableDataset (#1141)
datumbox Feb 15, 2021
3c0ed6a
Add func to count the total number of parameters in a model (#1134)
datumbox Feb 15, 2021
68958f8
Retire the legacy code in torchtext library and fix the dependency of…
zhangguanheng66 Feb 17, 2021
eb2eeae
Sync torchtext GH<->fbcode until GH commit 1197514eb8cc33ccff10f58853…
cpuhrsch Mar 4, 2021
8fdea5d
20210304[2] Sync torchtext GH<->fbcode until GH commit 2764143865678c…
cpuhrsch Mar 8, 2021
81b24b5
20210308 Sync torchtext GH <-> fbcode
cpuhrsch Mar 8, 2021
03c91d4
Re-name raw_datasets.json file with jsonl extension
zhangguanheng66 Mar 9, 2021
066200c
20210329 Sync torchtext up to GH commit eb5e39d3d40525c0064c8e7b7c976…
hwangjeff Mar 29, 2021
f96f374
Import torchtext #1267 93b03e4
parmeet Apr 2, 2021
6a46a5c
Import torchtext #1266 ba0bf52
mthrok Apr 16, 2021
bc5e6e3
Import torchtext #1287 fab63ed
Apr 22, 2021
dac4b9c
Import torchtext #1293 d2a0776
parmeet Apr 26, 2021
b9a38f2
Import torchtext #1291 0790ce6
cpuhrsch Apr 29, 2021
9b3a0af
adding __contains__ method to experimental vocab (#1297)
parmeet Apr 30, 2021
37a41de
Import torchtext #1292 ede6ce65eb5405ff1f8801ff6b354bb1cd242108
NicolasHug May 10, 2021
6231993
Added APIs for default index and removed unk token (#1302)
cpuhrsch May 17, 2021
c8bced1
Swapping experimental Vocab and retiring current Vocab into legacy (#…
parmeet May 19, 2021
3142f4e
Import torchtext #1313 36e33e2
parmeet May 20, 2021
0c55dd9
Adding API usage logging
parmeet May 25, 2021
e9d7593
Import torchtext #1314 99557efd98dd0e74346975d75183dd8aa32eb37e
datumbox May 25, 2021
c56bfbd
Import torchtext #1325 57a1df3
mthrok Jun 9, 2021
625c11d
Import torchtext #1328 ca514f6
mthrok Jun 15, 2021
8816411
up the priority of numpy array comparisons in self.assertEqual (#5906…
heitorschueroff Jun 22, 2021
056bf31
Re-sync with internal repository (#1343)
facebook-github-bot Jun 25, 2021
9b387bc
up the priority of numpy array comparisons in self.assertEqual (#59067)
pmeier Jun 22, 2021
97f66bf
Import torchtext #1300 0435df13924fd4582d67e5b17bc09f6ded18be8b
vincentqb Jun 25, 2021
282aa1d
Import torchtext #1345 8cf471c
parmeet Jun 29, 2021
9534b15
Import torchtext #1352 7ab50af
parmeet Jul 6, 2021
968bdd5
Enabling torchtext datasets access via manifold and iopath
parmeet Jul 6, 2021
929238e
Import torchtext #1361 05cb992
parmeet Jul 23, 2021
cd50c0f
Import torchtext #1365 c57b1fb
yangarbiter Jul 27, 2021
61c1127
Moving Roberta building blocks to torchtext
parmeet Jul 27, 2021
41b3512
Enabling torchtext availability in @mode/opt
parmeet Aug 24, 2021
da5ea68
Import torchtext #1382 aa12e9a
mthrok Aug 27, 2021
2930dba
Simplify cpp extension initialization process
Nayef211 Sep 2, 2021
ae13bc6
fixed bug with incorrect variable name in dataset_utils.py
Nayef211 Sep 23, 2021
579c519
Import torchtext #1410 0930843
parmeet Oct 19, 2021
e6a9677
Import torchtext #1406 1fb2aed
parmeet Oct 19, 2021
d1544d8
Import from github 10/18/21
Nayef211 Oct 21, 2021
963dcaa
Import torchtext #1420 0153ead
parmeet Oct 24, 2021
42e43df
Import torchtext #1421 bcc1455
Nayef211 Oct 24, 2021
18ec37b
Enable OSS torchtext XLMR Base/Large model on fbcode
parmeet Oct 25, 2021
30ca7cf
enabling SST2 dataset usage in fbcode
Nayef211 Oct 26, 2021
30cc393
Merge commit 'c8d441483d1deab82e2c9af369e0cc77ba0d2ec7' into fbsync_t…
Oct 26, 2021
afd9825
Fixed importing is_module_available
Oct 26, 2021
4 changes: 4 additions & 0 deletions torchtext/_download_hooks.py
@@ -3,6 +3,10 @@
 from tqdm import tqdm
 # This is to allow monkey-patching in fbcode
 from torch.hub import load_state_dict_from_url  # noqa
+from torchtext._internal.module_utils import is_module_available
+
+if is_module_available("torchdata"):
+    from torchdata.datapipes.iter import HttpReader  # noqa F401


 def _stream_response(r, chunk_size=16 * 1024):
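The guarded import above only binds `HttpReader` when the optional `torchdata` package is installed. As a rough illustration of that pattern, here is a minimal sketch of an availability check built on the standard library's `importlib.util.find_spec`. It is an assumption about how such a helper can be written, not a copy of `torchtext._internal.module_utils`, and the module names used in the demo (`json`, `no_such_module_hopefully_xyz`) are placeholders chosen so the snippet runs without `torchdata`.

```python
import importlib.util


def is_module_available(*modules: str) -> bool:
    """Return True only if every named top-level module can be located.

    Sketch only: find_spec() looks a module up without importing it,
    which keeps the check cheap and free of import side effects.
    """
    return all(importlib.util.find_spec(name) is not None for name in modules)


# Optional-dependency guard: the name is bound only when the module exists.
if is_module_available("json"):
    import json  # stands in for the optional torchdata import

# A module that is (presumably) not installed is simply skipped.
HAS_MISSING = is_module_available("no_such_module_hopefully_xyz")
```

Code that relies on the guarded name should either run under the same check or fail with a clear message when the dependency is absent.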
9 changes: 5 additions & 4 deletions torchtext/experimental/datasets/sst2.py
@@ -10,10 +10,11 @@
 )

 if is_module_available("torchdata"):
-    from torchdata.datapipes.iter import (
-        HttpReader,
-        IterableWrapper,
-    )
+    from torchdata.datapipes.iter import IterableWrapper
+    # we import HttpReader from _download_hooks so we can swap out public URLs
+    # with internal URLs when the dataset is used within Facebook
+    from torchtext._download_hooks import HttpReader
+

 NUM_LINES = {
     "train": 67349,
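The change to `sst2.py` routes the `HttpReader` import through `torchtext/_download_hooks.py` so that a single module can be patched to redirect downloads. The PR does not show how the internal replacement is wired up, so the following is only a sketch of the general idea. Every name in it (`demo_download_hooks`, `PublicReader`, `InternalReader`, `import_consumer`, the example URLs) is hypothetical, and the reader classes are simple stand-ins rather than real `torchdata` datapipes.

```python
import sys
import types

# Hypothetical stand-in for the hook module (torchtext/_download_hooks.py).
hooks = types.ModuleType("demo_download_hooks")


class PublicReader:
    """Stand-in reader: yields (url, payload) pairs for public URLs."""

    def __init__(self, urls):
        self.urls = list(urls)

    def __iter__(self):
        for url in self.urls:
            yield url, f"public:{url}"


class InternalReader(PublicReader):
    """Hypothetical replacement that rewrites public URLs to a mirror."""

    def __iter__(self):
        for url in self.urls:
            mirrored = url.replace("https://example.com", "internal://mirror")
            yield mirrored, f"internal:{mirrored}"


hooks.HttpReader = PublicReader
sys.modules["demo_download_hooks"] = hooks


def import_consumer():
    """Simulate importing a dataset module that binds HttpReader at import time."""
    from demo_download_hooks import HttpReader  # resolved through sys.modules

    def read(url):
        return list(HttpReader([url]))

    return read


read_public = import_consumer()      # binds the public reader
hooks.HttpReader = InternalReader    # monkey-patch the hook module
read_internal = import_consumer()    # a later import sees the patched reader
```

Because the consumer binds the name when it is imported, the patch has to be applied to the hook module before the dataset module is loaded. In this sketch `read_public` keeps using the original reader even after the patch.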