Commit

Add alternative code for dataset loading from HuggingFace when direct access is restricted.
an-yongqi committed Jan 6, 2024
1 parent c41d16d commit 3bb57db
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions lib/data.py
@@ -76,10 +76,10 @@ def get_wikitext2(nsamples, seed, seqlen, tokenizer):
         tuple: A tuple containing trainloader (list of input and target pairs) and encoded test dataset.
     """
     # Load train and test datasets
-    # traindata = load_dataset('wikitext', 'wikitext-2-raw-v1', split='train')
-    # testdata = load_dataset('wikitext', 'wikitext-2-raw-v1', split='test')
-    traindata = load_dataset('text', data_files='datasets/wikitext/wiki.train.raw', split="train")
-    testdata = load_dataset('text', data_files='datasets/wikitext/wiki.test.raw', split="train")
+    traindata = load_dataset('wikitext', 'wikitext-2-raw-v1', split='train')
+    testdata = load_dataset('wikitext', 'wikitext-2-raw-v1', split='test')
+    # traindata = load_dataset('text', data_files='datasets/wikitext/wiki.train.raw', split="train")
+    # testdata = load_dataset('text', data_files='datasets/wikitext/wiki.test.raw', split="train")
 
     # Encode datasets
     trainenc = tokenizer(" ".join(traindata['text']), return_tensors='pt')
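
Note: the change above switches between the two loading paths by commenting lines in and out. A minimal sketch of choosing the path at call time instead is given below. It is not part of this commit: the function name `load_wikitext2_splits` and the `local_dir` and `loader` parameters are hypothetical, and the file names are taken from the commented-out lines. The local path passes split="train" for both files because the generic 'text' builder places a single `data_files` entry under the 'train' split.

```python
import os


def load_wikitext2_splits(local_dir=None, loader=None):
    """Return (traindata, testdata) for WikiText-2.

    local_dir: hypothetical parameter; a directory holding wiki.train.raw and
        wiki.test.raw. When None, the dataset is requested from the Hub.
    loader: hypothetical parameter; a callable with the same signature as
        datasets.load_dataset, injectable for offline testing.
    """
    if loader is None:
        # Imported lazily so the function can be exercised without `datasets`.
        from datasets import load_dataset as loader

    if local_dir is not None:
        train_path = os.path.join(local_dir, "wiki.train.raw")
        test_path = os.path.join(local_dir, "wiki.test.raw")
        # With the 'text' builder, a single file is exposed as the 'train'
        # split, so the test file is also read with split="train".
        traindata = loader("text", data_files=train_path, split="train")
        testdata = loader("text", data_files=test_path, split="train")
    else:
        traindata = loader("wikitext", "wikitext-2-raw-v1", split="train")
        testdata = loader("wikitext", "wikitext-2-raw-v1", split="test")
    return traindata, testdata
```

Calling `load_wikitext2_splits(local_dir="datasets/wikitext")` would mirror the commented-out lines, while `load_wikitext2_splits()` would mirror the active ones, so neither path has to be edited in the source when direct access is restricted.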