[BUG] KeyError: 'label' occurs when loading dataset. #205

Open
2 tasks done
CearX opened this issue Sep 4, 2024 · 0 comments
Labels
bug Something isn't working

CearX commented Sep 4, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Description of the Bug

A `KeyError: 'label'` is raised when loading the BoolQ dataset through `DATASET['BoolQ']` from `model_center.dataset.bertdataset`.

Environment Information

- GCC version: 11.4.0
- Torch version: 2.4.0+cu121
- Linux system version: Ubuntu 22.04
- CUDA version: 12.5
- Torch's CUDA version (as per `torch.version.cuda`): 12.1

To Reproduce

https://github.com/OpenBMB/ModelCenter/blob/main/README-ZH.md#3-%E5%87%86%E5%A4%87%E6%95%B0%E6%8D%AE%E9%9B%86

import bmtrain as bmt
from model_center.dataset.bertdataset import DATASET
from model_center.dataset import DistributedDataLoader
from model_center.tokenizer import BertTokenizer

# Assumes bmt.init_distributed() has already been called, as in the README.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
splits = ['train', 'dev']
dataset = {}

# Build one dataset per split; the KeyError is raised inside this loop.
for split in splits:
    dataset[split] = DATASET['BoolQ']('your_path_to_dataset', split, bmt.rank(), bmt.world_size(), tokenizer, max_encoder_length=512)

batch_size = 64
train_dataloader = DistributedDataLoader(dataset['train'], batch_size=batch_size, shuffle=True)
dev_dataloader = DistributedDataLoader(dataset['dev'], batch_size=batch_size, shuffle=False)

Expected Behavior

The dataset should load without errors. Instead, the following error is raised:

{
"name": "KeyError",
"message": "'label'",
"stack": "---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[8], line 10
      7 dataset = {}
      9 for split in splits:
---> 10     dataset[split] = DATASET['BoolQ']('/home/bmtrain/bert', split, bmt.rank(), bmt.world_size(), tokenizer, max_encoder_length=512)
     12 batch_size = 64
     13 train_dataloader = DistributedDataLoader(dataset['train'], batch_size=batch_size, shuffle=True)

File ~/.local/lib/python3.10/site-packages/model_center/dataset/bertdataset/superglue.py:92, in BoolQ_Dataset.__init__(self, path, split, rank, world_size, tokenizer, max_encoder_length)
     90 from tqdm import tqdm
     91 for row in self.read_data("BoolQ", path, split, rank, world_size):
---> 92     label = 1 if row["label"]==True else 0
     93     text_a = row['passage']
     94     text_b = row['question']

KeyError: 'label'"
}
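
The traceback shows the failure at `row["label"]`, so at least one row returned by `read_data` has no `label` key. To narrow this down, the rows can be checked directly. The following is a minimal sketch, assuming the dataset files are in JSON Lines format (one JSON object per line); the helper name `rows_missing_key` and the sample rows are hypothetical and not part of ModelCenter.

```python
import json
from typing import Iterable, List


def rows_missing_key(lines: Iterable[str], key: str = "label") -> List[int]:
    """Return the 0-based indices of JSON Lines rows that lack `key`."""
    missing = []
    for index, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        if key not in row:
            missing.append(index)
    return missing


# Hypothetical sample rows for illustration, not taken from the real dataset.
sample = [
    '{"question": "q1", "passage": "p1", "label": true}',
    '{"question": "q2", "passage": "p2"}',
]
print(rows_missing_key(sample))
```

To run the check against a real file, an open file handle can be passed in place of `sample`, since iterating over a text file yields its lines. If a whole split reports missing labels, that file most likely does not carry labels at all.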

Screenshots

No response

Additional Information

No response

Confirmation

  • I have reviewed and verified all the information provided in this report.
@CearX CearX added the bug Something isn't working label Sep 4, 2024