NameError: name "DatasetDict" is not defined #2238

Open
jxu0510 opened this issue Jul 24, 2024 · 5 comments
Comments

@jxu0510

jxu0510 commented Jul 24, 2024

Hi, thanks to the authors for this amazing work! I would really appreciate any help with the error I encountered. I was following the trainer part of https://www.sbert.net/docs/sentence_transformer/training_overview.html, but when I run

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
    evaluator=dev_evaluator,
)

I got an error saying "NameError: name 'DatasetDict' is not defined", even though I made sure DatasetDict is imported. I also searched online but didn't find any useful information.

Thank you for your time and help in advance!

@pcuenca
Member

pcuenca commented Jul 25, 2024

cc @tomaarsen, not sure if this refers to the blog or the documentation.

@mxbi

mxbi commented Sep 9, 2024

I get the same issue following the same tutorial. The error comes from:

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\sentence_transformers\model_card.py:508, in SentenceTransformerModelCardData.infer_datasets(self, dataset, dataset_name)
    505 def infer_datasets(
    506     self, dataset: Union["Dataset", "DatasetDict"], dataset_name: Optional[str] = None
    507 ) -> List[Dict[str, str]]:
--> 508     if isinstance(dataset, DatasetDict):
    509         return [
    510             dataset
    511             for dataset_name, sub_dataset in dataset.items()
    512             for dataset in self.infer_datasets(sub_dataset, dataset_name=dataset_name)
    513         ]
    515     def subtuple_finder(tuple: Tuple[str], subtuple: Tuple[str]) -> int:

This looks like a version mismatch. I will update this comment if I find a workaround.

@tomaarsen
Member

Hmm, that's quite odd. Do you have datasets installed? pip show datasets/pip install datasets?

  • Tom Aarsen
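
For readers who want to run the same check from inside a notebook rather than a shell, here is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical and not part of sentence-transformers or datasets. It reports what is installed on disk, which is not necessarily what the running session has already imported.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; it mirrors what `pip show` reports.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if `datasets` is installed, otherwise None.
print("datasets:", installed_version("datasets"))
```

A non-None result only confirms the package is installed on disk. Modules imported earlier in the same session are not re-imported automatically.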

@mxbi

mxbi commented Sep 9, 2024

@tomaarsen I think it's because I installed datasets (following a previous error message which asked me to) during the current notebook session - meaning that some old version probably got cached during import and thus DatasetDict was not available. I suspect the same happened to OP.

Fixed by restarting the kernel 🙂
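
The explanation above can be reproduced in isolation. The following is a minimal sketch, assuming the library binds `DatasetDict` through a guarded optional import. That is an assumption about the mechanism, not a copy of the sentence-transformers source. The module name `fake_optional_dep` and the `load_consumer` helper are hypothetical stand-ins, so the snippet runs without `datasets` installed.

```python
import sys
import types

# Hypothetical stand-in for a library module that guards an optional import.
CONSUMER_SOURCE = """
try:
    from fake_optional_dep import DatasetDict
except ImportError:
    pass  # dependency missing: the name DatasetDict is never bound

def is_dataset_dict(obj):
    return isinstance(obj, DatasetDict)
"""


def load_consumer():
    """Execute the consumer source once, like the first import in a session."""
    module = types.ModuleType("consumer")
    exec(CONSUMER_SOURCE, module.__dict__)
    return module


# 1. The dependency is absent when the consumer is first imported.
sys.modules.pop("fake_optional_dep", None)
stale = load_consumer()

# 2. Simulate installing the dependency in the middle of the session.
dep = types.ModuleType("fake_optional_dep")
dep.DatasetDict = type("DatasetDict", (dict,), {})
sys.modules["fake_optional_dep"] = dep

# The already-loaded consumer still lacks the name, so the call fails.
try:
    stale.is_dataset_dict({})
    stale_error = None
except NameError as exc:
    stale_error = str(exc)

# 3. A fresh load (the equivalent of restarting the kernel) binds the name.
fresh = load_consumer()
fresh_result = fresh.is_dataset_dict(dep.DatasetDict())

print(stale_error)
print(fresh_result)
```

In a real notebook, restarting the kernel plays the role of step 3: every module is imported again, this time with the dependency present.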

@tomaarsen
Member

Glad to hear that!
