Duplicate data in Timit dataset #2188

Closed
thanh-p opened this issue Apr 8, 2021 · 2 comments

thanh-p commented Apr 8, 2021

I ran a simple script to list all the texts in the TIMIT dataset, and they were all identical.
Is this dataset corrupted?
Code:
from datasets import load_dataset

timit = load_dataset("timit_asr")
print(*timit['train']['text'], sep='\n')
Result:
Would such an act of refusal be useful?
Would such an act of refusal be useful?
Would such an act of refusal be useful?
Would such an act of refusal be useful?
...
...
Would such an act of refusal be useful?
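
Rather than printing every row, the same symptom can be checked by counting distinct transcripts. This is only a minimal sketch, assuming the texts are already available as a plain list of strings; `summarize_duplicates` is a hypothetical helper and not part of the datasets library.

```python
from collections import Counter


def summarize_duplicates(texts):
    """Return (total, distinct, most_common_text, most_common_count)."""
    counts = Counter(texts)
    if not counts:
        # Empty input: nothing to report.
        return 0, 0, None, 0
    text, n = counts.most_common(1)[0]
    return len(texts), len(counts), text, n


# Stand-in for timit['train']['text']; the real column is assumed to be a list of str.
sample = ["Would such an act of refusal be useful?"] * 4
total, distinct, text, n = summarize_duplicates(sample)
print(f"{total} rows, {distinct} distinct; most common ({n}x): {text}")
```

With the real data, one would pass `timit['train']['text']` instead of `sample`. Some repetition is expected, since TIMIT speakers read a few shared sentences, but a single distinct value across the whole split matches the problem reported here.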

Member

lhoestq commented Apr 8, 2021

Hi! Thanks for reporting.
If I recall correctly, this was recently fixed in #1995.
Can you try upgrading your local version of datasets?

pip install --upgrade datasets
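
To confirm that the upgrade took effect, the installed version can be read from Python's standard package metadata. This is a hedged sketch: `installed_version` is a hypothetical helper, and the thread does not say which release contains the fix, so the output only shows what is installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints None if datasets is not installed in the current environment.
print("datasets version:", installed_version("datasets"))
```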

Author

thanh-p commented Apr 8, 2021

Hi lhoestq,

Thank you. It works after upgrading datasets.

thanh-p closed this as completed Apr 8, 2021