The dataset on Hugging Face has corrupted files #10

Open
wangth2001 opened this issue Mar 9, 2024 · 1 comment

@wangth2001

I found that the dataset on Hugging Face has some corrupted files:

UCmt2VkB-sXCTCJ2mFj3L0TA/BYLPYtftaWE/segment_0.mp3
UCmt2VkB-sXCTCJ2mFj3L0TA/BYLPYtftaWE/segment_1.mp3
UCmt2VkB-sXCTCJ2mFj3L0TA/BYLPYtftaWE/segment_2.mp3
UCmt2VkB-sXCTCJ2mFj3L0TA/BYLPYtftaWE/segment_3.mp3
UCmt2VkB-sXCTCJ2mFj3L0TA/BYLPYtftaWE/segment_4.mp3
UCmt2VkB-sXCTCJ2mFj3L0TA/BYLPYtftaWE/segment_5.mp3
UCmt2VkB-sXCTCJ2mFj3L0TA/BYLPYtftaWE/segment_6.mp3
UCmt2VkB-sXCTCJ2mFj3L0TA/BYLPYtftaWE/segment_7.mp3
UCmt2VkB-sXCTCJ2mFj3L0TA/BYLPYtftaWE/segment_8.mp3
UCmt2VkB-sXCTCJ2mFj3L0TA/BYLPYtftaWE/segment_9.mp3
UCmt2VkB-sXCTCJ2mFj3L0TA/BYLPYtftaWE/segment_10.mp3
UCmt2VkB-sXCTCJ2mFj3L0TA/BYLPYtftaWE/segment_11.mp3
UCmt2VkB-sXCTCJ2mFj3L0TA/BYLPYtftaWE/segment_12.mp3
# https://huggingface.co/datasets/voice-is-cool/voxtube/viewer/default/train?p=7939


UCmt2VkB-sXCTCJ2mFj3L0TA/wMvQGqvgykQ/segment_1.mp3
# https://huggingface.co/datasets/voice-is-cool/voxtube/viewer/default/train?p=7941


UCjB-TLtaveC9NYFHQmFOJaQ/bwxLISIV3aM/segment_0.mp3
# https://huggingface.co/datasets/voice-is-cool/voxtube/viewer/default/train?p=7938


UCjB-TLtaveC9NYFHQmFOJaQ/0zN23tGemSg/segment_1.mp3
UCjB-TLtaveC9NYFHQmFOJaQ/0zN23tGemSg/segment_9.mp3
# https://huggingface.co/datasets/voice-is-cool/voxtube/viewer/default/train?p=7938


UCjB-TLtaveC9NYFHQmFOJaQ/mOhs5UVJ2dk/segment_0.mp3
UCjB-TLtaveC9NYFHQmFOJaQ/mOhs5UVJ2dk/segment_1.mp3
UCjB-TLtaveC9NYFHQmFOJaQ/mOhs5UVJ2dk/segment_2.mp3
UCjB-TLtaveC9NYFHQmFOJaQ/mOhs5UVJ2dk/segment_3.mp3
UCjB-TLtaveC9NYFHQmFOJaQ/mOhs5UVJ2dk/segment_4.mp3
UCjB-TLtaveC9NYFHQmFOJaQ/mOhs5UVJ2dk/segment_5.mp3
UCjB-TLtaveC9NYFHQmFOJaQ/mOhs5UVJ2dk/segment_6.mp3
UCjB-TLtaveC9NYFHQmFOJaQ/mOhs5UVJ2dk/segment_7.mp3
UCjB-TLtaveC9NYFHQmFOJaQ/mOhs5UVJ2dk/segment_8.mp3
UCjB-TLtaveC9NYFHQmFOJaQ/mOhs5UVJ2dk/segment_9.mp3
UCjB-TLtaveC9NYFHQmFOJaQ/mOhs5UVJ2dk/segment_10.mp3
UCjB-TLtaveC9NYFHQmFOJaQ/mOhs5UVJ2dk/segment_11.mp3
UCjB-TLtaveC9NYFHQmFOJaQ/mOhs5UVJ2dk/segment_12.mp3
# https://huggingface.co/datasets/voice-is-cool/voxtube/viewer/default/train?p=7938

Some of these videos no longer exist on YouTube. Could you re-upload the corrupted files? Many thanks!

@vanIvan
Collaborator

vanIvan commented Mar 10, 2024

Hi @wangth2001, thank you for noticing this; we'll try to fix it as soon as possible. In the meantime, please use the version cached locally by Hugging Face, with these broken/empty files removed.
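Below is a minimal sketch for locating the broken/empty files in a local copy before removing them. It assumes the audio has been materialized on disk as individual `.mp3` files under one root directory (mirroring paths like `UCjB-TLtaveC9NYFHQmFOJaQ/mOhs5UVJ2dk/segment_0.mp3`); the function names are hypothetical. The header check is only a heuristic: a file passes if it begins with an ID3v2 tag (`ID3`) or an MPEG audio frame sync (byte `0xFF` followed by a byte with its top three bits set). A truncated file that still has a valid header will not be caught; fully decoding each file would be more thorough.

```python
from pathlib import Path


def looks_like_mp3(header: bytes) -> bool:
    """Heuristic only: accept an ID3v2 tag or an MPEG frame sync at the start."""
    if header.startswith(b"ID3"):
        return True
    # Frame sync: 11 set bits, i.e. 0xFF then a byte whose top 3 bits are set.
    return len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0


def find_suspect_mp3s(root: str) -> list[str]:
    """Return .mp3 paths (relative to root, '/'-separated) that are empty
    or do not start with a recognizable MP3 header."""
    root_path = Path(root)
    suspects = []
    for path in sorted(root_path.rglob("*.mp3")):
        rel = path.relative_to(root_path).as_posix()
        if path.stat().st_size == 0:
            suspects.append(rel)  # empty file
            continue
        with path.open("rb") as f:
            header = f.read(4)
        if not looks_like_mp3(header):
            suspects.append(rel)  # non-empty but not MP3-like
    return suspects


def remove_suspects(root: str, rel_paths: list[str]) -> None:
    """Delete the given files (relative to root). Review the list first."""
    for rel in rel_paths:
        (Path(root) / rel).unlink()
```

Listing first and deleting in a separate step keeps the operation reviewable: print the result of `find_suspect_mp3s(...)`, compare it against the paths reported in this issue, and only then pass it to `remove_suspects(...)`.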
