This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

What is the size of the processed data? #24

Open
leoozy opened this issue Jul 6, 2022 · 1 comment

Comments


leoozy commented Jul 6, 2022

Hello, I processed the Wikipedia and BookCorpus datasets using your scripts. The total size of the processed Wikipedia dataset is around 106G (~2650 HDF5 files). Could you please tell me whether that is right?

Contributor

peteriz commented Jul 27, 2022

Sounds about right.
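
For anyone who wants to compare their own output against these figures: 106G over ~2650 files works out to roughly 40 MB per file, assuming "G" means gigabytes. A minimal sketch for counting the files and summing their sizes is below. The `.hdf5` extension comes from the question above; the function names and the idea of scanning a single output directory are assumptions for illustration, not part of the repository's scripts.

```python
import sys
from pathlib import Path


def summarize_hdf5(directory):
    """Return (file_count, total_bytes) for all *.hdf5 files under directory."""
    files = [p for p in Path(directory).rglob("*.hdf5") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes


def report(directory):
    """Format the count, total size, and average file size as one line."""
    count, total_bytes = summarize_hdf5(directory)
    total_gib = total_bytes / 1024**3
    # Guard against an empty directory to avoid dividing by zero.
    avg_mib = (total_bytes / count) / 1024**2 if count else 0.0
    return f"{count} files, {total_gib:.1f} GiB total, {avg_mib:.1f} MiB per file on average"


if __name__ == "__main__":
    # Pass the processed-data directory as the first argument
    # (defaults to the current directory).
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(report(target))
```

Run it with the processed-data directory as the only argument. Note that it reports binary units (GiB, MiB), so the numbers will read slightly lower than the same sizes expressed in decimal GB.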
