make_wikipedia.py fails on linux #58
Once this is fixed, I also get the following error:
This is the command I use for running it:
Hi @peterbjorgensen! Thank you for this bug report. I've made a PR (#64) with these fixes in. I can't seem to reproduce the gzip error... Could you tell me a bit more about your setup (platform, Python version, etc.)?
I am on
Even using Python 3.11.8, the error is the same as follows:
@soldni I think this needs to be fixed; please check it.
I remain unable to reproduce this issue on my side; I would need more info.
@soldni wikiextractor: 3.0.6
I updated wikiextractor from 3.0.6 to 3.0.7, which solved the bug `Error -3 while decompressing data: invalid block type`, but now I get: `Error -3 while decompressing data: invalid stored block lengths`.
Thanks for your hard work. Would you please show me your commands? That would be helpful for me to follow the procedure. Thanks in advance.
夜白 ***@***.***> wrote on Monday, 8 April 2024 at 22:30:
… First save the extracted file as a JSON file, and then compress it; that gives the correct result. So I'm guessing it's some kind of bug in the instant-compression + multi-process mechanism of wikiextractor.
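For reference, a minimal sketch of that two-step workaround (write uncompressed JSON lines first, then compress the finished file in a single process afterwards). The file names and record shape here are placeholders, not dolma's or wikiextractor's actual output:

```python
import gzip
import json
import shutil

# Placeholder records standing in for the extracted Wikipedia pages.
records = [{"id": "12", "text": "Anarchism is a political philosophy..."}]

# Step 1: write plain JSON lines with no in-flight compression.
with open("wiki.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Step 2: gzip the finished file in a single process once writing is done.
with open("wiki.jsonl", "rb") as src, gzip.open("wiki.jsonl.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)
```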
Have you solved this problem? I faced it too, and I haven't been able to get past the tagger step.
The bug can be fixed by setting `multiprocessing.set_start_method("spawn")` in the `__main__` environment. Perhaps dolma's core/parallel.py should use `multiprocessing.get_context("spawn")` to avoid this.
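A minimal, self-contained sketch of the fix described above; the `compress` worker, inputs, and pool size are placeholders, not dolma's actual code:

```python
import multiprocessing
import zlib

def compress(text: str) -> bytes:
    # Placeholder worker; stands in for whatever each dolma worker does.
    return zlib.compress(text.encode("utf-8"))

if __name__ == "__main__":
    # Script-level fix: set the global start method before creating any
    # pools, so workers start from a fresh interpreter instead of a forked
    # copy of the parent (forking can duplicate in-flight zlib/gzip state).
    multiprocessing.set_start_method("spawn")

    # Library-level alternative (e.g. in dolma's core/parallel.py): build
    # the pool from an explicit context without touching the global default.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=4) as pool:
        sizes = [len(blob) for blob in pool.map(compress, ["a", "b", "c"])]
    print(sizes)
```

The `get_context("spawn")` variant is the less invasive option for a library, since it changes the start method only for pools it creates rather than for the whole process.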