Downloading from Github Releases sometimes times out #9

Closed

micky-git opened this issue Jan 19, 2021 · 8 comments

@micky-git commented Jan 19, 2021

tokenizer = Tokenizer.load("en")

ValueError: error decoding response body: operation timed out
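
Since the failure is an intermittent download timeout, a simple retry wrapper is one possible stopgap. This is only a sketch: load_with_retry is a hypothetical helper, not part of nlprule, and it assumes the timeout keeps surfacing as the ValueError shown above.

import time
from nlprule import Tokenizer

def load_with_retry(lang, attempts=3, delay=5):
    # Retry the automatic download a few times before giving up.
    for i in range(attempts):
        try:
            return Tokenizer.load(lang)
        except ValueError:  # "error decoding response body: operation timed out"
            if i == attempts - 1:
                raise
            time.sleep(delay)

tokenizer = load_with_retry("en")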

@bminixhofer (Owner) commented Jan 19, 2021

That's an issue with file hosting via GitHub Releases. You're the second person to report this; I might check whether it's possible to host the files elsewhere.
In the meantime, the workaround mentioned in #6 (comment) works: download the files manually from the Releases page and load them with:

tokenizer = Tokenizer("path/to/en_tokenizer.bin")

bminixhofer changed the title from "can't load tokenizer" to "Downloading from Github Releases sometimes times out" on Jan 19, 2021
@micky-git (Author)

gunzipping failed: Custom { kind: UnexpectedEof, error: "failed to fill whole buffer" }

@bminixhofer (Owner) commented Jan 20, 2021

Can you paste your code? And make sure the versions of the binary and the library match; maybe that's the issue.

@micky-git (Author)

version info:
platform: Windows 64-bit & macOS Big Sur
rustc 1.49.0 (e1884a8e3 2020-12-29)
Python 3.7.6
conda 4.9.2
nlprule 0.3.0
tokenizer: https://github.com/bminixhofer/nlprule/releases/download/0.3.0/en_tokenizer.bin.gz
IDE: Spyder

code here (Python 3.7):

# -*- coding: utf-8 -*-
"""
Created on Tue Jan 19 18:41:01 2021
@author: A
"""

from nlprule import Tokenizer, Rules, SplitOn
tokenizer = Tokenizer.load("en_tokenizer.bin")
rules = Rules.load("en", tokenizer, SplitOn([".", "?", "!"]))
rules.correct("He wants that you send him an email.")
rules.correct("Thanks for your’s and Lucy’s help.")
rules.correct("I can due his homework.")
suggestions = rules.suggest("She was not been here since Monday.")
for s in suggestions:
    print(s.start, s.end, s.text, s.source, s.message)

output here:

runfile('D:/ai/test/helloworld/test.py', wdir='D:/ai/test/helloworld')
Traceback (most recent call last):

File "D:\ai\test\helloworld\test.py", line 8, in <module>
tokenizer = Tokenizer.load("en_tokenizer.bin")

PanicException: gunzipping failed: Custom { kind: UnexpectedEof, error: "failed to fill whole buffer" }

@micky-git (Author)

btw: I unzipped it with Bandizip (Windows 64-bit): http://www.bandisoft.com/
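
If the archive itself is suspect, one quick check (a diagnostic sketch, assuming en_tokenizer.bin.gz is in the working directory) is to read it to the end with Python's gzip module; a truncated download raises EOFError, which would match the UnexpectedEof panic above.

import gzip

# A partial download raises EOFError ("Compressed file ended before the
# end-of-stream marker was reached") when read to the end.
with gzip.open("en_tokenizer.bin.gz", "rb") as f:
    data = f.read()
print(len(data), "bytes decompressed OK")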

@micky-git (Author)

I tried:

tokenizer = Tokenizer.load("en_tokenizer.bin")  # en_tokenizer.bin copied to the working directory
tokenizer = Tokenizer.load("en_tokenizer.bin")  # en_tokenizer.bin copied to the nlprule package directory

tokenizer = Tokenizer.load("D:\ai\test\helloworld\en_tokenizer.bin")
tokenizer = Tokenizer.load("D:/ai/test/helloworld/en_tokenizer.bin")
tokenizer = Tokenizer.load(r"D:\ai\test\helloworld\en_tokenizer.bin")
tokenizer = Tokenizer.load(f"D:\ai\test\helloworld\en_tokenizer.bin")
tokenizer = Tokenizer.load(f"en_tokenizer.bin")
tokenizer = Tokenizer.load("../en_tokenizer.bin")

I don't quite understand what argument .load expects.

@bminixhofer (Owner)

Thanks for the code. So .load takes a language code, e.g. "en" or "de", as input and downloads the binary automatically. Since the download does not seem to work in your case, you can load the file manually with the constructor of the Tokenizer, not with .load.

This code should work:

tokenizer = Tokenizer("en_tokenizer.bin") # no .load, just Tokenizer(..)!

@bminixhofer (Owner)

The binaries are now significantly smaller, so I hope this won't happen anymore. I'm closing this for now; please comment / reopen if there are any more issues with timeouts.
