Can't Load Russian Language Model due to Caching Bug #357
Comments
hi @jonwiggins ! thank you tons for finding and digging into this bug. it looks like
>>> import collections
>>> import pymorphy2
>>> isinstance(pymorphy2.analyzer.Parse, collections.abc.Iterable)
False
>>> hasattr(pymorphy2.analyzer.Parse, "__iter__")
True
i would be delighted to accept your PR with the change. i don't have any immediate plans to cut a new release, but if this is a blocker on cb work, i could probably do a small bugfix release... let me know 👍
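For anyone without pymorphy2 installed, the same mismatch can probably be reproduced with a plain class. This is a minimal sketch: `Record` is a hypothetical stand-in, not pymorphy2's actual `Parse` implementation. The point is that `hasattr` on a class object finds the `__iter__` method defined for its instances, even though the class object itself cannot be iterated.

```python
from collections.abc import Iterable


class Record:
    """Hypothetical stand-in for a class such as pymorphy2.analyzer.Parse."""

    def __init__(self, *fields):
        self._fields = fields

    def __iter__(self):
        return iter(self._fields)


# An instance passes both tests.
instance = Record("a", "b")
instance_has_iter = hasattr(instance, "__iter__")
instance_is_iterable = isinstance(instance, Iterable)

# The class object exposes __iter__ as an attribute, yet it is not iterable:
# iter() looks the method up on type(Record), which is `type`.
class_has_iter = hasattr(Record, "__iter__")
class_is_iterable = isinstance(Record, Iterable)

try:
    iter(Record)
    class_iter_ok = True
except TypeError:
    class_iter_ok = False

print(class_has_iter, class_is_iterable, class_iter_ok)
```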
Sounds good. FWIW, I merged option 2 into the pysize repo: bosswissam/pysize#18
Upon review, option 3 isn't as great because it is possible for iterable objects to not have ...
It's not a huge blocker but it is something I'd like to get fixed.
@jonwiggins ah, nice! let's follow the source reference's (your) example and wrap the call in a try/except, with one minor difference: set the logging message's level to WARNING, since the implications for textacy's usage aren't too serious. thanks in advance!
fixed in
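A rough sketch of what the try/except approach with a WARNING-level log message might look like. The function `approx_size` and the logger setup are assumptions for illustration; this is a simplified size estimator, not the actual code in textacy's cache.py or in pysize.

```python
import logging
import sys

LOGGER = logging.getLogger(__name__)


def approx_size(obj, seen=None):
    """Recursively estimate the size of obj in bytes (simplified, hypothetical)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        # Already counted; avoids double counting and infinite recursion.
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(approx_size(k, seen) + approx_size(v, seen) for k, v in obj.items())
    elif hasattr(obj, "__iter__") and not isinstance(obj, (str, bytes, bytearray)):
        try:
            size += sum(approx_size(item, seen) for item in obj)
        except TypeError:
            # obj advertises __iter__ but cannot actually be iterated
            # (for example, a class object); count only its own size.
            LOGGER.warning("Unable to iterate over %r; skipping its contents", type(obj))
    return size
```

With this shape, an object that has `__iter__` but raises `TypeError` on iteration is logged and counted by its own size only, instead of aborting the whole computation.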
steps to reproduce
yields
expected vs. actual behavior
The Russian Language model should be loaded
possible solution?
This is because the library which the model uses for Russian and Ukrainian morphology (pymorphy2: https://github.com/kmike/pymorphy2) has a class that exhibits some odd behavior. The caching logic (https://github.com/chartbeat-labs/textacy/blob/main/src/textacy/cache.py) thinks that obj is iterable, and then fails. A similar bug has been reported with OrderedDict on the original repo for this caching code (bosswissam/pysize#8) but has not been fixed.
I think that there are three possible fixes.
1. Extend the existing check with ... and not isinstance(obj, (str, bytes, bytearray, type)). This will fix this specific instance of the bug, but will not fix any other occurrence of a strange object which has __iter__ but cannot be iterated.
2. Wrap the iteration in a try/except; this will probably have some performance hit.
3. Replace the check with isinstance(obj, collections.abc.Iterable).
The Python documentation (https://docs.python.org/3/library/collections.abc.html#collections.abc.Iterable) actually mentions that the third option will not work in all cases, and seems to recommend using a try/except to see if the object can be iterated over, although I feel like that is a less elegant solution.
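To make the trade-offs between the three options concrete, here is a small sketch that reduces each one to a yes/no "can this be iterated?" test. The helper names (`check_option_1` and so on) and the two sample classes are hypothetical, and the helpers cover only the iterability part of the condition, not the rest of the caching logic.

```python
from collections.abc import Iterable


class Legacy:
    """Hypothetical class that is iterable only through __getitem__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * 10


class WithIter:
    """Hypothetical class whose instances define __iter__."""

    def __iter__(self):
        return iter(())


def check_option_1(obj):
    # Option 1: attribute check, with class objects explicitly excluded.
    return hasattr(obj, "__iter__") and not isinstance(obj, (str, bytes, bytearray, type))


def check_option_2(obj):
    # Option 2: just try it and catch the failure.
    try:
        iter(obj)
    except TypeError:
        return False
    return True


def check_option_3(obj):
    # Option 3: abstract base class check.
    return isinstance(obj, Iterable)


for label, candidate in [("list", [1, 2]), ("class object", WithIter), ("legacy", Legacy())]:
    print(label, check_option_1(candidate), check_option_2(candidate), check_option_3(candidate))
```

In this sketch all three options reject the class object, but only the try/except variant accepts the `__getitem__`-based `Legacy` instance, which is the case the Python documentation warns about for the `Iterable` check.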
@bdewilde If you have feelings towards one of these solutions I'm happy to open a PR to make the change.
environment
spacy version: 3.2.1
spacy models: Several
textacy version: 0.12