
Greedy option seems inconsistent #97

Open
dysby opened this issue May 26, 2023 · 2 comments
Labels: question (Further information is requested)

Comments


dysby commented May 26, 2023

Hi,
I'm using version 0.9.1 of your library.

I found inconsistent behavior when using the greedy option.
See the example below: I expected the lemmatized versions of the text to be equal when forcing the greedy option.

>>> from simplemma import text_lemmatizer
>>> text_lemmatizer("fire crew", lang="en")
['fire', 'crow']
>>> text_lemmatizer("fire crews", lang="en", greedy=True)
['fire', 'crew']
>>> text_lemmatizer(" ".join(text_lemmatizer("fire crews", lang="en", greedy=True)), lang="en")
['fire', 'crow']

Thanks,
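The reproduction above is essentially a round-trip test: lemmatize a text, feed the output back in, and compare. Below is a minimal sketch of that check, written against a generic `lemmatize` callable so it does not depend on simplemma's internals. The helper names (`toy_lemmatize`, `round_trip_mismatches`) and the toy lexicon are hypothetical; the lexicon only mirrors the crews → crew → crow chain reported in this issue.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy lexicon for illustration only; it is NOT simplemma's data.
# It simply reproduces the crews -> crew -> crow chain seen in the report.
TOY_LEXICON: Dict[str, str] = {"crews": "crew", "crew": "crow"}


def toy_lemmatize(token: str) -> str:
    """Look the token up in the toy lexicon; unknown tokens come back unchanged."""
    return TOY_LEXICON.get(token, token)


def round_trip_mismatches(
    lemmatize: Callable[[str], str], tokens: List[str]
) -> List[Tuple[str, str, str]]:
    """Return (token, lemma, lemma_of_lemma) triples where lemmatizing twice
    differs from lemmatizing once, i.e. where the lemmatizer is not idempotent."""
    mismatches = []
    for token in tokens:
        once = lemmatize(token)
        twice = lemmatize(once)
        if once != twice:
            mismatches.append((token, once, twice))
    return mismatches


print(round_trip_mismatches(toy_lemmatize, ["fire", "crews"]))
# → [('crews', 'crew', 'crow')]
```

To run the same check against the library itself, one could pass something like `lambda t: simplemma.lemmatize(t, lang="en", greedy=True)` as the callable, assuming the installed version exposes `lemmatize` with `lang` and `greedy` parameters.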

Owner

adbar commented May 30, 2023

Hi @dysby, good catch!

My guess would be that the results are cached internally, which affects the output of text_lemmatizer(). In any case, the issue is worth looking into further.
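To make the caching guess concrete, here is one hypothetical way a cache could change results: if the cache key omits the `greedy` flag, whichever mode runs first decides what later calls return. This is a sketch under that assumption, not simplemma's actual caching code; `CachedLemmatizer`, `lemmatize_uncached`, and the two lookup tables are made-up names and toy data that only mimic the outputs in this thread.

```python
from typing import Dict, Hashable

# Toy lookup tables (hypothetical); they only mimic outputs reported above.
NON_GREEDY = {"crew": "crow"}
GREEDY = {"crew": "crew", "crews": "crew"}


def lemmatize_uncached(token: str, greedy: bool) -> str:
    """Pick a table according to the mode; unknown tokens are returned as-is."""
    table = GREEDY if greedy else NON_GREEDY
    return table.get(token, token)


class CachedLemmatizer:
    """Memoizing wrapper whose cache key may or may not include `greedy`."""

    def __init__(self, key_includes_greedy: bool) -> None:
        self.key_includes_greedy = key_includes_greedy
        self.store: Dict[Hashable, str] = {}

    def lemmatize(self, token: str, greedy: bool = False) -> str:
        # A correct key must contain every argument that changes the result.
        key = (token, greedy) if self.key_includes_greedy else token
        if key not in self.store:
            self.store[key] = lemmatize_uncached(token, greedy)
        return self.store[key]


buggy = CachedLemmatizer(key_includes_greedy=False)
buggy.lemmatize("crew", greedy=True)
print(buggy.lemmatize("crew"))  # crew: the greedy result leaks into a non-greedy call

fixed = CachedLemmatizer(key_includes_greedy=True)
fixed.lemmatize("crew", greedy=True)
print(fixed.lemmatize("crew"))  # crow
```

For comparison, Python's `functools.lru_cache` keys on all positional and keyword arguments, so a function decorated with it would not mix up results between modes in this way.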

adbar added the question (Further information is requested) label May 30, 2023
dysby closed this as completed May 30, 2023
dysby reopened this May 30, 2023
Author

dysby commented May 30, 2023

I think it has to do with the minimum word length in simplemma.py#L495 in the latest release, 0.9.1.

I'm not sure whether the current code does the same.
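A minimum word length check could produce this pattern if the greedy heuristics only run on tokens above some length: forcing `greedy=True` then cannot make a short token such as "crew" behave like the longer "crews". The sketch below models that idea. The threshold value, the plural-stripping rule, the dictionary entry, and the name `gated_lemmatize` are all hypothetical stand-ins, not the code at simplemma.py#L495.

```python
from typing import Dict

# Hypothetical values for illustration; they are not taken from simplemma.
MIN_GREEDY_LENGTH = 5
LEXICON: Dict[str, str] = {"crew": "crow"}  # mimics the non-greedy output above


def strip_plural(token: str) -> str:
    """Crude stand-in for a greedy heuristic: drop one trailing 's'."""
    return token[:-1] if token.endswith("s") else token


def gated_lemmatize(token: str, greedy: bool = False) -> str:
    """Toy lemmatizer whose greedy branch is skipped for short tokens."""
    if greedy and len(token) >= MIN_GREEDY_LENGTH:
        # Only tokens that pass the length gate reach the greedy heuristic.
        return strip_plural(token)
    # Everything else falls back to a plain dictionary lookup.
    return LEXICON.get(token, token)


for word in ["crews", "crew"]:
    print(word, "->", gated_lemmatize(word, greedy=True))
# crews -> crew
# crew -> crow
```

Under this assumption, the greedy flag silently becomes a no-op for short words, so feeding a greedy result back in can land on a different lemma.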

adbar added this to the v1.0 milestone Jun 21, 2023
adbar removed this from the v1.0 milestone May 22, 2024