KeyError in SumBasicSummarizer #165

Closed
dennlinger opened this issue Mar 9, 2022 · 1 comment · Fixed by #166
Comments

@dennlinger

Hey, first of all, thanks for the great library!

I just encountered a bug in SumBasicSummarizer: it seems to look up the document frequency of a stemmed word, but the word_freq_in_doc dictionary only stores frequencies for unstemmed words, so the lookup fails with a KeyError.

I believe the culprit is the inconsistent normalization of content words between _get_content_words_in_sentence() and _get_all_content_words_in_doc(): the former performs stemming, whereas the latter does not.
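To illustrate the mismatch, here is a minimal, dependency-free sketch. It does not use sumy's code: the helpers below (toy_stem, doc_word_frequencies, sentence_content_words, first_missing_key) are hypothetical stand-ins that only mimic the behaviour described above, and toy_stem is a crude placeholder for a real stemmer. Document counts are keyed by normalized but unstemmed words, while sentence words are normalized and stemmed, so a plain dict lookup raises KeyError.

```python
from collections import Counter


def toy_stem(word):
    # Hypothetical placeholder for a real stemmer: strips a trailing "en".
    return word[:-2] if word.endswith("en") and len(word) > 4 else word


def normalize(word):
    return word.lower()


def doc_word_frequencies(sentences):
    # Document-level counts: normalized but NOT stemmed (as described in the issue).
    return dict(Counter(normalize(w) for s in sentences for w in s))


def sentence_content_words(sentence):
    # Sentence-level words: normalized AND stemmed (as described in the issue).
    return [toy_stem(normalize(w)) for w in sentence]


def first_missing_key(sentences):
    # Returns the first sentence word that is absent from the document counts.
    freqs = doc_word_frequencies(sentences)
    try:
        for sentence in sentences:
            for word in sentence_content_words(sentence):
                _ = freqs[word]  # plain dict lookup, raises KeyError on a miss
    except KeyError as err:
        return err.args[0]
    return None


sentences = [["Die", "Katzen", "schlafen"], ["Katzen", "fressen"]]
print(first_missing_key(sentences))  # katz
```

The counts contain "katzen", but the sentence side asks for the stemmed "katz", which is exactly the kind of lookup that fails.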

I would have opened a PR myself, but I am not sure which fix is "more correct" (IMO, consistent stemming should be the way to go?).
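For comparison, here is a sketch of the "consistent stemming" option: a single normalize_and_stem helper is used for both the document-level word probabilities and the sentence scoring, so every lookup key exists. This is a hypothetical, self-contained SumBasic loop (probability = count / total, sentence score = average word probability, then square the probabilities of words in the chosen sentence). It is not sumy's implementation or the change merged in #166, and toy_stem again stands in for a real stemmer.

```python
from collections import Counter


def toy_stem(word):
    # Hypothetical placeholder for a real stemmer: strips a trailing "en".
    return word[:-2] if word.endswith("en") and len(word) > 4 else word


def normalize_and_stem(word):
    # One shared normalization for BOTH counting and lookups.
    return toy_stem(word.lower())


def word_probabilities(sentences):
    counts = Counter(normalize_and_stem(w) for s in sentences for w in s)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def sentence_score(sentence, probs):
    # Average probability of the sentence's (stemmed) words.
    words = [normalize_and_stem(w) for w in sentence]
    if not words:
        return 0.0
    return sum(probs[w] for w in words) / len(words)


def sumbasic(sentences, summary_length):
    probs = word_probabilities(sentences)
    remaining = list(range(len(sentences)))
    chosen = []
    while remaining and len(chosen) < summary_length:
        best = max(remaining, key=lambda i: sentence_score(sentences[i], probs))
        chosen.append(best)
        remaining.remove(best)
        # SumBasic update: square probabilities of words already used,
        # which penalizes redundant sentences in later rounds.
        for w in {normalize_and_stem(w) for w in sentences[best]}:
            probs[w] = probs[w] ** 2
    return [sentences[i] for i in sorted(chosen)]


sentences = [
    ["Die", "Katzen", "schlafen"],
    ["Katzen", "fressen"],
    ["Hunde", "bellen"],
]
print(sumbasic(sentences, 2))  # [['Katzen', 'fressen'], ['Hunde', 'bellen']]
```

Because the second round sees "katz" with a squared (smaller) probability, the sentence about dogs wins over the redundant first sentence.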

FWIW, I used this with German texts, although capitalization etc. does not seem to be the issue here.

@miso-belica
Owner

Thank you for the report and the detailed analysis 🙂

2 participants