`KeyError` in `SumBasicSummarizer` #165

dennlinger · 2022-03-09T12:55:44Z

Hey, first of all, thanks for the great library!

I just encountered a bug in `SumBasicSummarizer`: the method appears to look up the document frequency of a stemmed word, but the `word_freq_in_doc` dictionary only stores frequencies for unstemmed words, so the lookup raises a `KeyError`.

I believe the culprit is that content words are normalized differently in `_get_content_words_in_sentence()` and `_get_all_content_words_in_doc()`: the former performs stemming, whereas the latter does not.
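To illustrate the mismatch, here is a minimal sketch that does not use the library at all. `toy_stem` and `normalize` are hypothetical stand-ins I made up for this example, not the library's stemmer or normalization, and the word list is invented; the point is only that a dictionary keyed by unstemmed words cannot be queried with stemmed ones.

```python
# Hypothetical stand-in for a real stemmer: strips a trailing "s".
def toy_stem(word):
    return word[:-1] if word.endswith("s") else word

# Hypothetical normalization: lowercase only, no stemming.
def normalize(word):
    return word.lower()

words = ["Cats", "chase", "cats"]

# Document-level frequencies, keyed by normalized but UNSTEMMED words.
word_freq_in_doc = {}
for w in words:
    key = normalize(w)
    word_freq_in_doc[key] = word_freq_in_doc.get(key, 0) + 1

# Sentence-level content words, normalized AND stemmed.
sentence_words = [toy_stem(normalize(w)) for w in words]

# Looking up stemmed keys in the unstemmed dictionary fails.
try:
    freqs = [word_freq_in_doc[w] for w in sentence_words]
    lookup_error = None
except KeyError as err:
    lookup_error = err

print(repr(lookup_error))
```

Here the stemmed key `"cat"` is missing because the dictionary only contains `"cats"` and `"chase"`.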

I would have proposed a PR myself, but I don't know which is the "more correct" fix (IMO, consistent stemming should be the way to go?).
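For what it's worth, this is roughly what I mean by consistent stemming, as a standalone sketch rather than a patch. All names here (`toy_stem`, `normalize_and_stem`, `word_frequencies`, `sentence_score`) are hypothetical and not taken from the library, and the scoring is a simplified average frequency chosen only for illustration. The idea is that a single helper produces the keys both when building the frequency table and when querying it.

```python
from collections import Counter

# Hypothetical stand-in for a real stemmer: strips a trailing "s".
def toy_stem(word):
    return word[:-1] if word.endswith("s") else word

# Single normalization path shared by counting and lookup.
def normalize_and_stem(word):
    return toy_stem(word.lower())

def word_frequencies(words):
    # Keys are produced by the same helper used at lookup time.
    return Counter(normalize_and_stem(w) for w in words)

def sentence_score(sentence_words, freqs):
    # Simplified score for illustration: mean frequency of the sentence's words.
    keys = [normalize_and_stem(w) for w in sentence_words]
    if not keys:
        return 0.0
    return sum(freqs[k] for k in keys) / len(keys)

document = ["Cats", "chase", "cats"]
freqs = word_frequencies(document)
score = sentence_score(["cats", "chase"], freqs)
print(dict(freqs), score)
```

Because both sides go through `normalize_and_stem`, every word of a sentence taken from the document is guaranteed to have an entry in the table.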

FWIW, I used this with German texts, although capitalization etc. seems to be no issue here.

miso-belica · 2022-03-09T22:32:16Z

Thank you for the report and the detailed analysis 🙂

miso-belica self-assigned this Mar 9, 2022

miso-belica added the bug label Mar 9, 2022

miso-belica mentioned this issue Mar 9, 2022

Fix SumBasicSummarizer with stemmer #166

Merged

miso-belica closed this as completed in #166 Mar 9, 2022
