Add a basic spellchecking functionality to information page #4

nonprofittechy · 2022-04-04T21:27:25Z

Fix #1

This feature is pretty basic, but for now it works similarly to the other checks on the page.

A major limitation is that the spelling errors aren't displayed with any context, but that's also true of the other warnings and errors. I assume we will eventually want to display errors and warnings on a per-block basis so that the author gets more context.

We may want to consider adding our own library of terms that will be incorrectly flagged, as the pyspellchecker dataset is drawn from closed captioning and may not capture some relatively common legal language.
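One way the "own library of terms" idea could look is a post-filter over whatever the spell checker flags. This is a minimal, library-independent sketch: the `LEGAL_TERMS` set and the `filter_flagged` helper are hypothetical names used for illustration and are not part of this PR or of pyspellchecker.

```python
# Hypothetical allowlist of legal terms that a general-purpose dictionary
# might flag; the entries here are illustrative only.
LEGAL_TERMS = {"affiant", "estoppel", "interpleader", "subpoena"}


def filter_flagged(flagged, allowlist=LEGAL_TERMS):
    """Return the flagged words that are NOT in the allowlist.

    Comparison is case-insensitive; the result is sorted and de-duplicated.
    """
    allowed = {word.lower() for word in allowlist}
    return sorted({word for word in flagged if word.lower() not in allowed})


# Words a spell checker might have reported as unknown.
print(filter_flagged(["Estoppel", "willl", "affiant"]))
```

Filtering after the fact keeps the allowlist independent of whichever spell checking library ends up being used.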

BryceStevenWilley · 2022-04-13T20:42:08Z

I tried it out, and while I agree with the feature, I'm not a big fan of pyspellchecker. I thought things were broken because it wasn't catching when I changed "will" to "willl", but turns out it just thinks that "willl" is a valid word. The data comes from OpenSubtitles, and at a quick glance contains words like "willl", "willservicethisgreatnation" (one word), "hoje" (a Portuguese word), and "hi'm". Others have run into this issue as well. While we could use our own dictionary, it might be worth it to just use something else like symspellpy instead.

nonprofittechy · 2022-04-14T17:16:26Z

> I tried it out, and while I agree with the feature, I'm not a big fan of pyspellchecker. I thought things were broken because it wasn't catching when I changed "will" to "willl", but turns out it just thinks that "willl" is a valid word. The data comes from OpenSubtitles, and at a quick glance contains words like "willl", "willservicethisgreatnation" (one word), "hoje" (a Portuguese word), and "hi'm". Others have run into this issue as well. While we could use our own dictionary, it might be worth it to just use something else like symspellpy instead.

I hadn't run into that library in my initial search, but it looks fine. Happy to switch to a different dictionary.

BryceStevenWilley · 2022-04-26T04:32:55Z

Did a little work on this. While symspellpy is really powerful, it has weird quirks of its own (it doesn't seem to like capitalization very much, and the options to preserve it don't seem to work at all). The API is different, but idk if I'd call it better or not. Given all of this, the best solution is probably to make our own dictionary from a much stricter source corpus. The author of pyspellchecker describes how to do that in a GitHub discussion.
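As a rough illustration of why a stricter corpus helps, the sketch below builds a word-frequency table from plain text and drops words that appear only once, which is where one-off typos like "willl" tend to live. It uses only the standard library; `build_word_counts`, `unknown_words`, and the `min_count` threshold are hypothetical, and this is not the procedure described by the pyspellchecker author.

```python
import re
from collections import Counter

# Lowercase words, optionally with one internal apostrophe (e.g. "don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def build_word_counts(corpus_text, min_count=2):
    """Count words in the corpus and keep those seen at least min_count times."""
    counts = Counter(WORD_RE.findall(corpus_text.lower()))
    return {word: n for word, n in counts.items() if n >= min_count}


def unknown_words(text, word_counts):
    """Return the sorted, de-duplicated words in text missing from word_counts."""
    words = WORD_RE.findall(text.lower())
    return sorted({word for word in words if word not in word_counts})


corpus = "the court will hear the motion. the court will rule. willl"
counts = build_word_counts(corpus)
print(unknown_words("The court willl rule", counts))
```

In this toy corpus "rule" is also dropped because it occurs only once, which shows the trade-off: a higher threshold filters more noise but needs a larger corpus to keep legitimate rare words.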

Since the API itself isn't going to be changing, just the dictionary, this is probably fine to merge now. We can add the better dictionary later, so I'm going to approve.

nonprofittechy requested a review from BryceStevenWilley April 4, 2022 21:27

Add a basic spellchecking functionality to information page

6cabcb7

BryceStevenWilley force-pushed the spellchecking branch from 41e51fa to 6cabcb7 Compare April 13, 2022 19:34

BryceStevenWilley approved these changes Apr 26, 2022

nonprofittechy merged commit da9a1eb into main Apr 26, 2022

nonprofittechy deleted the spellchecking branch April 26, 2022 11:19
