A spelling corrector inspired by Peter Norvig's essay "How to Write a Spelling Corrector", written in modern C++ and tested with Catch2.
On average, the correction function takes about 1.4 seconds per 10 words (roughly 0.14 seconds per word).

    #include <iostream>
    // Also include the project header that declares SpellChecker.

    int main() {
        SpellChecker checker("training.txt");
        std::cout << checker.correction("expresion") << std::endl; // expression
        std::cout << checker.correction("thea") << std::endl;      // the
        std::cout << checker.correction("helpo") << std::endl;     // hello
        std::cout << checker.correction("queot") << std::endl;     // quote
        std::cout << checker.correction("peotry") << std::endl;    // poetry
    }
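
A per-word timing figure like the one quoted above could be measured with a small helper along these lines. This is only a sketch: `averageSecondsPerWord` is a hypothetical name that is not part of the project, and it accepts any callable that takes a `std::string`, so it does not depend on the real `SpellChecker` interface.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the project): returns the average
// wall-clock time, in seconds, of one call to `correct` over `words`.
template <typename Corrector>
double averageSecondsPerWord(Corrector&& correct,
                             const std::vector<std::string>& words) {
    if (words.empty()) {
        return 0.0;  // nothing to measure
    }
    const auto start = std::chrono::steady_clock::now();
    for (const std::string& word : words) {
        static_cast<void>(correct(word));  // result is intentionally discarded
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(words.size());
}
```

With the `checker` object from the example above, a call such as `averageSecondsPerWord([&](const std::string& w) { return checker.correction(w); }, words)` would give seconds per word; multiplying by 10 gives a number comparable to the figure above.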

- The SpellChecker constructor reads word_freq.txt and processes each line, which contains an English word followed by a space and the word's frequency.
- Given a word, the SpellChecker generates a set of candidate corrections by taking every string that is one or two edits away from the original word. Most of these strings are not dictionary words, so the set is filtered down to known English words.
- The most likely correction is the candidate with the highest frequency.