
Modern spell checking library - accurate, fast, multi-language. Forked for medical terms








JamSpell is a spell checking library with the following features:

  • accurate - it considers the words surrounding each word (its context) for better correction
  • fast - nearly 5K words per second
  • multi-language - it is written in C++ and available for many languages through SWIG bindings



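| Corrector | Errors | Top 7 Errors | Fix Rate | Top 7 Fix Rate | Broken | Speed |
|---|---|---|---|---|---|---|
| JamSpell | 3.25% | 1.27% | 79.53% | 84.10% | 0.64% | 4854 |
| Norvig | 7.62% | 5.00% | 46.58% | 66.51% | 0.69% | 395 |
| Hunspell | 13.10% | 10.33% | 47.52% | 68.56% | 7.14% | 163 |
| Dummy | 13.14% | 13.14% | 0.00% | 0.00% | 0.00% | - |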

The model was trained on 300K Wikipedia sentences + 300K news sentences (English); 95% was used for training and 5% for evaluation. An error model was used to generate errored text from the original text. The JamSpell corrector was compared with Norvig's corrector, Hunspell, and a dummy corrector (no corrections).
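
The split and the error generation described above can be sketched roughly as follows. This is a toy illustration, not the error model used for the benchmark: the function names, the 10% default error rate, the set of edit operations and the lowercase English alphabet are all assumptions made for the example.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: lowercase English letters


def split_train_eval(sentences, eval_fraction=0.05, seed=0):
    """Shuffle the sentences and hold out eval_fraction of them for evaluation."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_eval = int(round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def corrupt_word(word, rng):
    """Apply one random edit (hypothetical error model).

    A substitution can pick the same letter, so the word may
    occasionally come back unchanged.
    """
    if len(word) < 2:
        return word
    pos = rng.randrange(len(word))
    op = rng.choice(["substitute", "delete", "insert", "transpose"])
    if op == "substitute":
        return word[:pos] + rng.choice(ALPHABET) + word[pos + 1:]
    if op == "delete":
        return word[:pos] + word[pos + 1:]
    if op == "insert":
        return word[:pos] + rng.choice(ALPHABET) + word[pos:]
    # transpose: swap the character at pos with the one after it
    if pos == len(word) - 1:
        pos -= 1
    return word[:pos] + word[pos + 1] + word[pos] + word[pos + 2:]


def add_errors(sentence, error_rate=0.1, seed=0):
    """Corrupt roughly error_rate of the whitespace-separated words."""
    rng = random.Random(seed)
    words = sentence.split()
    return " ".join(
        corrupt_word(w, rng) if rng.random() < error_rate else w for w in words
    )
```

Fixing the seed keeps the generated test set reproducible, so different correctors can be compared on exactly the same errored text.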

We used the following metrics:

  • Errors - percent of words with errors after the spell checker has processed the text
  • Top 7 Errors - percent of words missing from the top 7 candidates
  • Fix Rate - percent of errored words fixed by the spell checker
  • Top 7 Fix Rate - percent of errored words fixed by one of the top 7 candidates
  • Broken - percent of non-errored words broken by the spell checker
  • Speed - number of words per second
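
As a rough illustration of how these metrics relate to each other, the sketch below computes them from a list of evaluated words. The `Record` type, its field names and `compute_metrics` are assumptions made for this example and are not taken from the evaluation script; the checker's output is taken to be its first candidate.

```python
from collections import namedtuple

# One evaluated word: the correct form, the corrupted form given to the
# checker, and the checker's ranked candidates (best first).
# The type and field names are illustrative only.
Record = namedtuple("Record", ["original", "errored", "candidates"])


def percent(part, whole):
    """Return part/whole as a percentage, or 0.0 for an empty group."""
    return 100.0 * part / whole if whole else 0.0


def compute_metrics(records, top_n=7):
    records = list(records)

    def fixed(r):
        # The checker's output is its first candidate (input kept if none).
        return r.candidates[0] if r.candidates else r.errored

    def in_top(r):
        return r.original in r.candidates[:top_n]

    errored = [r for r in records if r.errored != r.original]
    clean = [r for r in records if r.errored == r.original]
    return {
        "errors": percent(sum(fixed(r) != r.original for r in records), len(records)),
        "top_n_errors": percent(sum(not in_top(r) for r in records), len(records)),
        "fix_rate": percent(sum(fixed(r) == r.original for r in errored), len(errored)),
        "top_n_fix_rate": percent(sum(in_top(r) for r in errored), len(errored)),
        "broken": percent(sum(fixed(r) != r.original for r in clean), len(clean)),
    }
```

Under these definitions a dummy corrector that returns its input unchanged has equal Errors and Top 7 Errors and a Fix Rate and Broken rate of zero, which matches the pattern of the Dummy row.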

To ensure that our model is not overfitted to Wikipedia + news, we also checked it on the text of "The Adventures of Sherlock Holmes":

| Corrector | Errors | Top 7 Errors | Fix Rate | Top 7 Fix Rate | Broken | Speed (words per second) |
|---|---|---|---|---|---|---|
| JamSpell | 3.56% | 1.27% | 72.03% | 79.73% | 0.50% | 5524 |
| Norvig | 7.60% | 5.30% | 35.43% | 56.06% | 0.45% | 647 |
| Hunspell | 9.36% | 6.44% | 39.61% | 65.77% | 2.95% | 284 |
| Dummy | 11.16% | 11.16% | 0.00% | 0.00% | 0.00% | - |

More details about reproducing these results are available in the "Train" section.



Python

  1. Install swig3 (usually available in your distro's package manager)

  2. Install jamspell:

pip install jamspell

  3. Download or train a language model

  4. Use it:

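import jamspell

corrector = jamspell.TSpellCorrector()
# Load the downloaded or trained language model (the file name is an example)
corrector.LoadLangModel('en.bin')

corrector.FixFragment('I am the begt spell cherken!')
# u'I am the best spell checker!'

# Candidates for the word at index 3 ('begt')
corrector.GetCandidates(['i', 'am', 'the', 'begt', 'spell', 'cherken'], 3)
# (u'best', u'beat', u'belt', u'bet', u'bent', ... )

# Candidates for the word at index 5 ('cherken')
corrector.GetCandidates(['i', 'am', 'the', 'begt', 'spell', 'cherken'], 5)
# (u'checker', u'chicken', u'checked', u'wherein', u'coherent', ...)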
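
Because GetCandidates takes the whole word list plus the index of the word to inspect, collecting suggestions for a sentence means calling it once per position. The helper below is a minimal sketch of that loop. `suggest` and `StubCorrector` are hypothetical names introduced here; the stub only stands in for a loaded `jamspell.TSpellCorrector` so the example runs without a model, and the sketch assumes that a correctly spelled word comes back as its own first candidate.

```python
def suggest(corrector, words, limit=3):
    """Map each position whose best candidate differs from the word
    to its first `limit` candidates."""
    suggestions = {}
    for position, word in enumerate(words):
        candidates = list(corrector.GetCandidates(words, position))
        if candidates and candidates[0] != word:
            suggestions[position] = candidates[:limit]
    return suggestions


class StubCorrector:
    """Hypothetical stand-in with the same GetCandidates(words, position) shape."""

    def __init__(self, table):
        self.table = table

    def GetCandidates(self, words, position):
        word = words[position]
        # Assumption: a known-good word is returned as its own only candidate.
        return self.table.get(word, (word,))


stub = StubCorrector({
    "begt": ("best", "beat", "belt", "bet", "bent"),
    "cherken": ("checker", "chicken", "checked", "wherein", "coherent"),
})
words = ["i", "am", "the", "begt", "spell", "cherken"]
result = suggest(stub, words, limit=2)
```

With a real corrector, pass the `TSpellCorrector` instance in place of the stub once its model has been loaded.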


C++

  1. Add the jamspell and contrib dirs to your project

  2. Use it:

#include <jamspell/spell_corrector.hpp>

int main(int argc, const char** argv) {

    NJamSpell::TSpellCorrector corrector;
    // Load a language model (the file name is an example)
    corrector.LoadLangModel("model.bin");

    corrector.FixFragment(L"I am the begt spell cherken!");
    // "I am the best spell checker!"

    // Candidates for the word at index 3 ("begt")
    corrector.GetCandidates({L"i", L"am", L"the", L"begt", L"spell", L"cherken"}, 3);
    // "best", "beat", "belt", "bet", "bent", ...

    // Candidates for the word at index 5 ("cherken")
    corrector.GetCandidates({L"i", L"am", L"the", L"begt", L"spell", L"cherken"}, 5);
    // "checker", "chicken", "checked", "wherein", "coherent", ...
    return 0;
}

Other languages

You can generate extensions for other languages by following the SWIG tutorial. The SWIG interface file is jamspell.i. Pull requests with build scripts are welcome.


HTTP API

Option 1 - python (via flask)

  • Runs on port 80 by default and is open to anyone (not just localhost).
  • Expects the model to be in the same folder as the server script and to be named medical_model.bin (since this fork is for medical spell checking).
  • Offers a few more options than the C++ option. Specifically, these params can be sent with the GET or POST API call:
    • limit ... limits the number of items per candidate in the response from the /candidates endpoint, e.g. /candidates?limit=1&text=blahblah
    • html ... if set, returns a human-readable HTML table instead of JSON. Works for /fix and /candidates, e.g. /fix?html=1&text=blahblah
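
When calling these endpoints from code, the text has to be URL-encoded before it goes into the query string. The helper below is a small sketch of building such URLs with the standard library. `build_url` is a name made up for this example; it encodes spaces as %20, and it is an assumption that the server decodes them that way.

```python
from urllib.parse import quote, urlencode


def build_url(base, endpoint, text, limit=None, html=False):
    """Build a GET URL for /fix or /candidates with optional limit/html params."""
    params = [("text", text)]
    if limit is not None:
        params.append(("limit", str(limit)))
    if html:
        params.append(("html", "1"))
    # quote_via=quote encodes spaces as %20 instead of '+'
    return f"{base}/{endpoint}?{urlencode(params, quote_via=quote)}"


candidates_url = build_url("http://localhost:80", "candidates", "blahblah", limit=1)
fix_url = build_url("http://localhost:80", "fix", "begt spell", html=True)
```

The resulting string can be passed to `urllib.request.urlopen` or any other HTTP client once the server is running.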

Option 2 - c++

  • Install cmake

  • Clone and build medSpellCheck (it includes an HTTP server):

git clone
cd medSpellCheck
mkdir build
cd build
cmake ..
make

On Windows, replace the 'make' command with:

cmake --build . --target ALL_BUILD --config Release

  • Run the HTTP server:

./web_server/web_server en.bin localhost 8080
  • GET Request example:
$ curl "http://localhost:8080/fix?text=I am the begt spell cherken"
I am the best spell checker
  • POST Request example
$ curl -d "I am the begt spell cherken" http://localhost:8080/fix
I am the best spell checker
  • Candidate example:
$ curl "http://localhost:8080/candidates?text=I am the begt spell cherken"
# or
$ curl -d "I am the begt spell cherken" http://localhost:8080/candidates
{
    "results": [
        {
            "candidates": [ ... ],
            "len": 4,
            "pos_from": 9
        },
        {
            "candidates": [ ... ],
            "len": 7,
            "pos_from": 20
        }
    ]
}

Here pos_from is the position of the first letter of the misspelled word, and len is the length of the misspelled word.
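
The pos_from and len fields are enough to patch the original text on the client side. The sketch below replaces each flagged word with its first candidate, working from right to left so that earlier offsets stay valid when a replacement changes the length. `apply_top_candidates` is a hypothetical helper, the sample response only mirrors the shape shown above, and the offsets are assumed to be character positions.

```python
def apply_top_candidates(text, response):
    """Replace every flagged span with its first candidate."""
    fixed = text
    # Process spans right to left so earlier pos_from values remain correct.
    spans = sorted(response["results"], key=lambda r: r["pos_from"], reverse=True)
    for item in spans:
        if not item["candidates"]:
            continue
        start = item["pos_from"]
        end = start + item["len"]
        fixed = fixed[:start] + item["candidates"][0] + fixed[end:]
    return fixed


# Sample response in the shape of the /candidates output (values illustrative).
sample_response = {
    "results": [
        {"candidates": ["best", "beat"], "len": 4, "pos_from": 9},
        {"candidates": ["checker", "chicken"], "len": 7, "pos_from": 20},
    ]
}
patched = apply_top_candidates("I am the begt spell cherken", sample_response)
```

A client could just as well present the candidate lists to the user and apply the chosen one using the same offsets.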


Train

To train a custom model:

  1. Install cmake

  2. Clone and build medSpellCheck:

git clone
cd medSpellCheck
mkdir build
cd build
cmake ..
make

     On Windows:
    1. Visual Studio 2019 Community Edition (or later) and the Visual Studio 2019 C++ Build Tools must both be installed.
    2. Without them, cmake .. will not produce a working .exe.
    3. Replace the 'make' command with the following (note that the jamspell.exe executable will be located in the /build/main/Release/ folder):
      cmake --build . --target ALL_BUILD --config Release
  3. Prepare a UTF-8 text file with sentences to train on (e.g. sherlockholmes.txt) and another file with the language alphabet (e.g. alphabet_en.txt)

  4. Train the model:

./main/jamspell train ../test_data/alphabet_en.txt ../test_data/sherlockholmes.txt model_sherlock.bin

  5. To evaluate the spell checker, you can use the evaluate/ script:

python evaluate/ -a alphabet_file.txt -jsp your_model.bin -mx 50000 your_test_data.txt

  6. You can use evaluate/ to generate your train/test data. It supports txt files, the Leipzig Corpora Collection format and fb2 books.

  7. Send the running HTTP server a request like this: curl "http://localhost:55555/candidates?text=This is a 62 yer old femle with high blod pressur and she has had a lap appendectoy by an aneesthesiologist also she has dibetes mellitus. she takes 50mg of metopfolol per day and an 81mg asprin and 15miligram hydrochlorathiozide plus his mother is a smker and has had a bunch of seezures. they like icee creem and pzza. hx of coranary artery dizease and has had a transeent ishcemic attak"
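
The training command above can also be driven from a script, for example when retraining on several corpora. The sketch below builds the same argument list and checks that the input files decode as UTF-8 before starting. The helper names are made up for this example, and the only command-line form it relies on is the `jamspell train <alphabet> <corpus> <model>` invocation shown in the steps.

```python
import subprocess


def build_train_command(jamspell_bin, alphabet_path, corpus_path, model_path):
    """Return the argument list for: jamspell train <alphabet> <corpus> <model>."""
    return [jamspell_bin, "train", alphabet_path, corpus_path, model_path]


def is_valid_utf8(path):
    """Return True if the whole file decodes as UTF-8."""
    try:
        with open(path, encoding="utf-8") as handle:
            for _ in handle:
                pass
    except UnicodeDecodeError:
        return False
    return True


def train(jamspell_bin, alphabet_path, corpus_path, model_path):
    """Validate the inputs, then run the trainer; raises on a non-zero exit code."""
    for path in (alphabet_path, corpus_path):
        if not is_valid_utf8(path):
            raise ValueError(f"{path} is not valid UTF-8")
    command = build_train_command(jamspell_bin, alphabet_path, corpus_path, model_path)
    subprocess.run(command, check=True)


command = build_train_command(
    "./main/jamspell",
    "../test_data/alphabet_en.txt",
    "../test_data/sherlockholmes.txt",
    "model_sherlock.bin",
)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.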

Download models

Here is our medical model pre-trained on a large medical corpus (a few million records):

Here are a few simple models. They were trained on 300K news + 300K Wikipedia sentences. We strongly recommend training your own model on at least a few million sentences to achieve better quality. See the Train section above.

