Predicts garbage for Bengali input #110
Do you have a sample code snippet that can reproduce your error? If possible, can you also provide the dictionary files you used? If the files are too large, a snippet of each dictionary should be enough.
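If the full dictionaries are too large to attach, a minimal sketch like the following (plain Python, no symspellpy needed) copies only the first lines of a dictionary file into a smaller sample. The helper name `write_dictionary_sample`, the sample file name, and the generated demo data are hypothetical. Note that if the file is not sorted by frequency, the first lines are just an arbitrary slice of the vocabulary.

```python
import tempfile
from itertools import islice
from pathlib import Path


def write_dictionary_sample(src: Path, dst: Path, max_lines: int = 100) -> int:
    """Copy the first `max_lines` lines of `src` into `dst` (hypothetical helper).

    Both files are read and written as UTF-8, so non-Latin scripts such as
    Bengali are preserved unchanged. Returns the number of lines written.
    """
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        lines = list(islice(fin, max_lines))
        fout.writelines(lines)
    return len(lines)


# Demo on a generated "term,count" file (made-up data, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    full = Path(tmp) / "unigrams.txt"
    full.write_text("".join(f"word{i},{i}\n" for i in range(1000)), encoding="utf-8")
    sample = Path(tmp) / "unigrams_sample.txt"
    written = write_dictionary_sample(full, sample, max_lines=5)
    sample_lines = sample.read_text(encoding="utf-8").splitlines()

print(written, sample_lines)
```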
@mammothb Here you go!
@mammothb For example, try to correct:
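Have you tried setting

```python
from pathlib import Path

from symspellpy import SymSpell

# Spell checker with a maximum edit distance of 2 and prefix length of 7.
sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)

# Unigram file: one "term,count" pair per line, comma-separated, UTF-8.
sym_spell.load_dictionary(
    Path(__file__).resolve().parent / "unigrams.txt",
    term_index=0,
    count_index=1,
    separator=",",
    encoding="utf-8",
)

# Bigram file: same layout; the term is the two words joined by a space.
sym_spell.load_bigram_dictionary(
    Path(__file__).resolve().parent / "bigrams.txt",
    term_index=0,
    count_index=1,
    separator=",",
    encoding="utf-8",
)

input_term = "রিসেট"
# split_by_space=True splits the input into words on whitespace only,
# instead of using the default word-matching regex.
suggestions = sym_spell.lookup_compound(
    input_term, max_edit_distance=2, split_by_space=True
)
for suggestion in suggestions:
    print(suggestion)
print(input_term)
```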
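The loading calls above read both files with `separator=","`, `term_index=0` and `count_index=1`, so each line is expected to hold a term and a count separated by a comma. A minimal sketch, assuming symspellpy's behaviour that with an explicit separator the whole text before the comma (including the space inside a bigram) is taken as the term: the hypothetical helpers below write tiny sample files in that layout and parse them back the same way. The words and counts are made-up placeholders, not data from this thread.

```python
import tempfile
from pathlib import Path

# Made-up sample entries (hypothetical data): term -> count.
UNIGRAMS = {"রিসেট": 120, "আমি": 300}
BIGRAMS = {"আমি রিসেট": 15}  # a bigram term is two words joined by a space


def write_dictionary(path: Path, entries: dict) -> None:
    """Write one 'term,count' line per entry, UTF-8 encoded."""
    with path.open("w", encoding="utf-8") as f:
        for term, count in entries.items():
            f.write(f"{term},{count}\n")


def read_dictionary(path: Path, separator: str = ",") -> dict:
    """Parse lines as term_index=0, count_index=1 with the given separator."""
    result = {}
    with path.open(encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(separator)
            if len(parts) < 2:
                continue  # skip malformed lines
            term, count = parts[0], parts[1]
            if count.isdigit():
                result[term] = int(count)
    return result


with tempfile.TemporaryDirectory() as tmp:
    uni_path = Path(tmp) / "unigrams.txt"
    bi_path = Path(tmp) / "bigrams.txt"
    write_dictionary(uni_path, UNIGRAMS)
    write_dictionary(bi_path, BIGRAMS)
    unigrams = read_dictionary(uni_path)
    bigrams = read_dictionary(bi_path)

print(unigrams)
print(bigrams)
```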
@mammothb No, I just used the example as it is in the documentation; I didn't change any parameters.
Try and see if
@mammothb Yes, it works, thanks!
I am trying the `lookup_compound` "Keep original casing" example on a Bengali corpus of unigrams and bigrams, using a comma as the separator. It does not seem to work: for any misspelled input, it just outputs a garbage string.
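One plausible explanation for the garbage output (not confirmed in this thread) is tokenization. Assuming symspellpy's default word splitting (used when `split_by_space=False`) relies on a `\w`-based regex like the one re-created below, Python's `re` does not count Bengali vowel signs such as U+09BF or U+09C7 (Unicode category Mc) as word characters, so a word like "রিসেট" is broken into fragments before lookup. The pattern and helper names here are a hypothetical re-creation for illustration; check the tokenizer in your installed symspellpy version.

```python
import re

# Assumption: approximates the default word pattern used by symspellpy
# when split_by_space=False (letters/digits, optionally with apostrophes).
WORD_PATTERN = re.compile(r"([^\W_]+['’]*[^\W_]*)")


def regex_tokens(phrase: str) -> list:
    """Tokenize with the \\w-based pattern (hypothetical re-creation)."""
    return WORD_PATTERN.findall(phrase.lower())


def space_tokens(phrase: str) -> list:
    """Tokenize on whitespace only, as split_by_space=True is described to do."""
    return phrase.lower().split()


word = "রিসেট"
# Combining vowel signs are not \w characters in Python's re module,
# so the regex keeps only the consonants as separate fragments.
print(regex_tokens(word))
# Whitespace splitting keeps the whole word intact.
print(space_tokens(word))
```

With whitespace splitting, the word reaches the dictionary lookup intact, which is consistent with the fix reported above.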