Description
A dictionary passed via --dict is silently ignored when its character encoding can't be handled.
This is especially problematic because aspell's default character encoding for ~/.aspell.en.pws is ISO/IEC 8859 (ISO-8859-1 in the example below), which triggers this bug.
How to reproduce
Create the file test.tex with the following content:
\begin{document}
My name is André and I work at TomTom.
\end{document}
Note that this will yield two spelling mistakes:
$ textidote --check en_us test.tex
TeXtidote v0.8.1 - A linter for LaTeX documents and others
(C) 2018-2019 Sylvain Hallé - All rights reserved
Found 2 warning(s)
Total analysis time: 1 second(s)
* L2C12-L2C17 Possible spelling mistake found. Suggestions: [Andre, Andrew,
Andrea, Andrei, Andres] (11) [lt:en:MORFOLOGIK_RULE_EN_US]
My name is André and I work at TomTom.
           ^^^^^^
* L2C32-L2C38 Possible spelling mistake found. Suggestions: [Tom Tom] (31)
[lt:en:MORFOLOGIK_RULE_EN_US]
My name is André and I work at TomTom.
                               ^^^^^^^
Next, I deleted my aspell dictionary and created a new one by adding both "André" and "TomTom":
$ rm ~/.aspell.en.pws
# in the aspell session started by the next command, press "a" (Add) twice
$ aspell check test.tex
Note the character encoding of the generated file:
$ file ~/.aspell.en.pws
/home/andre/.aspell.en.pws: ISO-8859 text
$ cat ~/.aspell.en.pws
personal_ws-1.1 en 2
Andr�
TomTom
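The `file` result above can be cross-checked by hand. The following Python sketch is not part of textidote, and `is_valid_utf8` is a hypothetical helper introduced only for illustration. It shows that the ISO-8859-1 encoding of "André" ends in the single byte 0xE9, which is not valid UTF-8. Assuming textidote reads the dictionary as UTF-8 (which the workaround below suggests), this is the byte it cannot decode:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


latin1 = "André".encode("iso-8859-1")  # b"Andr\xe9": one byte for "é"
utf8 = "André".encode("utf-8")         # b"Andr\xc3\xa9": two bytes for "é"

print(is_valid_utf8(latin1))  # False
print(is_valid_utf8(utf8))    # True
```

The replacement character shown by `cat` above is the terminal's rendering of that same lone 0xE9 byte.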
Now I call textidote again with the newly created dictionary. I would expect no mistakes to be reported, since I whitelisted both words. However, that is not the case: every word in the dictionary is silently ignored, including the pure-ASCII entry "TomTom":
$ textidote --check en_us --dict ~/.aspell.en.pws test.tex
TeXtidote v0.8.1 - A linter for LaTeX documents and others
(C) 2018-2019 Sylvain Hallé - All rights reserved
Found 2 warning(s)
Total analysis time: 1 second(s)
* L2C12-L2C17 Possible spelling mistake found. Suggestions: [Andre, Andrew,
Andrea, Andrei, Andres] (11) [lt:en:MORFOLOGIK_RULE_EN_US]
My name is André and I work at TomTom.
           ^^^^^^
* L2C32-L2C38 Possible spelling mistake found. Suggestions: [Tom Tom] (31)
[lt:en:MORFOLOGIK_RULE_EN_US]
My name is André and I work at TomTom.
                               ^^^^^^^
Workaround
As a workaround, converting the dictionary to UTF-8 makes everything work:
$ iconv -f ISO-8859-1 -t UTF-8 ~/.aspell.en.pws > .aspell.en.pws
$ file .aspell.en.pws
.aspell.en.pws: UTF-8 Unicode text
$ textidote --check en_us --dict .aspell.en.pws test.tex
TeXtidote v0.8.1 - A linter for LaTeX documents and others
(C) 2018-2019 Sylvain Hallé - All rights reserved
Found 0 warning(s)
Total analysis time: 1 second(s)
Everything is OK!
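Where `iconv` is not available, the same conversion can be sketched in Python. This is only an illustration under the assumption that the source file really is ISO-8859-1, as in the example above. `convert_to_utf8` is a hypothetical helper, not something textidote or aspell provides:

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="iso-8859-1"):
    """Re-encode the file at src as UTF-8 and write the result to dst."""
    # Work on raw bytes so that line endings are preserved exactly.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

For the dictionary above, this would be called as `convert_to_utf8(Path.home() / ".aspell.en.pws", ".aspell.en.pws")`. As with the `iconv` command, the output should go to a different path than the input, so that the original dictionary is not overwritten.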
Remarks
Not sure whether this is relevant, but here is my aspell version:
$ aspell -v
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.8)
It would be handy if textidote produced a hard error when the dictionary's character encoding isn't supported. It took me some time to figure out why my dictionary was being ignored.
And thanks for creating textidote, it's a very helpful program :)
Actually, Java's Scanner class itself fails silently when it reads a file that does not match the expected encoding: it simply reads nothing from the file. Since no exception is thrown, I cannot catch the encoding problem. The best I can do is warn the user when nothing has been read from the dictionary file.
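The fallback proposed in this comment, warning when the dictionary yields no words, can be sketched as follows. This is a Python illustration of the idea only, not textidote's actual Java code. `load_dictionary` is a hypothetical function, and swallowing the decode error is an assumption made here to mimic a reader that stops silently:

```python
import warnings
from pathlib import Path


def load_dictionary(path, encoding="utf-8"):
    """Return the non-empty lines of a dictionary file, warning if there are none."""
    try:
        lines = Path(path).read_text(encoding=encoding).splitlines()
    except UnicodeDecodeError:
        # Assumption: mimic a reader that gives up silently on undecodable input.
        lines = []
    # Every non-empty line is kept, including the aspell header line.
    words = [line.strip() for line in lines if line.strip()]
    if not words:
        warnings.warn(
            f"No words were read from dictionary {path}; "
            f"check that the file is encoded as {encoding}."
        )
    return words
```

With the ISO-8859-1 dictionary from the report, this sketch returns an empty list and emits the warning, which would have pointed directly at the encoding problem.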