This is for developers. Want to just spellcheck your work? Go here.
JavaScript/WebAssembly library to check the spelling of Thai text
> checkThaiSpelling("ไข่ใก่ฟองนี้มีขะหนาดไหญ่")
[[ 3, 6], [14, 24]]
// the output is explained below
The files you need are:
- checker.js
- thbrk.js
- thbrk.wasm
- thbrk.data
Place all four files in the same directory.
Using a custom location for the files requires some manual tweaks:
- Open thbrk.js in your text editor. You need a text editor that can handle very long text on a single line.
- To specify the thbrk.data location, search for
  REMOTE_PACKAGE_BASE="thbrk.data"
  and replace it with
  REMOTE_PACKAGE_BASE="YOUR/PATH/HERE/thbrk.data"
  where YOUR/PATH/HERE/ is relative to the page HTML (it can also be an absolute path such as /YOUR/ABSOLUTE/PATH).
- To specify the thbrk.wasm location, search for
  wasmBinaryFile="thbrk.wasm"
  and replace it with
  wasmBinaryFile="YOUR/PATH/HERE/thbrk.wasm"
  where YOUR/PATH/HERE/ is relative to the page HTML (it can also be an absolute path such as /YOUR/ABSOLUTE/PATH).
- Save thbrk.js.
- To specify the thbrk.js location, edit the first line of checker.js from
  import thaiSpellcheckerBackend from './thbrk.js';
  to
  import thaiSpellcheckerBackend from './YOUR/PATH/HERE/thbrk.js';
  The path is relative to checker.js. You need the leading ./ if your path is relative. Again, an absolute path works too.
- To specify the checker.js location, just give its path when importing (see the Usage section for importing), e.g. import('./YOUR/PATH/HERE/checker.js'). The path is relative to your JS file. You need the leading ./ if your path is relative. Again, an absolute path works too.
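If you relocate the files often, the two edits to thbrk.js can be scripted instead of done by hand. The sketch below is a hypothetical helper, patchThbrkPaths, which is not part of this library. It assumes thbrk.js contains the two search strings exactly as quoted above, and it only transforms the file's text; reading and writing the file is left to you.

```javascript
// Hypothetical helper (not part of this library): performs the two manual
// thbrk.js edits described above on the file's text and returns the result.
// Assumes the search strings appear in thbrk.js exactly as documented.
function patchThbrkPaths(source, dir) {
  // Normalise the directory so it ends with exactly one "/" (or stays empty).
  const prefix = dir === "" || dir.endsWith("/") ? dir : dir + "/";
  const replacements = [
    ['REMOTE_PACKAGE_BASE="thbrk.data"', 'REMOTE_PACKAGE_BASE="' + prefix + 'thbrk.data"'],
    ['wasmBinaryFile="thbrk.wasm"', 'wasmBinaryFile="' + prefix + 'thbrk.wasm"'],
  ];
  let patched = source;
  for (const [from, to] of replacements) {
    if (!patched.includes(from)) {
      // Fail loudly rather than silently producing an unpatched file.
      throw new Error("Could not find " + from + " in thbrk.js");
    }
    patched = patched.split(from).join(to); // replace every occurrence
  }
  return patched;
}
```

In Node.js you could read thbrk.js with fs.readFileSync(path, "utf8"), pass the text through patchThbrkPaths, and write the result back with fs.writeFileSync. Because the helper throws when a search string is missing, running it a second time (with a non-empty directory) on an already-patched file fails instead of doing nothing.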
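This library provides a function, checkThaiSpelling.
Input: a JavaScript string.
Return: a 2D integer array of n rows and 2 columns, where n is the number of ranges marked as misspelt (adjacent misspelt words may be grouped into one range, as explained below).
Column 1: the index of the first character of a misspelt word.
Column 2: the index of the first character after the misspelt word. This is an exclusive end index, so text.slice(start, end) gives the misspelt word itself.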
Example:
> checkThaiSpelling("ไข่ใก่ฟองนี้มีขะหนาดไหญ่")
[[ 3, 6], [14, 24]]
//ใก่ and ขะหนาดไหญ่ are marked as incorrect
//note that ขะหนาด and ไหญ่ are grouped together. This is because the backend has no knowledge of the misspelt word and cannot guess where it ends.
//3 -> ใ in ใก่, 6 -> ฟ in ฟอง after ใก่
//14 -> ข in ขะหนาด, 24 -> the end of the string (the string's length is 24, since vowel and tone marks each count as one character)
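Since column 2 is an exclusive end index, each row can be passed straight to String.prototype.slice. Here is a minimal sketch that turns a result into the list of flagged substrings. misspeltWords is a hypothetical helper name, and the ranges used below are copied from the example output above rather than coming from a live call to the checker.

```javascript
// Hypothetical helper (not part of this library): converts the checker's
// [start, end) index pairs into the flagged substrings of the input text.
function misspeltWords(text, ranges) {
  return ranges.map(function(range) {
    return text.slice(range[0], range[1]); // end index is exclusive
  });
}

// Ranges copied from the example output above, not from a live check.
var sample = "ไข่ใก่ฟองนี้มีขะหนาดไหญ่";
console.log(misspeltWords(sample, [[3, 6], [14, 24]])); // logs the substrings ใก่ and ขะหนาดไหญ่
```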
With a dynamic import(), in your JavaScript:
import('./checker.js').then(function(module) {
  // loadThaiSpellchecker() returns a promise that resolves to the checker function.
  module.loadThaiSpellchecker().then(function(checkThaiSpelling) {
    // At this point the function checkThaiSpelling is ready.
    console.log(checkThaiSpelling("ไข่ใก่ฟองนี้มีขะหนาดไหญ่"));
    // You probably want to save the function to a global variable so it can be called from outside this scope.
  });
});
Usage example: check the spelling of the text in a text area and alert each misspelt word:
var checkerFunction; // any name you want
var thaiSpellcheckerReady = false; // the checker needs some time to load, so it is not ready at first
var someButton;
var someTextArea;

import('./checker.js').then(function(module) { // dynamically import the checker module
  module.loadThaiSpellchecker().then(function(r) { // load the checker
    // r is the checker function
    checkerFunction = r; // save the function to a global variable so it can be called from elsewhere
    thaiSpellcheckerReady = true; // the checker is ready now
  });
});

window.onload = function() {
  someButton = document.getElementById("myButton");
  someTextArea = document.getElementById("myTextArea");
  // On click, check the text in the text area and alert any errors.
  someButton.onclick = function() {
    if (!thaiSpellcheckerReady) {
      alert("not ready");
      return;
    }
    var text = someTextArea.value;
    var checkResult = checkerFunction(text); // check the text
    if (checkResult.length > 0) { // the result is empty if no misspelling is found
      alert("mistake(s) found");
      // alert every word marked as incorrect
      for (var i = 0; i < checkResult.length; i++) {
        alert("incorrectly spelled word: " + text.slice(checkResult[i][0], checkResult[i][1]));
      }
    } else {
      alert("no mistake!");
    }
  };
};
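Alerting once per word gets tedious when there are many errors. As an alternative, the sketch below uses a hypothetical helper, markMisspellings (not part of this library), that wraps every flagged range in markers so the whole result can be shown at once. It assumes the ranges come back sorted by start index and non-overlapping, as they are in the example output.

```javascript
// Hypothetical helper (not part of this library): returns a copy of text with
// every flagged range wrapped in the given markers.
// Assumes the ranges are sorted by start index and do not overlap.
function markMisspellings(text, ranges, open = "[", close = "]") {
  let out = "";
  let pos = 0; // index just after the last range copied so far
  for (const [start, end] of ranges) {
    out += text.slice(pos, start) + open + text.slice(start, end) + close;
    pos = end;
  }
  return out + text.slice(pos); // remainder after the last range
}
```

In the click handler above, a single alert(markMisspellings(text, checkResult)) could replace the loop of alerts; for the example sentence it would show ไข่[ใก่]ฟองนี้มี[ขะหนาดไหญ่].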
Or, with a static import (this requires your script to be loaded as a module, for example with <script type="module">), in your JavaScript:
import {loadThaiSpellchecker} from './checker.js';
// loadThaiSpellchecker() returns a promise that resolves to the checker function.
loadThaiSpellchecker().then(function(checkThaiSpelling) {
  // At this point the function checkThaiSpelling is ready.
  console.log(checkThaiSpelling("ไข่ใก่ฟองนี้มีขะหนาดไหญ่"));
  // You probably want to save the function to a global variable so it can be called from outside this scope.
});
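Instead of copying the checker function into a global variable, you can cache the loading promise so that every caller waits for the same load. This is a sketch with a hypothetical onceLoader helper, not part of this library. It only assumes, as the examples above show, that loadThaiSpellchecker() returns a promise of the checker function. If loading fails, the cached promise is cleared so that a later call can retry.

```javascript
// Hypothetical helper (not part of this library): wraps a loader so it runs
// at most once at a time, and all callers share the same promise.
function onceLoader(load) {
  let promise = null;
  return function getChecker() {
    if (promise === null) {
      promise = load().catch(function(err) {
        promise = null; // forget the failed attempt so a later call can retry
        throw err;
      });
    }
    return promise;
  };
}
```

With the static import shown above, you would create const getChecker = onceLoader(loadThaiSpellchecker); once, then call getChecker().then(function(check) { ... }) wherever the checker is needed.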
If you want to use this in browsers without import support, use:
<script src="thbrk.no_import.js"></script>
<script src="checker.no_import.js"></script>
<script>
loadThaiSpellchecker().then(function(checkThaiSpelling) {
//At this point the function checkThaiSpelling is ready.
console.log(checkThaiSpelling("ไข่ใก่ฟองนี้มีขะหนาดไหญ่"));
});
</script>
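Keep in mind, though, that most browsers that support WebAssembly also support import anyway.
Also be careful of name collisions: these scripts add global names such as loadThaiSpellchecker, so make sure your own scripts do not define anything with the same names.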
Checking typically takes no more than 10 milliseconds. Still, plan for the worst case, which could be a few hundred milliseconds.
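To see where your own inputs fall in that range, you can time individual calls. This is a minimal sketch: timedCheck is a hypothetical wrapper, not part of this library, that uses performance.now() (available in browsers and as a global in recent Node.js versions) to measure one call of the checker function.

```javascript
// Hypothetical wrapper (not part of this library): runs one spelling check
// and reports the result together with the elapsed time in milliseconds.
function timedCheck(checkFn, text) {
  const start = performance.now();
  const ranges = checkFn(text);
  const elapsedMs = performance.now() - start;
  return { ranges: ranges, elapsedMs: elapsedMs };
}
```

For example, once the checker has loaded, timedCheck(checkThaiSpelling, someTextArea.value).elapsedMs tells you how long that particular check took.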
Checking is not always correct. The checker currently does not take word usage frequency into account, so a misspelling that happens to produce a less common but still valid word (e.g. มาก -> มมาก) is not flagged.
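libthai, the word-breaking backend, needs a 'recovery space' of 3 correct words after an incorrect word. Until it has seen 3 correct words in a row, the words following an error are also marked as incorrect: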
<incorrect><correct><correct>
//all marked as incorrect
<incorrect><correct><correct><correct>
//only first word marked as incorrect
<incorrect><correct><correct><incorrect><correct>
//all marked as incorrect
<incorrect><correct><correct><correct><incorrect><correct>
//first and last two words marked as incorrect
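The four cases above follow a single rule: once an error is found, the following words stay marked until three correct words appear in a row. The toy model below reproduces exactly those cases, which can help predict how flagged ranges will be grouped. It is a hypothetical illustration inferred from the examples: markedByRecoveryRule is not part of this library, it is not libthai's actual algorithm, and it works on a list of booleans rather than on real text.

```javascript
// Toy model (hypothetical, inferred from the examples; not libthai code):
// after an incorrect word, following correct words are also marked until
// RECOVERY consecutive correct words have been seen.
const RECOVERY = 3;

// words: array of booleans, true = correctly spelled word.
// Returns an array of booleans, true = marked as incorrect.
function markedByRecoveryRule(words) {
  const marked = new Array(words.length).fill(false);
  let pending = []; // indices of correct words seen since the last error
  let inError = false;
  for (let i = 0; i < words.length; i++) {
    if (!words[i]) {
      // An incorrect word: it and any pending correct words are marked.
      for (const j of pending) marked[j] = true;
      pending = [];
      marked[i] = true;
      inError = true;
    } else if (inError) {
      pending.push(i);
      if (pending.length === RECOVERY) {
        // Enough correct words in a row: the backend has recovered.
        pending = [];
        inError = false;
      }
    }
  }
  // The text ended before recovery: the pending words stay marked.
  for (const j of pending) marked[j] = true;
  return marked;
}
```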
Word replacement suggestions are planned for a future release.
GPL v3