NOTE: This repo has been updated to keep up with libthai 0.1.28-4 (Debian). It therefore works with recent versions of PostgreSQL, including 13 (the version I am using).
pg-search-thai is a full-text search extension for PostgreSQL that supports the Thai language.
Its main purpose is to enable PostgreSQL full-text search for Thai text. Thai does not use spaces to separate words, so the text has to be segmented into words before it can be indexed and searched.
libthai, libiconv - this extension relies on the Thai word-breaking functionality of the popular LibThai library, and on libiconv.
postgresql - pg_config is required in order to build this extension.
- Download libiconv.
- Install libthai and libiconv on your local system.
Install the extension from source, go to project root directory (
cd pg-search-thai
). Then, you can simply run:make all
- If you would like to install only the Thai parser, go into the thai_parser directory, then compile and install it:
cd thai_parser; make; make install
- Start the psql console (or any other PostgreSQL client, such as pgAdmin) and create the extension you have just installed by running the following commands:
CREATE EXTENSION thai_parser;
CREATE TEXT SEARCH CONFIGURATION thaicfg (PARSER = thai_parser);
ALTER TEXT SEARCH CONFIGURATION thaicfg ADD MAPPING FOR a WITH simple;
- Note: This extension has only been tested with UTF-8 encoding, so it is highly recommended to initialize the database with UTF-8.
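To follow the encoding recommendation, the statements below sketch one way to check the current database and to create a UTF-8 one. The database name thai_db is a hypothetical placeholder, and depending on the server's locale settings you may also need to specify a UTF-8 compatible LC_COLLATE and LC_CTYPE.

```sql
-- Show the encoding of the database you are currently connected to.
SHOW server_encoding;

-- Create a new database with UTF-8 encoding.
-- "thai_db" is a hypothetical name used only for illustration.
-- TEMPLATE template0 lets you choose an encoding that differs from template1's.
CREATE DATABASE thai_db WITH ENCODING 'UTF8' TEMPLATE template0;
```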
Check how the parser works:
SELECT * FROM ts_parse('thai_parser', 'ต้มยำกุ้งน้ำข้น ( Thai sour and spicy shrimp soup ) และไข่เจียวร้อนๆ');
Build a document (tsvector) with the thaicfg configuration, which uses this parser:
SELECT to_tsvector('thaicfg', 'ต้มยำกุ้งน้ำข้น ( Thai sour and spicy shrimp soup ) และไข่เจียวร้อนๆ');
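If the resulting tsvector is not what you expect, PostgreSQL's built-in ts_token_type and ts_debug functions can help you see what the parser and the configured dictionaries are doing. The following is a minimal sketch that assumes the thaicfg configuration has been created as described above; the sample text is shortened from the earlier example.

```sql
-- List the token types the parser can emit
-- (the mapping created earlier targets the type with alias "a").
SELECT * FROM ts_token_type('thai_parser');

-- For each token, show its type, the dictionaries consulted,
-- and the lexemes that end up in the tsvector.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('thaicfg', 'ต้มยำกุ้งน้ำข้น และไข่เจียวร้อนๆ');
```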
Querying
SELECT to_tsvector('thaicfg', 'the land of somtum (ส้มตำ)') @@ to_tsquery('thaicfg','ส้มตำ');
?column?
----------
t
(1 row)
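In practice the text usually lives in a table column, not in a literal. The sketch below shows one possible way to index and search such a column with the thaicfg configuration. The table dishes, its columns, and the index name are hypothetical and are not part of this extension.

```sql
-- "dishes" and its columns are hypothetical names used only for illustration.
CREATE TABLE dishes (
    id   serial PRIMARY KEY,
    name text NOT NULL
);

-- Expression index, so that a search does not have to re-parse every row.
CREATE INDEX dishes_name_fts_idx
    ON dishes USING GIN (to_tsvector('thaicfg', name));

-- The WHERE clause repeats the indexed expression so the planner can use the index.
SELECT id, name
FROM dishes
WHERE to_tsvector('thaicfg', name) @@ to_tsquery('thaicfg', 'ส้มตำ');
```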
Querying with the | (OR) and & (AND) operators:
SELECT to_tsvector('thaicfg', 'ส้มตำไก่ย่าง ต้มยำกุ้ง in thailand') @@ to_tsquery('thaicfg','ข้าวเหนียว&ส้มตำ');
?column?
----------
f
(1 row)
SELECT to_tsvector('thaicfg', 'ข้าวเหนียวส้มตำไก่ย่าง ต้มยำกุ้ง in thailand') @@ to_tsquery('thaicfg','ข้าวเหนียว&ส้มตำ');
?column?
----------
t
(1 row)
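The two queries above exercise the & operator. For completeness, here is a sketch of the | form, reusing the first document. It matches when at least one of the two terms is present, so the outcome depends on how the parser segments the text.

```sql
-- OR: true when at least one of the two terms occurs in the document.
SELECT to_tsvector('thaicfg', 'ส้มตำไก่ย่าง ต้มยำกุ้ง in thailand')
       @@ to_tsquery('thaicfg', 'ข้าวเหนียว|ส้มตำ');
```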
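If you want to use Hunspell as a dictionary for full-text search, make sure the Thai Hunspell dictionary files are already installed in the tsearch_data directory under the path printed by pg_config --sharedir. With the definition below, PostgreSQL looks for files named th_TH.dict and th_TH.affix in that directory.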
CREATE TEXT SEARCH DICTIONARY thai_hunspell (
TEMPLATE = ispell,
DictFile = th_TH,
AffFile = th_TH,
StopWords = english
);
In the psql console, type \dFd to check that the dictionary is installed.
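Then add it to the configuration. A mapping for token type a already exists from the setup step, so use ALTER MAPPING instead of ADD MAPPING. List thai_hunspell before simple, because simple accepts every token and any dictionary listed after it is never consulted:
ALTER TEXT SEARCH CONFIGURATION thaicfg ALTER MAPPING FOR a WITH thai_hunspell, simple;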
Finally, test the dictionary with:
SELECT ts_lexize('thai_hunspell', 'ทดสอบ');
Bug reports on the GitHub issue tracker and pull requests are welcome.
pg-search-thai is released under the GNU General Public License (GPLv2). Refer to the License FAQ for more information.