Documentation for the package rsky/php-mecab.
(Please note that I am a Linux user and have only tested the Linux installation guide. The Mac and Windows installation guides have been pieced together from other sources.)
Ubuntu users can use the install script included in this repository to install mecab and php-mecab.
Download the script:
curl -O https://raw.githubusercontent.com/nihongodera/php-mecab-documentation/master/install-php-mecab.sh
Make the file executable:
chmod +x install-php-mecab.sh
Execute the script:
./install-php-mecab.sh
For information about what the script does, see here.
Before installing php-mecab, you must install MeCab.
Linux users can more than likely find MeCab in their distro repositories. Simply install 'mecab' and the package 'mecab-ipadic-utf8'. Ubuntu users can do this with the following command.
sudo apt-get install mecab mecab-ipadic-utf8
If that doesn't work, you can download the source and build it yourself. Note that this will require the package 'build-essential'.
First pull in MeCab.
wget https://mecab.googlecode.com/files/mecab-0.996.tar.gz
tar zxfv mecab-0.996.tar.gz
cd mecab-0.996
./configure --with-charset=utf8 --enable-utf8-only
make
sudo make install
Then get the dictionary file.
wget https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz
tar zxfv mecab-ipadic-2.7.0-20070801.tar.gz
cd mecab-ipadic-2.7.0-20070801
./configure --with-charset=utf8
make
sudo make install
Both MeCab and the required dictionary (mecab-ipadic-utf8) are in MacPorts. If that doesn't work, try downloading the source and building it yourself. You can get the source and the dictionary from the following urls:
https://mecab.googlecode.com/files/mecab-0.996.tar.gz
https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz
I believe you can build these files with Xcode. Somebody correct me if I'm wrong.
Download the installer from this url: https://mecab.googlecode.com/files/mecab-0.996.exe
First, verify that you have MeCab on your computer by testing it in the command line. Type mecab
and if you don't get an error, things are looking good. If you get an error that looks something like this param.cpp(69) [ifs] no such file or directory: /usr/local/lib/mecab/dic/ipadic/dicrc
you need to find your dictionary file and pass it as a parameter. The directory is called 'ipadic-utf8' and needs to contain a file called 'unk.dic'.
mecab --dicdir=/path/to/dictionary/dic/ipadic/
Once you get mecab to start, type some Japanese and make sure you get an appropriate response.
~$ mecab
やった!
やっ 動詞,自立,*,*,五段・ラ行,連用タ接続,やる,ヤッ,ヤッ
た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
! 記号,一般,*,*,*,*,!,!,!
EOS
Install the following dependencies:
php5:
php5-dev
libmecab-dev
build-essential
sudo apt-get install php5-dev libmecab-dev build-essential
php7:
php7.0-dev
libmecab-dev
build-essential
sudo apt-get install php7.0-dev libmecab-dev build-essential
Download the php-mecab source.
wget https://github.com/rsky/php-mecab/archive/master.zip
You will need to find the 'mecab-config' script. It is usually located at /usr/bin/mecab-config, but check to make sure. Let's use 'locate' because it's easy.
sudo updatedb
locate mecab-config
That should give you a path that looks something like /usr/bin/mecab-config.
We should now be ready to build our package. Put your mecab-config path after the --with-mecab-config option.
unzip master.zip
cd php-mecab-master/mecab
phpize
sudo ./configure --with-php-config=/usr/bin/php-config --with-mecab-config=/path/to/mecab-config
sudo make
sudo make install
Occasionally, configure will fail and throw the following error:
configure: error: wrong MeCab library version or lib not found. Check config.log for more information
This usually happens when mecab didn't install properly. To fix this, purge all mecab packages:
sudo apt-get --purge remove mecab mecab-ipadic-utf8 mecab-utils libmecab-dev
This will often not remove all the binaries so you may have to manually go into bin and remove them yourself.
sudo rm /usr/local/bin/mecab
sudo rm /usr/local/bin/mecab-config
Then, reinstall everything:
sudo apt-get install mecab mecab-ipadic-utf8 mecab-utils libmecab-dev
After completing this step, you should have a mecab.so. Go to /usr/lib/php5/ and find the directory with a name similar to 20131226. Look in that directory and mecab.so should be there.
We now just need to enable the mod.
For php5:
Move to /etc/php5/mods-available/
cd /etc/php5/mods-available/
Next, create a new .ini file for mecab.
sudo touch mecab.ini
echo "extension=mecab.so" | sudo tee -a mecab.ini
And then we need to activate the module.
sudo php5enmod mecab
For php7:
Move to /etc/php/7.0/mods-available/
cd /etc/php/7.0/mods-available/
Next, create a new .ini file for mecab.
sudo touch mecab.ini
echo "extension=mecab.so" | sudo tee -a mecab.ini
And then we need to activate the module.
sudo phpenmod -v 7.0 mecab
Once this is done, you simply need to restart your web server.
For Apache:
sudo service apache2 restart
And for nginx:
sudo service nginx restart
You should be ready to go.
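Before going further, it can be worth confirming that PHP actually sees the extension. The following is a minimal sketch using PHP's built-in extension_loaded() function; the helper name mecabExtensionStatus() is my own and is not part of php-mecab.

```php
<?php
// Report whether the mecab extension is visible to the PHP process that
// runs this script. mecabExtensionStatus() is a hypothetical helper name.
function mecabExtensionStatus()
{
    return extension_loaded('mecab')
        ? 'mecab extension is loaded'
        : 'mecab extension is NOT loaded';
}

echo mecabExtensionStatus() . "\n";
```

The command line and the web server can load different configuration, so if the script reports success in a terminal but your site still fails, try running the same check through the web server.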
Instructions should be the same as for Linux, but you may need Xcode in order to properly compile the source code.
Installing php-mecab is the same as installing any other php extension. The following guide may be of use:
http://php.net/manual/en/install.windows.extensions.php
According to one of the php-mecab readme files:
The extension provides the VisualStudio V6 project file mecab.dsp. To compile the extension you open this file using VisualStudio, select the apropriate configuration for your installation (either "Release_TS" or "Debug_TS") and create "php_mecab.dll"
After successfull compilation you have to copy the newly created "php_mecab.dll" to the PHP extension directory (default: C:\PHP\extensions).
php-mecab can be used functionally or as an object. I prefer the OOP approach, but I will try to cover both approaches in this guide. Note that as of version 0.6.0, the procedural functions will not work in php 7.
MeCab sometimes requires a dictionary directory to be passed to it on initialization. The location of the directory seems to vary by system, so find 'ipadic-utf8' on your system and pass the full folder path. Often, there will be more than one 'ipadic-utf8' folder on a system. Make sure the one you use contains a file called 'unk.dic'. Without this, MeCab will fail to initialize. Pass the dictionary directory to MeCab in an array, using the console flag '-d'.
The options passed to MeCab are the same as the options used in the command line program. Send them to the constructor in an array. Check the man page for MeCab for all available options.
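Since the right dictionary directory is the one containing 'unk.dic', the search can be sketched in a few lines of plain PHP. The helper name findDictionaryDir() and the candidate paths are assumptions for illustration; replace the list with the locations that exist on your system.

```php
<?php
// Return the first candidate directory that contains 'unk.dic', or null.
// findDictionaryDir() is a hypothetical helper, not part of php-mecab.
function findDictionaryDir(array $candidates)
{
    foreach ($candidates as $dir) {
        if (is_file(rtrim($dir, '/') . '/unk.dic')) {
            return $dir;
        }
    }
    return null;
}

// Example candidates only; your dictionary may live somewhere else.
$dicDir = findDictionaryDir([
    '/var/lib/mecab/dic/ipadic-utf8',
    '/usr/local/lib/mecab/dic/ipadic',
]);

// Build the argument array only when a usable directory was found.
$arguments = $dicDir === null ? [] : ['-d', $dicDir];
print_r($arguments);
```

The resulting $arguments array has the same shape as the one passed to the constructor in the examples that follow.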
New up a MeCab\Tagger object. Version 0.6.0:
$mecab = new \MeCab\Tagger();
Earlier versions:
$mecab = new \MeCab_Tagger();
If it doesn't work, or you get an error, try passing it an array containing the command line flag '-d' and a dictionary folder path as a parameter.
$mecab = new \MeCab\Tagger(['-d', '/path/to/dictionary/mecab/dic/ipadic-utf8']);
The variable $mecab will be a MeCab\Tagger object.
Throughout this guide, when I refer to $mecab in the object orientated sections, it will be a Tagger object.
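If your code has to run against both naming schemes, one option is to check which class exists at runtime. This is only a sketch: createTagger() is a hypothetical helper of my own, and it returns null when neither class is available (for example, when the extension is not loaded).

```php
<?php
// Create a Tagger using whichever class name the installed php-mecab offers.
// createTagger() is a hypothetical helper, not part of php-mecab.
function createTagger(array $arguments = [])
{
    if (class_exists('MeCab\Tagger')) {
        // Version 0.6.0 naming.
        return $arguments ? new \MeCab\Tagger($arguments) : new \MeCab\Tagger();
    }
    if (class_exists('MeCab_Tagger')) {
        // Naming used by earlier versions.
        return $arguments ? new \MeCab_Tagger($arguments) : new \MeCab_Tagger();
    }
    return null;
}

$mecab = createTagger();
var_dump($mecab === null ? 'php-mecab not available' : get_class($mecab));
```

To pass a dictionary directory, call it as createTagger(['-d', '/path/to/dictionary/mecab/dic/ipadic-utf8']).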
Use the function mecab_new() to get a mecab resource. As with the Object Orientated approach, you may or may not have to pass it a dictionary directory.
$mecab = mecab_new(['-d', '/path/to/dictionary/mecab/dic/ipadic-utf8']);
The $mecab variable will be a resource of type 'mecab'.
Throughout this guide, when I refer to $mecab in the functional sections, it will be a MeCab resource.
Split methods only split a string into an array of morphemes. They provide no information about the morphemes.
As of version 0.6.0, the split method is no longer on the Tagger object. The following only applies to previous versions.
The split() method is static and so does not require an instance of Tagger. It might, however, need the dictionary directory path to be passed as an argument in order to function.
$split = \Mecab_Tagger::split('眠いです');
Or, if that doesn't work:
$split = \Mecab_Tagger::split('眠いです', '/path/to/dictionary/mecab/dic/ipadic-utf8');
print_r($split);
// Results
Array
(
[0] => 眠い
[1] => です
)
If you have an instance of the Tagger class, you can also call the method on the object. You may still need to pass the dictionary directory.
$split = $mecab->split('たこ焼きが食べたい');
print_r($split);
// Results
Array
(
[0] => たこ焼き
[1] => が
[2] => 食べ
[3] => たい
)
Use the function mecab_split(). It may or may not require the dictionary directory to be passed.
$split = mecab_split('パンダをいくらで買いますか');
Or....
$split = mecab_split('パンダをいくらで買いますか', '/path/to/dictionary/mecab/dic/ipadic-utf8');
print_r($split);
// Results
Array
(
[0] => パンダ
[1] => を
[2] => いくら
[3] => で
[4] => 買い
[5] => ます
[6] => か
)
MeCab will parse strings of Japanese text and return results in either string form or as a MeCab\Node. Nodes seem a little awkward and difficult to deal with at first, but they give the user a lot of power and make parsing results a little easier.
To parse a string and get results in string form, a couple options exist. The first is the parse() method.
$results = $mecab->parse('チョコレートがやめられない');
echo $results;
// Results
チョコレート 名詞,一般,*,*,*,*,チョコレート,チョコレート,チョコレート
が 助詞,格助詞,一般,*,*,*,が,ガ,ガ
やめ 動詞,自立,*,*,一段,未然形,やめる,ヤメ,ヤメ
られ 動詞,接尾,*,*,一段,未然形,られる,ラレ,ラレ
ない 助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ
EOS
You could also use the parseToString() method which produces the exact same results.
$results = $mecab->parseToString('チョコレートがやめられない');
echo $results;
// Results
チョコレート 名詞,一般,*,*,*,*,チョコレート,チョコレート,チョコレート
が 助詞,格助詞,一般,*,*,*,が,ガ,ガ
やめ 動詞,自立,*,*,一段,未然形,やめる,ヤメ,ヤメ
られ 動詞,接尾,*,*,一段,未然形,られる,ラレ,ラレ
ない 助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ
EOS
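The string form is easy to print but awkward to work with in code. As a rough sketch, the text can be split back into rows with plain PHP. parseOutputLines() is a hypothetical helper, and it assumes MeCab's default output format, where each line is the surface, a tab, then the feature, and the last line is EOS. The sample string is copied from the results above so the sketch runs without the extension.

```php
<?php
// Turn MeCab's default text output into an array of surface/feature rows.
// parseOutputLines() is a hypothetical helper, not part of php-mecab.
function parseOutputLines($output)
{
    $rows = [];
    foreach (explode("\n", trim($output)) as $line) {
        if ($line === 'EOS' || $line === '') {
            continue; // skip the end-of-sentence marker and blank lines
        }
        // Assumes the default "surface<TAB>feature" line format.
        $parts = explode("\t", $line, 2);
        $rows[] = [
            'surface' => $parts[0],
            'feature' => isset($parts[1]) ? $parts[1] : '',
        ];
    }
    return $rows;
}

// With the extension installed, this string would come from
// $mecab->parse('チョコレートがやめられない').
$output = "チョコレート\t名詞,一般,*,*,*,*,チョコレート,チョコレート,チョコレート\n"
        . "が\t助詞,格助詞,一般,*,*,*,が,ガ,ガ\n"
        . "EOS\n";

foreach (parseOutputLines($output) as $row) {
    echo $row['surface'] . ' => ' . $row['feature'] . "\n";
}
```

If you change the output format with command line options, the tab assumption no longer holds and nodes are the safer route.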
To get results in node form, use parseToNode().
$node = $mecab->parseToNode('ご飯作りたくない');
var_dump($node);
// Results
object(MeCab\Node) (0) {
}
To get results as a string, use the function mecab_sparse_tostr().
$results = mecab_sparse_tostr($mecab, 'パンダいらないよね');
echo $results;
// Results
パンダ 名詞,一般,*,*,*,*,パンダ,パンダ,パンダ
いら 動詞,自立,*,*,五段・ラ行,未然形,いる,イラ,イラ
ない 助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ
よ 助詞,終助詞,*,*,*,*,よ,ヨ,ヨ
ね 助詞,終助詞,*,*,*,*,ね,ネ,ネ
EOS
For node results, use mecab_sparse_tonode().
$node = mecab_sparse_tonode($mecab, 'これ長くなってる');
var_dump($node);
// Results
resource(5) of type (node)
### Using Nodes

Nodes make it easy to access the information MeCab provides and give users powerful ways to navigate through results.
The node returned from the parseToNode() method discussed in the previous section is the first node in the series and only represents the first morpheme. In order to get information about the entire string, it is necessary to walk through all the nodes in the series. But before we tackle that, let's take a quick look at some of the more useful methods we have at our disposal.
- getPrev(): Get the previous node in the series.
- getNext(): Get the next node in the series.
- getSurface(): Get the surface (the original morpheme) of the node.
- getFeature(): Get the feature (the MeCab info) of the node.
- getLength(): Get the length of the node's surface.
- toArray(): Get all the node's elements as an associative array.
- mecab_node_prev(): Get the previous node in the series.
- mecab_node_next(): Get the next node in the series.
- mecab_node_surface(): Get the surface (the original morpheme) of the node.
- mecab_node_feature(): Get the feature (the MeCab info) of the node.
- mecab_node_length(): Get the length of the node's surface.
- mecab_node_toarray(): Get all the node's elements as an associative array.
There are several other methods available, but these are the most useful at this point. For a full list of methods, see the Classes and Functions section of this guide.
So let's see how we can walk through the nodes and extract the information we need.
You can go about this a couple ways. The first way simply walks through the nodes with a foreach loop.
$node = $mecab->parseToNode('カレーライスにしようかな');
foreach ($node as $n) {
echo $n->getFeature() . "\n";
}
// Results
BOS/EOS,*,*,*,*,*,*,*,*
名詞,一般,*,*,*,*,カレーライス,カレーライス,カレーライス
助詞,格助詞,一般,*,*,*,に,ニ,ニ
動詞,自立,*,*,サ変・スル,未然ウ接続,する,シヨ,シヨ
助動詞,*,*,*,不変化型,基本形,う,ウ,ウ
助詞,副助詞/並立助詞/終助詞,*,*,*,*,か,カ,カ
助詞,終助詞,*,*,*,*,な,ナ,ナ
BOS/EOS,*,*,*,*,*,*,*,*
This isn't necessarily a bad way to do it, but it's a little too magical for my liking. If $node is the first node in the series (and it is, you can var_dump and verify this), it doesn't make sense to loop through each $node as $n where $node is a single node and $n is also a single node. Instead, I prefer to use MeCab\Node's methods to explicitly define what I am doing.
$node = $mecab->parseToNode('これの方がいい');
do {
echo $node->getFeature() . "\n";
} while ($node = $node->getNext());
// Results
BOS/EOS,*,*,*,*,*,*,*,*
名詞,代名詞,一般,*,*,*,これ,コレ,コレ
助詞,連体化,*,*,*,*,の,ノ,ノ
名詞,非自立,一般,*,*,*,方,ホウ,ホー
助詞,格助詞,一般,*,*,*,が,ガ,ガ
形容詞,自立,*,*,形容詞・イイ,基本形,いい,イイ,イイ
BOS/EOS,*,*,*,*,*,*,*,*
We can extract the logic to a general purpose looping function.
function walkThroughNodes(\Mecab\Node $node, $callback)
{
do {
$callback($node);
} while ($node = $node->getNext());
}
We can then pass our walkThroughNodes function a closure to tell it what to do with each node.
$node = $mecab->parseToNode('これの方がいい');
walkThroughNodes($node, function($node) {
echo $node->getSurface() . "\n";
});
// Results
これ
の
方
が
いい
Now we never have to worry about a basic walkthrough again. We can simply pass our walkThroughNodes function a node and a callback.
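The same walk can stand in for the split() method that is no longer on the Tagger object as of version 0.6.0. This is a sketch rather than an official replacement: collectSurfaces() is a hypothetical helper that relies only on getSurface() and getNext(), and FakeNode is a stand-in class so the example runs without the extension.

```php
<?php
// Collect the non-empty surfaces of a node series into an array.
// collectSurfaces() is a hypothetical helper, not part of php-mecab.
function collectSurfaces($node)
{
    $surfaces = [];
    do {
        $surface = (string) $node->getSurface();
        if ($surface !== '') {
            // BOS/EOS nodes have an empty surface, so they are skipped.
            $surfaces[] = $surface;
        }
    } while ($node = $node->getNext());
    return $surfaces;
}

// FakeNode only imitates the two methods used above. With php-mecab
// installed, pass $mecab->parseToNode('これの方がいい') instead.
class FakeNode
{
    private $surface;
    private $next;

    public function __construct($surface, $next = null)
    {
        $this->surface = $surface;
        $this->next = $next;
    }

    public function getSurface()
    {
        return $this->surface;
    }

    public function getNext()
    {
        return $this->next;
    }
}

$series = new FakeNode('', new FakeNode('これ', new FakeNode('の', new FakeNode(''))));
print_r(collectSurfaces($series));
```

Returning an array instead of echoing keeps the helper usable in the same places where the old split() result was used.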
As mentioned in the Object Orientated section above, we can simply walk through the nodes with a foreach loop, but I don't like that approach. Instead, let's use MeCab's nodes to our advantage.
$node = mecab_sparse_tonode($mecab, 'ビール飲みたい');
do {
echo mecab_node_surface($node) . "\n";
} while ($node = mecab_node_next($node));
// Results
ビール
飲み
たい
Like we did in the Object Orientated section, let's extract this to a function that we can send a callback to.
function walkThroughNodes($node, $callback)
{
do {
$callback($node);
} while ($node = mecab_node_next($node));
}
We can use our walkThroughNodes function like this.
$node = mecab_sparse_tonode($mecab, 'ビール飲みたい');
walkThroughNodes($node, function ($node) {
echo mecab_node_surface($node) . "\n";
});
// Results
ビール
飲み
たい
Now that we can extract information from Japanese strings using MeCab and php-mecab, let's take a quick look at what this information means.
$mecab = new \Mecab\Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']);
$string = $mecab->parseToString('行く');
echo $string;
// Results
行く 動詞,自立,*,*,五段・カ行促音便,基本形,行く,イク,イク
EOS
Commonly in MeCab you will see BOS and EOS. These mean 'Beginning of Sentence' and 'End of Sentence', respectively.
In output lines, there are generally two parts, the surface and the feature. The surface is the original morpheme and the feature is the information MeCab provides about it. In our case, '行く' is the surface and '動詞,自立,*,*,五段・カ行促音便,基本形,行く,イク,イク' is the feature. Remember you can use nodes to easily extract this information.
The feature is a comma-separated string with nine sections.
Section 1: Main part of speech category
Section 2: Part of speech sub-category
Section 3: Part of speech sub-category
Section 4: Part of speech sub-category
Section 5: Inflection type
Section 6: Inflection form
Section 7: Lemma (the root word found in the dictionary)
Section 8: Reading
Section 9: Pronunciation
In our example:
print_r(explode(',', '動詞,自立,*,*,五段・カ行促音便,基本形,行く,イク,イク'));
// Results
[0] => 動詞 // Main part of speech category
[1] => 自立 // Part of speech sub-category
[2] => * // Part of speech sub-category (none)
[3] => * // Part of speech sub-category (none)
[4] => 五段・カ行促音便 // Inflection type
[5] => 基本形 // Inflection form
[6] => 行く // Lemma (the root word found in the dictionary)
[7] => イク // Reading
[8] => イク // Pronunciation
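Numeric indexes are easy to mix up, so it can help to map the nine sections onto named keys. The sketch below is plain PHP and needs no extension. parseFeature() and its key names are my own labels for the sections listed above, not identifiers defined by MeCab or php-mecab, and the simple explode() assumes that no field contains a comma.

```php
<?php
// Map a feature string onto named keys.
// parseFeature() and the key names are hypothetical, chosen for this guide.
function parseFeature($feature)
{
    $keys = [
        'pos', 'pos_sub1', 'pos_sub2', 'pos_sub3',
        'inflection_type', 'inflection_form',
        'lemma', 'reading', 'pronunciation',
    ];
    // Pad short feature strings with '*' so there are always nine values.
    $values = array_pad(explode(',', $feature, 9), 9, '*');
    return array_combine($keys, $values);
}

$info = parseFeature('動詞,自立,*,*,五段・カ行促音便,基本形,行く,イク,イク');
echo $info['pos'] . ' / ' . $info['lemma'] . ' / ' . $info['reading'] . "\n";
// Prints: 動詞 / 行く / イク
```

The padding is there because some entries can come back with fewer than nine sections, in which case the missing ones are filled with '*'.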
What you do with this information is up to you!
Main class used to parse text.
- version()
- split()
- __construct()
- getPartial()
- setPartial()
- getTheta()
- setTheta()
- getLatticeLevel()
- setLatticeLevel()
- getAllMorphs()
- setAllMorphs()
- parse()
- parseToString()
- parseToNode()
- parseNBest()
- parseNBestInit()
- next()
- nextNode()
- formatNode()
- dictionaryInfo()
Return MeCab version.
/**
* @return string
*/
Only on versions prior to 0.6.0.
Split string into array of morphemes. Usually requires the dictionary directory to be passed as a parameter.
/**
* @param string $string String to split.
* @param string $dic_dir Path to dictionary directory. (Optional)
* @param string $user_dic Path to user dictionary. (Optional)
* @param callback $filter Filter function or method. (Optional)
* @param boolean $persistent (Optional)
*
* @return array
*/
Example
$mecab = new \Mecab_Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']);
$array = $mecab::split('行きます', '/var/lib/mecab/dic/ipadic-utf8');
print_r($array);
Array
(
[0] => 行き
[1] => ます
)
Construct class instance.
/**
* @param array $arguments Command line arguments.
* @param boolean $persistent (Optional)
*
* @return MeCab\Tagger
*/
Get current partial parsing mode state.
/**
* @return boolean
*/
Set partial parsing mode.
/**
* @param boolean $bool Partial parsing mode.
*/
Get current temperature parameter theta.
/**
* @return float
*/
Set temperature parameter theta.
/**
* @param float/int $theta Temperature parameter theta.
*/
Get current lattice level.
/**
* @return int
*/
Set lattice level.
/**
* @param int $level Lattice level.
*/
Get all-morphs output mode.
/**
* @return bool
*/
Set all-morphs output mode.
/**
* @param bool $bool All-morphs output mode.
*/
Parse string and output results as string.
/**
* @param string $string String to be parsed.
* @param int $length Length to be analyzed. (Optional)
* @param int $output_length Maximum length of output. (Optional)
*
* @return string
*/
Example
$mecab = new \Mecab\Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']);
$string = $mecab->parse('行きます');
print_r($string);
行き 動詞,自立,*,*,五段・カ行促音便,連用形,行く,イキ,イキ
ます 助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス
EOS
Parse string and output results as string.
/**
* @param string $string String to be parsed.
* @param int $length Length to be analyzed. (Optional)
* @param int $output_length Maximum length of output. (Optional)
*
* @return string
*/
Example
$mecab = new \Mecab\Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']);
$string = $mecab->parseToString('行きます');
print_r($string);
行き 動詞,自立,*,*,五段・カ行促音便,連用形,行く,イキ,イキ
ます 助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス
EOS
Parse string and output results as MeCab\Node.
/**
* @param string $string String to be parsed.
* @param int $length Length to be analyzed. (Optional)
*
* @return MeCab\Node
*/
Example
$mecab = new \Mecab\Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']);
$node = $mecab->parseToNode('行きます');
print_r($node->toArray());
Array
(
[surface] =>
[feature] => BOS/EOS,*,*,*,*,*,*,*,*
[id] => 0
[length] => 0
[rlength] => 0
[rcAttr] => 0
[lcAttr] => 0
[posid] => 0
[char_type] => 0
[stat] => 2
[isbest] => 1
[alpha] => 0
[beta] => 0
[prob] => 0
[wcost] => 0
[cost] => 0
)
Parse given sentence and output N-best results as string. This method causes seg faults for me.
/**
* @param int $n Number of results to obtain.
* @param string $string String to be parsed.
* @param int $length Length to be analyzed. (Optional)
* @param int $output_length Maximum length of output. (Optional)
*
* @return string
*/
Initialize N-best enumeration with a sentence.
/**
* @param string $string String to be parsed.
* @param int $length Length to be analyzed. (Optional)
* @return boolean
*/
Get the next result of N-Best as a string.
/**
* @param int $output_length Maximum length of output. (Optional)
*
* @return string
*/
Get the next result of N-Best as a node.
/**
* @return MeCab\Node
*/
Format a node to a string.
/**
* @param MeCab\Node $node Node to be formatted.
*
* @return string
*/
Return array of dictionary info.
/**
* @return array
*/
Returned by parseToNode method on Mecab\Tagger.
- getIterator()
- setTraverse()
- getPrev()
- getNext()
- getENext()
- getBNext()
- getRPath()
- getLPath()
- getSurface()
- getFeature()
- getId()
- getLength()
- getRLength()
- getRcAttr()
- getLcAttr()
- getPosId()
- getCharType()
- getStat()
- getAlpha()
- getBeta()
- getWCost()
- getCost()
- getProb()
- isBest()
- toArray()
- toString()
Return MeCab\NodeIterator.
/**
* @return MeCab\NodeIterator
*/
Set the traverse mode.
/**
* @param long $mode Traverse mode.
*/
Get the previous node. Return NULL if none.
/**
* @return MeCab\Node
*/
Get the next node. Return NULL if none.
/**
* @return MeCab\Node
*/
Get the next node which has the same end point as the given node. Return NULL if none.
/**
* @return MeCab\Node
*/
Get the next node which has the same beginning point as the given node. Return NULL if none.
/**
* @return MeCab\Node
*/
Get the right path of the node. Return NULL if none.
/**
* @return MeCab\Path
*/
Get the left path of the node. Return NULL if none.
/**
* @return MeCab\Path
*/
Get the surface of the node.
/**
* @return string
*/
Get the feature of the node.
/**
* @return string
*/
Get the ID of the node.
/**
* @return int
*/
Get the length of the node's surface.
/**
* @return int
*/
Get the length of the node's surface including its leading whitespace.
/**
* @return int
*/
Get the ID of the right context.
/**
* @return int
*/
Get the ID of the left context.
/**
* @return int
*/
Get the ID of the part of speech.
/**
* @return int
*/
Get the type of character.
/**
* @return int
*/
Get the status of the node.
/**
* @return int
*/
0: Normal, MECAB_NOR_NODE
1: Unknown, MECAB_UNK_NODE
2: Beginning of Sentence, MECAB_BOS_NODE
3: End of Sentence, MECAB_EOS_NODE
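These status values are handy for dropping the BOS and EOS markers from results. The sketch below works on arrays shaped like the toArray() output, so it runs without the extension. STAT_BOS, STAT_EOS and dropSentenceMarkers() are my own names, and the sample data is hand-written rather than real MeCab output.

```php
<?php
// Local mirrors of the status values listed above. The names are my own;
// check whether your php-mecab build also defines MECAB_* constants.
const STAT_BOS = 2;
const STAT_EOS = 3;

// Keep only real morphemes from a list of arrays shaped like toArray() output.
// dropSentenceMarkers() is a hypothetical helper, not part of php-mecab.
function dropSentenceMarkers(array $nodes)
{
    return array_values(array_filter($nodes, function ($node) {
        $stat = (int) $node['stat'];
        return $stat !== STAT_BOS && $stat !== STAT_EOS;
    }));
}

// Hand-written sample data. With the extension installed, each entry would
// come from calling toArray() on a node while walking the series.
$nodes = [
    ['surface' => '',     'stat' => 2],
    ['surface' => '行き', 'stat' => 0],
    ['surface' => 'ます', 'stat' => 0],
    ['surface' => '',     'stat' => 3],
];

foreach (dropSentenceMarkers($nodes) as $node) {
    echo $node['surface'] . "\n";
}
```

Filtering on the status instead of on an empty surface keeps unknown words (status 1) in the results.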
Get the forward log probability.
/**
* @return float
*/
Get the backward log probability.
/**
* @return float
*/
Get the word arising cost.
/**
* @return int
*/
Get the cumulative cost of the node.
/**
* @return int
*/
Get the marginal probability of the node.
/**
* @return float
*/
Determine whether the node is the best solution.
/**
* @return boolean
*/
Get all elements of the node as an associative array.
/**
* @param boolean $dump_all Dump all related nodes if true. (Optional)
*
* @return array
*/
Get the formatted string of the node.
/**
* @return string
*/
Returned by getRPath and getLPath methods on MeCab\Node class.
Get the rnext path. Return NULL if none.
/**
* @return MeCab\Path
*/
Get the lnext path. Return NULL if none.
/**
* @return MeCab\Path
*/
Get the rnode. Return NULL if none.
/**
* @return MeCab\Node
*/
Get the lnode. Return NULL if none.
/**
* @return MeCab\Node
*/
Get the marginal probability of the path.
/**
* @return float
*/
Get the cumulative cost of the path.
/**
* @return int
*/
Node iterator class.
Return the current element.
/**
* @return MeCab\Node
*/
Return the key of the current element.
/**
* @return int
*/
Set pointer to next element.
Set pointer to beginning.
Check if there is a current element after calls to rewind() or next().
/**
* @return boolean
*/
- mecab_version()
- mecab_split()
- mecab_new()
- mecab_destroy()
- mecab_get_partial()
- mecab_set_partial()
- mecab_get_theta()
- mecab_set_theta()
- mecab_get_lattice_level()
- mecab_set_lattice_level()
- mecab_get_all_morphs()
- mecab_set_all_morphs()
- mecab_sparse_tostr()
- mecab_sparse_tonode()
- mecab_nbest_sparse_tostr()
- mecab_nbest_init()
- mecab_nbest_next_tostr()
- mecab_nbest_next_tonode()
- mecab_format_node()
- mecab_dictionary_info()
- mecab_node_toarray()
- mecab_node_tostring()
- mecab_node_prev()
- mecab_node_next()
- mecab_node_enext()
- mecab_node_bnext()
- mecab_node_rpath()
- mecab_node_lpath()
- mecab_node_surface()
- mecab_node_feature()
- mecab_node_id()
- mecab_node_length()
- mecab_node_rlength()
- mecab_node_rcattr()
- mecab_node_lcattr()
- mecab_node_posid()
- mecab_node_char_type()
- mecab_node_stat()
- mecab_node_alpha()
- mecab_node_beta()
- mecab_node_wcost()
- mecab_node_cost()
- mecab_node_prob()
- mecab_node_isbest()
- mecab_path_rnext()
- mecab_path_lnext()
- mecab_path_rnode()
- mecab_path_lnode()
- mecab_path_prob()
- mecab_path_cost()
Return MeCab version.
/**
* @return string
*/
Split string into array of morphemes.
/**
* @param string $string String to split.
* @param string $dic_dir Path to dictionary directory. (Optional)
* @param string $user_dic Path to user dictionary. (Optional)
* @param callback $filter Filter function or method. (Optional)
* @param boolean $persistent (Optional)
*
* @return array
*/
Create new MeCab resource.
/**
* @param array $arguments Command line arguments.
* @param boolean $persistent (Optional)
*
* @return MeCab
*/
Free the tagger.
/**
* @param MeCab $mecab MeCab resource.
*/
Get current partial parsing mode state.
/**
* @param MeCab $mecab MeCab resource.
*
* @return boolean
*/
Set partial parsing mode.
/**
* @param MeCab $mecab MeCab resource.
* @param boolean $bool Partial parsing mode.
*/
Get current temperature parameter theta.
/**
* @param MeCab $mecab MeCab resource.
*
* @return float
*/
Set temperature parameter theta.
/**
* @param MeCab $mecab MeCab resource.
* @param float/int $theta Temperature parameter theta.
*/
Get current lattice level.
/**
* @param MeCab $mecab MeCab resource.
*
* @return int
*/
Set lattice level.
/**
* @param MeCab $mecab MeCab resource.
* @param int $level Lattice level.
*/
Get all-morphs output mode.
/**
* @param MeCab $mecab MeCab resource.
*
* @return bool
*/
Set all-morphs output mode.
/**
* @param MeCab $mecab MeCab resource.
* @param bool $bool All-morphs output mode.
*/
Parse string and output results as string.
/**
* @param MeCab $mecab MeCab resource.
* @param string $string String to be parsed.
* @param int $length Length to be analyzed. (Optional)
* @param int $output_length Maximum length of output. (Optional)
*
* @return string
*/
Parse string and output results as MeCab\Node.
/**
* @param MeCab $mecab MeCab resource.
* @param string $string String to be parsed.
* @param int $length Length to be analyzed. (Optional)
*
* @return MeCab\Node
*/
Parse given sentence and output N-best results as string. This method causes seg faults for me.
/**
* @param MeCab $mecab MeCab resource.
* @param int $n Number of results to obtain.
* @param string $string String to be parsed.
* @param int $length Length to be analyzed. (Optional)
* @param int $output_length Maximum length of output. (Optional)
*
* @return string
*/
Initialize N-best enumeration with a sentence.
/**
* @param MeCab $mecab MeCab resource.
* @param string $string String to be parsed.
* @param int $length Length to be analyzed. (Optional)
* @return boolean
*/
Get the next result of N-Best as a string.
/**
* @param MeCab $mecab MeCab resource.
* @param int $output_length Maximum length of output. (Optional)
*
* @return string
*/
Get the next result of N-Best as a node.
/**
* @param MeCab $mecab MeCab resource.
*
* @return MeCab\Node
*/
Format a node to a string.
/**
* @param MeCab $mecab MeCab resource.
* @param MeCab\Node $node Node of source string.
*
* @return string
*/
Return array of dictionary info.
/**
* @return array
*/
Get all elements of the node as an associative array.
/**
* @param MeCab\Node $node Node of source string.
* @param boolean $dump_all Dump all related nodes if true. (Optional)
*
* @return array
*/
Get the formatted string of the node.
/**
* @param MeCab\Node $node Node of source string.
*
* @return string
*/
Get the previous node. Return NULL if none.
/**
* @param MeCab\Node $node Node of source string.
*
* @return MeCab\Node
*/
Get the next node. Return NULL if none.
/**
* @param MeCab\Node $node Node of source string.
*
* @return MeCab\Node
*/
Get the next node which has the same end point as the given node. Return NULL if none.
/**
* @param MeCab\Node $node Node of source string.
*
* @return MeCab\Node
*/
Get the next node which has the same beginning point as the given node. Return NULL if none.
/**
* @param MeCab\Node $node Node of source string.
*
* @return MeCab\Node
*/
Get the right path of the node. Return NULL if none.
/**
* @param MeCab\Node $node Node of source string.
*
* @return MeCab\Path
*/
Get the left path of the node. Return NULL if none.
/**
* @param MeCab\Node $node Node of source string.
*
* @return MeCab\Path
*/
Get the surface of the node.
/**
* @param MeCab\Node $node Node of source string.
*
* @return string
*/
Get the feature of the node.
/**
* @param MeCab\Node $node Node of source string.
*
* @return string
*/
Get the ID of the node.
/**
* @param MeCab\Node $node Node of source string.
*
* @return int
*/
Get the length of the node's surface.
/**
* @param MeCab\Node $node Node of source string.
*
* @return int
*/
Get the length of the node's surface including its leading whitespace.
/**
* @param MeCab\Node $node Node of source string.
*
* @return int
*/
Get the ID of the right context.
/**
* @param MeCab\Node $node Node of source string.
*
* @return int
*/
Get the ID of the left context.
/**
* @param MeCab\Node $node Node of source string.
*
* @return int
*/
Get the ID of the part of speech.
/**
* @param MeCab\Node $node Node of source string.
*
* @return int
*/
Get the type of character.
/**
* @param MeCab\Node $node Node of source string.
*
* @return int
*/
Get the status of the node.
/**
* @param MeCab\Node $node Node of source string.
*
* @return int
*/
0: Normal, MECAB_NOR_NODE
1: Unknown, MECAB_UNK_NODE
2: Beginning of Sentence, MECAB_BOS_NODE
3: End of Sentence, MECAB_EOS_NODE
Get the forward log probability.
/**
* @param MeCab\Node $node Node of source string.
*
* @return float
*/
Get the backward log probability.
/**
* @param MeCab\Node $node Node of source string.
*
* @return float
*/
Get the word arising cost.
/**
* @param MeCab\Node $node Node of source string.
*
* @return int
*/
Get the cumulative cost of the node.
/**
* @param MeCab\Node $node Node of source string.
*
* @return int
*/
Get the marginal probability of the node.
/**
* @param MeCab\Node $node Node of source string.
*
* @return float
*/
Determine whether the node is the best solution.
/**
* @param MeCab\Node $node Node of source string.
*
* @return boolean
*/
Get the rnext path. Return NULL if none.
/**
* @param MeCab\Path $path Path of source string.
*
* @return MeCab\Path
*/
Get the lnext path. Return NULL if none.
/**
* @param MeCab\Path $path Path of source string.
*
* @return MeCab\Path
*/
Get the rnode. Return NULL if none.
/**
* @param MeCab\Path $path Path of source string.
*
* @return MeCab\Node
*/
Get the lnode. Return NULL if none.
/**
* @param MeCab\Path $path Path of source string.
*
* @return MeCab\Node
*/
Get the marginal probability of the path.
/**
* @param MeCab\Path $path Path of source string.
*
* @return float
*/
Get the cumulative cost of the path.
/**
* @param MeCab\Path $path Path of source string.
*
* @return int
*/
The University of the Ryukyus Department of Mechanical Systems Engineering maintains a php-mecab API documentation page that can be useful.
http://mechsys.tec.u-ryukyu.ac.jp/~oshiro/php_mecab_apis.html
The MeCab documentation is here on GitHub, but it's in Japanese only and is a little outdated.
http://taku910.github.io/mecab/
jordwest has translated parts of the MeCab documentation into English here.
https://github.com/jordwest/mecab-docs-en
The MeCab API documentation is up on googlecode.
https://mecab.googlecode.com/svn/trunk/mecab/doc/doxygen/index.html
If you're using an IDE, fumikito has a gist that can help with php-mecab class recognition.
https://gist.github.com/fumikito/bb172b4cf5648c7f8451
If an app you're using requires php-mecab and you'd like to use Travis CI, check out the example-travis.yml file and the accompanying travis-install-php.sh file in this repository.
Please help me to improve this guide. If you find errors or places where you feel this guide is lacking, please create an issue or make a pull request. Also, I would love to see this guide translated into other languages, especially Japanese. Any help with translations would be much appreciated.