Examples
After installing HFST on your computer, start Python. For example, the following simple program
import hfst
tr1 = hfst.regex('foo:bar')
tr2 = hfst.regex('bar:baz')
tr1.compose(tr2)
print(tr1)
should print to standard output the following text when run:
0 1 foo baz 0
1 0
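The printed text is in AT&T format: a line with four or five fields describes a transition (source state, target state, input symbol, output symbol, optional weight), and a line with one or two fields marks a final state (state, optional final weight). As a minimal sketch of how to read it, the helper below splits such text into transitions and final states. The name `parse_att` is hypothetical and not part of the hfst package, and the sketch assumes whitespace-separated fields and symbols that contain no whitespace.

```python
def parse_att(text):
    """Split AT&T-format text into (transitions, final_states).

    Hypothetical helper for illustration; not part of the hfst package.
    """
    transitions = []  # (source, target, input symbol, output symbol, weight)
    finals = {}       # state -> final weight
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) >= 4:
            # Transition line: source, target, input, output, optional weight.
            weight = float(fields[4]) if len(fields) > 4 else 0.0
            transitions.append(
                (int(fields[0]), int(fields[1]), fields[2], fields[3], weight)
            )
        else:
            # Final-state line: state, optional final weight.
            finals[int(fields[0])] = float(fields[1]) if len(fields) > 1 else 0.0
    return transitions, finals


transitions, finals = parse_att("0 1 foo baz 0\n1 0")
print(transitions)  # [(0, 1, 'foo', 'baz', 0.0)]
print(finals)       # {1: 0.0}
```

Read this way, the output above says that the composed transducer has one transition from state 0 to state 1 that maps foo to baz with weight 0, and that state 1 is final with weight 0.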
Download a Finnish lexicon text file from:
http://hfst.github.io/downloads/finntreebank.lexc
Then start Python and execute:
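import hfst
# Use the foma implementation for the transducers created below.
hfst.set_default_fst_type(hfst.ImplementationType.FOMA_TYPE)
# Compile the lexc file into a transducer.
tr = hfst.compile_lexc_file('finntreebank.lexc')
# Invert it so that it maps word forms to analyses.
tr.invert()
# Convert to the optimized-lookup format before looking up words.
tr.convert(hfst.ImplementationType.HFST_OL_TYPE)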
You may also download some precompiled lexicons for various languages from
https://sourceforge.net/projects/hfst/files/resources/morphological-transducers/
You can try out the Finnish lexicon with some words:
import sys
for line in sys.stdin:
    print(tr.lookup(line.replace('\n',''), output='text'))
Try the word "testi" and you should get something like:
testi<N><sg><nom> 0.000000
Try a non-word "xtesti" and you get something like:
0
The following example creates a simple transducer from scratch, converts it between transducer formats, tests a transducer property, and handles exceptions:
import hfst
# Create an HFST basic transducer [a:b] with transition weight 0.3 and final weight 0.5.
t = hfst.HfstBasicTransducer()
t.add_state(1)
t.add_transition(0, 1, 'a', 'b', 0.3)
t.set_final_weight(1, 0.5)
# Convert to tropical OpenFst format (the default) and push weights toward final state.
T = hfst.HfstTransducer(t)
T.push_weights_to_end()
# Convert back to HFST basic transducer.
tc = hfst.HfstBasicTransducer(T)
try:
    # Rounding might affect the precision.
    if (0.79 < tc.get_final_weight(1)) and (tc.get_final_weight(1) < 0.81):
        print("TEST PASSED")
        exit(0)
    else:
        print("TEST FAILED")
        exit(1)
# If the state does not exist or is not final
except hfst.exceptions.HfstException as e:
    print("TEST FAILED: An exception was thrown.")
    exit(1)
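The bounds 0.79 and 0.81 follow from how tropical weights combine: weights along a path are added, so the only path of this transducer weighs 0.3 + 0.5 = 0.8, and pushing weights toward the end moves that whole sum onto the final state. The arithmetic can be sketched in plain Python without the hfst package. The variable names are illustrative only.

```python
import math

transition_weight = 0.3  # weight of the a:b transition
final_weight = 0.5       # final weight of state 1

# In the tropical semiring, weights along a path are added together.
path_weight = transition_weight + final_weight

# After pushing weights toward the end, the transition carries no weight
# and the final state carries the weight of the whole path.
pushed_transition_weight = 0.0
pushed_final_weight = path_weight

# The total weight of the path is unchanged by the push.
assert math.isclose(pushed_transition_weight + pushed_final_weight, path_weight)

# This is the same range check as in the example above.
print(0.79 < pushed_final_weight < 0.81)  # True
```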
The following example creates transducers from strings, applies a rule to them, and prints the string pairs recognized by the resulting transducer:
import hfst
hfst.set_default_fst_type(hfst.ImplementationType.FOMA_TYPE) # we use foma implementation as there are no weights involved
# Create a simple lexicon transducer [[foo bar foo] | [foo bar baz]].
tok = hfst.HfstTokenizer()
tok.add_multichar_symbol('foo')
tok.add_multichar_symbol('bar')
tok.add_multichar_symbol('baz')
words = hfst.tokenized_fst(tok.tokenize('foobarfoo'))
t = hfst.tokenized_fst(tok.tokenize('foobarbaz'))
words.disjunct(t)
# Create a rule transducer that optionally replaces 'bar' with 'baz' between 'foo' and 'foo'.
rule = hfst.regex('bar (->) baz || foo _ foo')
# Apply the rule transducer to the lexicon.
words.compose(rule)
words.minimize()
# Extract all string pairs from the result and print them to standard output.
results = 0
try:
    # Extract paths and remove tokenization
    results = words.extract_paths(output='dict')
except hfst.exceptions.TransducerIsCyclicException as e:
    # This should not happen because transducer is not cyclic.
    print("TEST FAILED")
    exit(1)
for input,outputs in results.items():
    print('%s:' % input)
    for output in outputs:
        print(' %s\t%f' % (output[0], output[1]))
The output:
foobarfoo:
foobarfoo 0.000000
foobazfoo 0.000000
foobarbaz:
foobarbaz 0.000000
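To see why foobarfoo gets two outputs and foobarbaz only one, the effect of the optional rule can be imitated in plain Python over token lists. This is only a sketch of the rule's behavior on these two words, not how HFST applies rules; the function `optional_replace` and its parameters are hypothetical and do not come from the hfst package.

```python
from itertools import product


def optional_replace(tokens, target='bar', replacement='baz',
                     left='foo', right='foo'):
    """Return every output string obtained when each `target` token that
    stands directly between `left` and `right` is either kept or replaced."""
    choices = []
    for i, tok in enumerate(tokens):
        in_context = (
            tok == target
            and 0 < i < len(tokens) - 1
            and tokens[i - 1] == left
            and tokens[i + 1] == right
        )
        # An optional rule keeps the original token as one alternative.
        choices.append((tok, replacement) if in_context else (tok,))
    return sorted(set(''.join(combo) for combo in product(*choices)))


lexicon = [['foo', 'bar', 'foo'], ['foo', 'bar', 'baz']]
for word in lexicon:
    print(''.join(word), optional_replace(word))
# foobarfoo ['foobarfoo', 'foobazfoo']
# foobarbaz ['foobarbaz']
```

In foobarbaz the token bar is followed by baz rather than foo, so the context does not match and the word passes through unchanged.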
Python's help command is useful for finding information on the package, a class in it, or a given function in a class:
help(hfst)
help(hfst.HfstTransducer)
help(hfst.HfstTransducer.lookup)
Package hfst
- AttReader
- PrologReader
- HfstBasicTransducer
- HfstBasicTransition
- HfstTransducer
- HfstInputStream
- HfstOutputStream
- MultiCharSymbolTrie
- HfstTokenizer
- LexcCompiler
- XreCompiler
- PmatchContainer
- ImplementationType