Examples

eaxelson edited this page Aug 30, 2017 · 5 revisions

Examples of using the HFST package

Simple start

After installing HFST on your computer, start Python. The following simple program

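import hfst
# Compile two one-transition transducers from regular expressions.
tr1 = hfst.regex('foo:bar')
tr2 = hfst.regex('bar:baz')
# compose() modifies tr1 in place, so tr1 now maps 'foo' to 'baz'.
tr1.compose(tr2)
print(tr1)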

should print to standard output the following text when run:

0      1     foo    baz    0
1      0
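
To see why the result maps foo to baz, composition can be modelled in plain Python. This is only a conceptual sketch and not the HFST API: each toy "transducer" is a set of (input, output) symbol pairs, and the function name compose is chosen here for illustration.

```python
# Toy model, not the HFST API: a transducer is a set of (input, output) pairs.
def compose(first, second):
    """Return pairs (a, c) such that first maps a to b and second maps b to c."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}

tr1 = {("foo", "bar")}  # plays the role of the regex foo:bar
tr2 = {("bar", "baz")}  # plays the role of the regex bar:baz

composed = compose(tr1, tr2)
print(composed)  # prints {('foo', 'baz')}
```

The intermediate symbol bar disappears because it is only used to chain the two mappings together.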

Download and compile a lexicon

Download a Finnish lexicon text file from:

http://hfst.github.io/downloads/finntreebank.lexc

Then start Python in the directory that contains the downloaded file and execute:

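import hfst
# Use the foma implementation when compiling the lexicon.
hfst.set_default_fst_type(hfst.ImplementationType.FOMA_TYPE)
tr = hfst.compile_lexc_file('finntreebank.lexc')
# Invert the transducer so that lookup maps surface forms to analyses.
tr.invert()
# Convert to the optimized-lookup format.
tr.convert(hfst.ImplementationType.HFST_OL_TYPE)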

You may also download some precompiled lexicons for various languages from

https://sourceforge.net/projects/hfst/files/resources/morphological-transducers/

You can try out the Finnish lexicon with some words:

import sys
# Read one word per line from standard input and print its analyses.
for line in sys.stdin:
    print(tr.lookup(line.strip(), output='text'))

Try the word "testi" and you should get something like:

testi<N><sg><nom> 0.000000

Try a non-word "xtesti" and you get something like:

0
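
If you want to process the results further, the text output can be split into analysis and weight pairs. The helper below is a hypothetical sketch, not part of the hfst package. It assumes that every result line consists of an analysis followed by whitespace and a numeric weight, as in the sample output above, and it skips any line that does not fit that shape.

```python
def parse_lookup_text(text):
    """Split lookup text into (analysis, weight) pairs.

    Assumes each result line looks like 'testi<N><sg><nom> 0.000000';
    lines without a trailing numeric weight are skipped.
    """
    results = []
    for line in text.splitlines():
        parts = line.rsplit(None, 1)  # split on the last run of whitespace
        if len(parts) != 2:
            continue
        analysis, weight = parts
        try:
            results.append((analysis, float(weight)))
        except ValueError:
            continue
    return results

print(parse_lookup_text("testi<N><sg><nom> 0.000000"))
```

An empty list then signals that no analysis was found for the word.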

Creating and testing transducers

The following example creates a simple transducer from scratch, converts it between transducer formats, tests one of its properties and handles exceptions:

import hfst
# Create an HFST basic transducer [a:b] with transition weight 0.3 and final weight 0.5.
t = hfst.HfstBasicTransducer()
t.add_state(1)
t.add_transition(0, 1, 'a', 'b', 0.3)
t.set_final_weight(1, 0.5)

# Convert to tropical OpenFst format (the default) and push weights toward final state.
T = hfst.HfstTransducer(t)
T.push_weights_to_end()

# Convert back to HFST basic transducer.
tc = hfst.HfstBasicTransducer(T)
try:
    # Rounding might affect the precision, so compare against a range.
    final_weight = tc.get_final_weight(1)
    if 0.79 < final_weight < 0.81:
        print("TEST PASSED")
        exit(0)
    else:
        print("TEST FAILED")
        exit(1)
# Raised if the state does not exist or is not final.
except hfst.exceptions.HfstException:
    print("TEST FAILED: An exception was thrown.")
    exit(1)
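
The expected final weight of about 0.8 comes from the tropical semiring, in which the weights along a path are added. The arithmetic can be sketched in plain Python without HFST. The variable names and the tolerance of 1e-6 are assumptions made for this illustration, and math.isclose is shown as one possible alternative to the range check above.

```python
import math

transition_weight = 0.3
final_weight = 0.5

# In the tropical semiring, weights along a path are added.
path_weight = transition_weight + final_weight

# Pushing weights toward the final state leaves the path weight unchanged.
# In this single-path example the whole weight ends up on the final state.
pushed_transition_weight = 0.0
pushed_final_weight = path_weight

# Compare with a tolerance instead of ==, because weights are floats.
weights_match = math.isclose(pushed_final_weight, 0.8, abs_tol=1e-6)
print(weights_match)
```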

Applying rules and printing strings

The following example creates transducers from strings, applies a rule to them and prints the string pairs recognized by the resulting transducer:

import hfst
hfst.set_default_fst_type(hfst.ImplementationType.FOMA_TYPE) # we use foma implementation as there are no weights involved

# Create a simple lexicon transducer [[foo bar foo] | [foo bar baz]].
tok = hfst.HfstTokenizer()
tok.add_multichar_symbol('foo')
tok.add_multichar_symbol('bar')
tok.add_multichar_symbol('baz')

words = hfst.tokenized_fst(tok.tokenize('foobarfoo'))
t = hfst.tokenized_fst(tok.tokenize('foobarbaz'))
words.disjunct(t)

# Create a rule transducer that optionally replaces 'bar' with 'baz' between 'foo' and 'foo'.
rule = hfst.regex('bar (->) baz || foo _ foo')

# Apply the rule transducer to the lexicon.
words.compose(rule)
words.minimize()

# Extract all string pairs from the result and print them to standard output.
results = {}
try:
    # Extract paths and remove tokenization.
    results = words.extract_paths(output='dict')
except hfst.exceptions.TransducerIsCyclicException:
    # This should not happen because the transducer is not cyclic.
    print("TEST FAILED")
    exit(1)

for input_string, outputs in results.items():
    print('%s:' % input_string)
    for output in outputs:
        print('  %s\t%f' % (output[0], output[1]))

The output:

foobarfoo:
  foobarfoo     0.000000
  foobazfoo     0.000000
foobarbaz:
  foobarbaz     0.000000
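
The effect of the optional rule can also be modelled in plain Python. This is a conceptual sketch and not the HFST API: the function apply_optional_rule and its parameters are hypothetical, and it only handles a single-symbol target with single-symbol left and right contexts checked on the input side.

```python
from itertools import product

def apply_optional_rule(tokens, target="bar", replacement="baz",
                        left="foo", right="foo"):
    """Return every output of optionally rewriting target as replacement
    wherever it stands between left and right in the input tokens."""
    choices = []
    for i, tok in enumerate(tokens):
        in_context = (
            tok == target
            and i > 0 and tokens[i - 1] == left
            and i + 1 < len(tokens) and tokens[i + 1] == right
        )
        # An optional rule keeps the original symbol as one alternative.
        choices.append([tok, replacement] if in_context else [tok])
    return ["".join(path) for path in product(*choices)]

for word in (["foo", "bar", "foo"], ["foo", "bar", "baz"]):
    print("".join(word), apply_optional_rule(word))
```

Only foobarfoo gets a second output, because in foobarbaz the symbol bar is not followed by foo.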

More information

Python's built-in help() function is useful for finding information on the package, a class in it or a given function in a class:

help(hfst)
help(hfst.HfstTransducer)
help(hfst.HfstTransducer.lookup)