PmatchContainer

class PmatchContainer

A class for performing pattern matching.

Probably the easiest way to perform pattern matching is with the functions hfst.compile_pmatch_expression and hfst.compile_pmatch_file.

__init__ (self)

Initialize a PmatchContainer. Is this needed?

__init__ (self, defs)

Create a PmatchContainer based on definitions defs.

defs: A tuple of transducers in HFST_OLW_TYPE defining how pmatch is done.

An example:

If we have a file named streets.txt that contains:

define CapWord UppercaseAlpha Alpha* ;
define StreetWordFr [{avenue} | {boulevard} | {rue}] ;
define DeFr [ [{de} | {du} | {des} | {de la}] Whitespace ] | [{d'} | {l'}] ;
define StreetFr StreetWordFr (Whitespace DeFr) CapWord+ ;
regex StreetFr EndTag(FrenchStreetName) ;

and which has earlier been compiled and stored in the file streets.pmatch.hfst.ol:

defs = hfst.compile_pmatch_file('streets.txt')
ostr = hfst.HfstOutputStream(filename='streets.pmatch.hfst.ol', type=hfst.ImplementationType.HFST_OLW_TYPE)
for tr in defs:
    ostr.write(tr)
ostr.close()

we can read the pmatch definitions from file and perform string matching with:

istr = hfst.HfstInputStream('streets.pmatch.hfst.ol')
defs = []
while not istr.is_eof():
    defs.append(istr.read())
istr.close()
cont = hfst.PmatchContainer(defs)
assert cont.match("Je marche seul dans l'avenue des Ternes.") == "Je marche seul dans l'<FrenchStreetName>avenue des Ternes</FrenchStreetName>."

See also: hfst.compile_pmatch_file, hfst.compile_pmatch_expression

match (self, input, time_cutoff = 0)

Match input and return it as a string in which the matched parts are enclosed in tags, as in the example above.
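The tagged string returned by match can be post-processed with ordinary string tools. The following is a minimal sketch, not part of HFST: the helper name extract_tagged_spans and the regular expression are assumptions, and it expects tags of the form <Name>...</Name> that are not nested, as in the street name example above.

```python
import re

# Matches <Tag>text</Tag> pairs as they appear in the string returned by match().
# Assumption: tag names consist of word characters and tags are not nested.
TAGGED_SPAN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def extract_tagged_spans(tagged):
    """Return a list of (tag, text) pairs found in a tagged match() result."""
    return [(m.group(1), m.group(2)) for m in TAGGED_SPAN.finditer(tagged)]


# The result string from the example above, used here as plain input so that
# the snippet runs without HFST installed.
result = "Je marche seul dans l'<FrenchStreetName>avenue des Ternes</FrenchStreetName>."
spans = extract_tagged_spans(result)
print(spans)
```

With HFST available, result would instead come from cont.match(...) on a PmatchContainer.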

get_profiling_info (self)

todo

set_verbose (self, b)

todo

set_extract_tags_mode (self, b)

todo

set_profile (self, b)

todo

tokenize(self, input)

Tokenize input and return a list of tokens, i.e. strings.

input: The string to be tokenized.
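Since tokenize returns a plain list of strings, its result can be passed straight to standard Python tools. A small sketch under stated assumptions: token_frequencies is a hypothetical helper, and WhitespaceContainer is only a stand-in that mimics the documented tokenize(input) method so the snippet runs without HFST; a real hfst.PmatchContainer would tokenize according to its pmatch definitions.

```python
from collections import Counter


def token_frequencies(cont, text):
    """Count how often each token occurs in text.

    cont is any object with a tokenize(input) method that returns a list
    of strings, such as an hfst.PmatchContainer.
    """
    return Counter(cont.tokenize(text))


class WhitespaceContainer:
    """Stand-in for a PmatchContainer (assumption): splits on whitespace."""

    def tokenize(self, input):
        return input.split()


freqs = token_frequencies(WhitespaceContainer(), "rue de la rue")
print(freqs.most_common())
```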

get_tokenized_output(self, input, **kwargs)

Tokenize input and get a string representation of the tokenization (essentially the same as the command line tool hfst-tokenize would give).

input: The input string to be tokenized.
kwargs: Possible parameters are: output_format, max_weight_classes, dedupe, print_weights, print_all, time_cutoff, verbose, beam, tokenize_multichar.
output_format: The format of output; possible values are 'tokenize', 'xerox', 'cg', 'finnpos', 'giellacg', 'conllu' and 'visl'; 'tokenize' being the default.
max_weight_classes: Maximum number of best weight classes to output (where analyses with equal weight constitute a class), defaults to None, i.e. no limit.
dedupe: Whether duplicate analyses are removed, defaults to False.
print_weights: Whether weights are printed, defaults to False.
print_all: Whether nonmatching text is printed, defaults to False.
time_cutoff: Number of seconds used per input after which the search is limited.
verbose: Whether input is processed verbosely, defaults to True.
beam: Beam within which analyses must be to get printed.
tokenize_multichar: Whether input is tokenized into multicharacter symbols present in the transducer, defaults to False.
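Because the options are passed as free-form keyword arguments, a misspelled name is easy to miss. The sketch below checks a set of options against the parameter names and output formats listed above before they are handed to get_tokenized_output. The helper checked_tokenizer_kwargs is hypothetical and not part of HFST; how HFST itself reacts to unknown keywords is not covered here.

```python
# Keyword names and output formats as listed in the documentation above.
ALLOWED_KWARGS = {
    "output_format", "max_weight_classes", "dedupe", "print_weights",
    "print_all", "time_cutoff", "verbose", "beam", "tokenize_multichar",
}
OUTPUT_FORMATS = {"tokenize", "xerox", "cg", "finnpos", "giellacg", "conllu", "visl"}


def checked_tokenizer_kwargs(**kwargs):
    """Return kwargs unchanged, or raise ValueError on an unknown name or format."""
    unknown = set(kwargs) - ALLOWED_KWARGS
    if unknown:
        raise ValueError("unknown parameter(s): " + ", ".join(sorted(unknown)))
    fmt = kwargs.get("output_format", "tokenize")  # 'tokenize' is the documented default
    if fmt not in OUTPUT_FORMATS:
        raise ValueError("unknown output_format: " + repr(fmt))
    return kwargs


options = checked_tokenizer_kwargs(output_format="cg", print_weights=True)

# With HFST installed and cont being an hfst.PmatchContainer:
# print(cont.get_tokenized_output("Je marche seul.", **options))
```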
