resolved #10 #11 custom matching patterns and Gensim model issues

davidberenstein1957 released this 24 Sep 12:14

#11 added support for more custom matching patterns via four config variables:

  • `exclude_pos`: A list of POS tags to exclude from the rule-based match.
  • `exclude_dep`: A list of dependencies to exclude from the rule-based match.
  • `include_compound_words`: If True, compound words are included in the entity. For example, if the entity is "New York", "New York City" is also included as an entity.
  • `case_sensitive`: Whether to match the case of the words in the text.
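
As a rough illustration, the four options might be collected in a config dictionary like the one below. Only the option names come from these notes. The values are illustrative assumptions, and the commented-out `add_pipe` call assumes the component is registered under the name `concise_concepts` and takes a `data` entry, so check the project README before relying on it.

```python
# Sketch of a config using the four options introduced in #11.
# The values are illustrative assumptions, not documented defaults.
config = {
    "exclude_pos": ["VERB", "AUX"],    # POS tags to skip in rule-based matching
    "exclude_dep": ["DOBJ", "PCOMP"],  # dependency labels to skip
    "include_compound_words": False,   # True would also match e.g. "New York City"
    "case_sensitive": False,           # False ignores case when matching
}

# Assumed usage (not verified here):
# nlp.add_pipe("concise_concepts", config={"data": data, **config})
```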

#10 resolved an issue where gensim Word2Vec and FastText models were not processed as KeyedVectors. As a result, the model failed to load because it was misinterpreted as an iterable object.
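
The idea behind this kind of fix can be sketched as follows. In gensim, full Word2Vec and FastText models expose their vectors through a `.wv` attribute holding a KeyedVectors object. The helper `as_keyed_vectors` and the stub classes are hypothetical stand-ins written for this illustration, not the library's actual code.

```python
class FakeKeyedVectors:
    """Stand-in for gensim KeyedVectors, so the sketch runs without gensim."""

    def __init__(self, keys):
        self.key_to_index = {key: i for i, key in enumerate(keys)}


class FakeWord2Vec:
    """Stand-in for a full gensim model, which keeps its vectors in `.wv`."""

    def __init__(self, keys):
        self.wv = FakeKeyedVectors(keys)


def as_keyed_vectors(model):
    """Return `model.wv` for a full model, or `model` itself if it has no `.wv`."""
    # Hypothetical helper: unwrap full models, pass KeyedVectors through unchanged.
    return getattr(model, "wv", model)


full_model = FakeWord2Vec(["apple", "pear"])
vectors_only = FakeKeyedVectors(["apple", "pear"])

# Both inputs end up as the same kind of object.
unwrapped = as_keyed_vectors(full_model)
passthrough = as_keyed_vectors(vectors_only)
```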

Also unified the code that checks whether strings are present in a model.

Also made sure that n-gram models and word matches are supported.
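
A single membership check covering both plain words and n-grams could look roughly like this. The helper `in_vocab` is hypothetical, and it assumes that n-gram keys are stored with an underscore delimiter (for example `new_york`), which depends on how the model was trained. A plain set stands in for the model vocabulary.

```python
def in_vocab(phrase, vectors, delimiter="_"):
    """Check a word or multi-word phrase against a vocabulary-like container."""
    # Assumption: n-gram keys join their tokens with `delimiter`.
    return phrase in vectors or phrase.replace(" ", delimiter) in vectors


# A set stands in for the model vocabulary in this sketch.
vocab = {"apple", "new_york"}

found_word = in_vocab("apple", vocab)
found_ngram = in_vocab("new york", vocab)
found_missing = in_vocab("old york", vocab)
```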