rncampos/py_heideltime

 
 

py_heideltime

py_heideltime is a Python wrapper for the multilingual temporal tagger HeidelTime.

For more information about this temporal tagger, please visit the HeidelTime Java standalone version: https://github.com/HeidelTime/heideltime

This wrapper has been developed by Jorge Mendes under the supervision of Professor Ricardo Campos in the scope of the Final Project of the Computer Science degree at the Polytechnic Institute of Tomar, Portugal.

Although some Python wrappers for HeidelTime already exist (in particular https://github.com/amineabdaoui/python-heideltime), all of them require considerable intervention from the user. In this project, we aim to overcome some of these limitations. Our aim was four-fold:

  • To provide a multi-platform package (Windows, Linux, macOS);
  • To make it user-friendly, not only in terms of installation but also in its usage;
  • To make it lightweight without compromising its behavior;
  • To give the possibility to choose the granularity of extracted dates.

How to install py_heideltime

To use py_heideltime you must have the Java JDK and Perl installed on your machine, since HeidelTime depends on them.

pip install git+https://github.com/JMendes1995/py_heideltime.git
Linux users
If your user does not have execute permission on the Python lib folder, run the following command:
sudo chmod 111 /usr/local/lib/<YOUR PYTHON VERSION>/dist-packages/py_heideltime/HeidelTime/TreeTaggerLinux/bin/*
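
Before installing, it can be handy to confirm that the two prerequisites named above are reachable. The following is only a minimal sketch: it assumes the executables are called `java` and `perl` and are on your PATH, and the helper name `missing_prerequisites` is hypothetical (it is not part of py_heideltime).

```python
import shutil


def missing_prerequisites(commands=("java", "perl")):
    """Return the commands from `commands` that cannot be found on PATH.

    Hypothetical helper, not part of py_heideltime. It only checks that an
    executable with that name exists; it does not check versions.
    """
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("java and perl were found on PATH.")
```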

How to use py_heideltime

from py_heideltime import heideltime

text = '''
Thurs August 31st - News today that they are beginning to evacuate the London children tomorrow. Percy is a billeting officer. I can't see that they will be much safer here.
'''

With default parameters.

heideltime(text, language='English')
Output
[('XXXX-08-31', 'August 31st'), ('PRESENT_REF', 'today'), ('XXXX-XX-XX', 'tomorrow')]

With all the parameters.

heideltime(text, language='English', document_type='news', document_creation_time='1939-08-31')
Output
[('1939-08-31', 'August 31st'), ('1939-08-31', 'today'), ('1939-09-01', 'tomorrow')] 
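
The two outputs above show that each result is a `(normalized value, matched text)` pair, and that values can stay underspecified (`XXXX-08-31`, `PRESENT_REF`) when no document creation time is given. As a rough sketch of how such a list might be post-processed, the hypothetical helper below (not part of py_heideltime) keeps only values that are complete `YYYY-MM-DD` dates. The sample lists are copied from the outputs above so the snippet runs without HeidelTime installed.

```python
import re
from datetime import date

# Matches fully specified calendar dates such as '1939-08-31'.
FULL_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def resolved_dates(results):
    """Keep only (date, text) pairs whose value is a complete YYYY-MM-DD date.

    Hypothetical helper: `results` is assumed to be a list of
    (normalized_value, matched_text) tuples like the outputs shown above.
    """
    resolved = []
    for value, text in results:
        if FULL_DATE.fullmatch(value):
            try:
                resolved.append((date.fromisoformat(value), text))
            except ValueError:
                # Right shape but not a real calendar date; skip it.
                pass
    return resolved


default_output = [('XXXX-08-31', 'August 31st'), ('PRESENT_REF', 'today'), ('XXXX-XX-XX', 'tomorrow')]
full_output = [('1939-08-31', 'August 31st'), ('1939-08-31', 'today'), ('1939-09-01', 'tomorrow')]

print(resolved_dates(default_output))
print(resolved_dates(full_output))
```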

Python CLI - Command Line Interface

py_heideltime --help

  Usage_examples: py_heideltime -t "August 31st" -l "English" or
  py_heideltime -t "August 31st" -l "English" -dt "News" -dct "1939-08-31"

Options:
  -t, --text TEXT                 insert text, text should be surrounded by
                                  quotes “” (e.g., “Thurs August 31st”)
  -l, --language TEXT             [required] Language text is required and
                                  should be surrounded by quotes “”. Options:
                                  English, Portuguese, Spanish, German,
                                  Dutch, Italian, French (e.g., “English”).
                                  [required]
  -dg, --date_granularity TEXT    Value of granularity should be surrounded by
                                  quotes “”. Options: Year, Month, day (e.g.,
                                  “Year”).
  -dt, --document_type TEXT       Type of the document text should be
                                  surrounded by quotes “”. Options: “News” :
                                  news-style documents; “Narrative” :
                                  narrative-style documents (e.g., Wikipedia
                                  articles); “Colloquial” : English colloquial
                                  (e.g., Tweets and SMS);  “Scientific” :
                                  scientific articles (e.g., clinical trials)
  -dct, --document_creation_time TEXT
                                  Document creation date in the format YYYY-
                                  MM-DD should be surrounded by quotes (e.g.,
                                  “2019-05-30”). Note that this date will only
                                  be taken into account when News or
                                  Colloquial texts are specified.
  -i, --input_file TEXT           text path should be surrounded by quotes
                                  (e.g., “text.txt”)
  --help                          Show this message and exit.
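
If you want to drive the command-line interface from a script, one option is to build the argument list in Python and hand it to `subprocess`. The sketch below uses only the flags listed in the help text above (`-t`, `-l`, `-dt`, `-dct`). The function names `build_command` and `run_cli` are hypothetical, and `run_cli` assumes the `py_heideltime` executable is on your PATH.

```python
import subprocess


def build_command(text, language, document_type=None, document_creation_time=None):
    """Build the argument list for the py_heideltime CLI (hypothetical helper)."""
    cmd = ["py_heideltime", "-t", text, "-l", language]
    if document_type is not None:
        cmd += ["-dt", document_type]
    if document_creation_time is not None:
        cmd += ["-dct", document_creation_time]
    return cmd


def run_cli(*args, **kwargs):
    """Run the CLI and return whatever it prints to standard output.

    Assumes py_heideltime is installed and on PATH; raises
    subprocess.CalledProcessError if the command exits with an error.
    """
    completed = subprocess.run(
        build_command(*args, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


print(build_command("August 31st", "English", "News", "1939-08-31"))
```

Passing the arguments as a list avoids shell quoting issues with texts that contain quotes or spaces.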

Supported languages

This module is prepared to work with the following languages: English, Portuguese, Spanish, German, Dutch, Italian, French.

To use py_heideltime with other languages proceed as follows:

  • Download the parameter files from TreeTagger;
  • Decompress the downloaded file: gunzip < Downloaded file >
  • Copy the extracted file to the module folder /py_heideltime/HeidelTime/TreeTagger< your system >/lib/
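
The decompress-and-copy steps above can also be done from Python. This is a minimal sketch, not part of py_heideltime: the helper name `install_parameter_file` is hypothetical, and both paths are placeholders that you need to replace with your downloaded file and the `lib` folder of your own installation.

```python
import gzip
import shutil
from pathlib import Path


def install_parameter_file(gz_path, lib_dir):
    """Decompress a gzipped parameter file into `lib_dir` and return the new path.

    Hypothetical helper. The output name is the input name without its
    final '.gz' suffix, which mirrors what `gunzip` produces.
    """
    gz_path = Path(gz_path)
    lib_dir = Path(lib_dir)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got: {gz_path.name}")
    target = lib_dir / gz_path.stem
    # Stream the decompressed bytes straight into the destination file.
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Example call (placeholder paths, adjust to your system):
# install_parameter_file("downloaded-file.par.gz", "<module folder>/lib")
```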

Publications

If you use HeidelTime (either through this package or another one) please cite the appropriate paper. In general, this would be:

Strötgen, Gertz: Multilingual and Cross-domain Temporal Tagging. Language Resources and Evaluation, 2013.

Other related papers may be found here:

https://github.com/HeidelTime/heideltime#Publications

Please check Time-Matters if you are interested in detecting the relevance (score) of dates in a text.
