Time_Matters

Time_Matters is an extractor of relevant dates from text.

This project was developed by Jorge Mendes under the supervision of Professor Ricardo Campos as the Final Project of the Computer Science degree at the Polytechnic Institute of Tomar, Portugal.

The module is composed of:

Date extraction with py_heideltime / Java HeidelTime.
Keyword extraction with YAKE.
Creation of an inverted index to organize the following data:
- Frequency with which each keyword or date occurs in the text.
- Number of sentences in which each keyword or date appears.
- Offset of each date and keyword.
Calculation of the similarity between the relevant words and each candidate relevant date.
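
The inverted index and the similarity step can be illustrated with a small, self-contained sketch. It is not the Time_Matters implementation: the sentence splitter is naive, the terms are supplied by hand instead of coming from YAKE and HeidelTime, and the helper names (`split_sentences`, `build_inverted_index`, `dice`) are hypothetical. The Dice coefficient is assumed here because the CLI refers to a DICE threshold; the library's actual scoring may differ.

```python
import re
from collections import defaultdict


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def build_inverted_index(text, terms):
    """Map each term to its frequency, sentence ids and word offsets."""
    index = defaultdict(lambda: {"frequency": 0, "sentences": set(), "offsets": []})
    position = 0  # running word offset across the whole text
    for sent_id, sentence in enumerate(split_sentences(text)):
        for token in re.findall(r"\w+", sentence.lower()):
            if token in terms:
                entry = index[token]
                entry["frequency"] += 1
                entry["sentences"].add(sent_id)
                entry["offsets"].append(position)
            position += 1
    return index


def dice(index, a, b):
    """Dice coefficient over the sets of sentences containing a and b."""
    sa, sb = index[a]["sentences"], index[b]["sentences"]
    if not sa or not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


text = ("Einstein published special relativity in 1905. "
        "Einstein developed general relativity between 1907 and 1915. "
        "General relativity was published in 1916.")

# Hand-picked terms (lowercase) standing in for extracted keywords and dates.
index = build_inverted_index(text, {"einstein", "relativity", "1905", "1916"})

print(index["relativity"]["frequency"])        # occurrences in the text
print(len(index["relativity"]["sentences"]))   # sentences containing the term
print(index["relativity"]["offsets"])          # word offsets of each occurrence
print(dice(index, "einstein", "1905"))         # keyword/date similarity
```

Each index entry holds the three pieces of data listed above, and the similarity is computed only from sentence-level co-occurrence, which keeps the sketch short.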

Install Time_Matters

pip install git+https://github.com/JMendes1995/Time_Matters.git

Install External Dependencies

pip install git+https://github.com/LIAAD/yake

pip install git+https://github.com/JMendes1995/py_heideltime

You should also have the Java JDK and Perl installed on your machine, as HeidelTime depends on them.
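
A quick way to check for these prerequisites before installing is to look for the executables on the PATH. The sketch below uses only the standard library; the helper name `missing_tools` is hypothetical, and finding a `java` executable does not by itself prove that a full JDK is installed.

```python
import shutil


def missing_tools(tools=("java", "perl")):
    """Return the names of required executables that are not on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("java and perl were both found on PATH.")
```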

Linux users

If your user does not have execute permission on the Python lib folder, run the following command:
sudo chmod 111 /usr/local/lib/<YOUR PYTHON VERSION>/dist-packages/py_heideltime/HeidelTime/TreeTaggerLinux/bin/*

How to use Time_Matters

Python

from time_matters import timeMatters, timeMattersPerSentence
text = '''
Albert Einstein published the theory of special relativity in 1905, building on many theoretical results and empirical findings obtained by Albert A. Michelson, Hendrik Lorentz, Henri Poincaré and others. Max Planck, Hermann Minkowski and others did subsequent work.
Einstein developed general relativity between 1907 and 1915, with contributions by many others after 1915. The final form of general relativity was published in 1916.
'''

Analyze all dates from entire text

# using the default parameters
output = timeMatters(text)

# with all parameters
output = timeMatters(text, contextual_window_distance=10, threshold=0.05, max_array_len=0, max_keywords=10, analisys_sentence=True, heideltime_document_type='news', heideltime_document_creation_time='')

print(output)

output

[{'Date': '1905', 'Score': 0.9980984799637649}, {'Date': '1907', 'Score': 0.9885848306283148}, {'Date': '1915', 'Score': 0.9467018487599099}, {'Date': '1916', 'Score': 0.8163265306122448}]
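
The result is a plain list of dictionaries, so it can be post-processed with ordinary Python. The sketch below reuses the sample output above as a literal, so it runs without Time_Matters installed; the helper `top_dates` and its parameters are hypothetical and not part of the library.

```python
# Sample result copied from the output shown above.
output = [
    {'Date': '1905', 'Score': 0.9980984799637649},
    {'Date': '1907', 'Score': 0.9885848306283148},
    {'Date': '1915', 'Score': 0.9467018487599099},
    {'Date': '1916', 'Score': 0.8163265306122448},
]


def top_dates(results, min_score=0.0, limit=None):
    """Keep dates scoring at least min_score, best first, at most `limit` items."""
    kept = [r for r in results if r['Score'] >= min_score]
    kept.sort(key=lambda r: r['Score'], reverse=True)
    return kept[:limit]  # limit=None keeps every remaining item


for item in top_dates(output, min_score=0.9):
    print(item['Date'], round(item['Score'], 3))
```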

Analyze dates per text sentence

# using the default parameters
output = timeMattersPerSentence(text)

# with all parameters
output = timeMattersPerSentence(text, contextual_window_distance=10, threshold=0.05, max_array_len=0, max_keywords=10, heideltime_document_type='news', heideltime_document_creation_time='')

print(output)

output

[{'Sentence 1': {'Date': '1905', 'Score': 1.0}}, {'Sentence 3': {'Date': '1907', 'Score': 1.0}}, {'Sentence 3': {'Date': '1915', 'Score': 0.8908296943231436}}, {'Sentence 4': {'Date': '1916', 'Score': 1.0}}]
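
In the per-sentence result, a sentence with several dates appears once per date (see 'Sentence 3' above). A minimal sketch for collecting the dates of each sentence, again using the sample output as a literal; the helper name `group_by_sentence` is hypothetical.

```python
from collections import defaultdict

# Sample result copied from the per-sentence output shown above.
per_sentence = [
    {'Sentence 1': {'Date': '1905', 'Score': 1.0}},
    {'Sentence 3': {'Date': '1907', 'Score': 1.0}},
    {'Sentence 3': {'Date': '1915', 'Score': 0.8908296943231436}},
    {'Sentence 4': {'Date': '1916', 'Score': 1.0}},
]


def group_by_sentence(results):
    """Collect (date, score) pairs under their sentence label."""
    grouped = defaultdict(list)
    for entry in results:
        for sentence, date_info in entry.items():
            grouped[sentence].append((date_info['Date'], date_info['Score']))
    return dict(grouped)


grouped = group_by_sentence(per_sentence)
for sentence, dates in grouped.items():
    print(sentence, dates)
```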

API

https://time-matters-api.herokuapp.com/

Python CLI - Command Line Interface

$ time_matters --help

Options:
  -t, --text TEXT                 insert text
  -dps, --date_per_sentence TEXT  select if want to analyze per sentence
  -cwd, --context_window_distance INTEGER
                                  max distance between words
  -th, --threshold FLOAT          minimum DICE threshold similarity values
  -n, --max_array_len INTEGER     size of the context vector
  -ky, --max_keywords INTEGER     max keywords
  -icwd, --ignore_contextual_window_distance TEXT
                                  ignore contextual window distance
  -aps, --analysis_sentence TEXT  DICE Calculation per sentence
  -td, --heideltime_document_type TEXT
                                  Type of the document specified by <file>
                                  (options: News, Narrative, Colloquial,
                                  Scientific).
  -dct, --heideltime_document_creation_time TEXT
                                  Creation date of document only valid format
                                  (YYYY-MM-DD).only will be considered if
                                  document type are News or colloquial.
  -i, --input_file TEXT           input text file
  --help                          Show this message and exit.
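
Since the document creation time is only accepted in YYYY-MM-DD format, it can be worth validating the value before passing it on. The following is a sketch with a hypothetical helper name (`is_valid_creation_time`); it is not part of the Time_Matters CLI.

```python
import re
from datetime import datetime


def is_valid_creation_time(value):
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    # strptime alone would accept '2019-5-1', so check the zero-padded shape first.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


for candidate in ("2019-05-10", "10/05/2019", "2019-02-30"):
    print(candidate, is_valid_creation_time(candidate))
```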

External modules used:

- YAKE
- numpy
- nltk
- Pandas
- regex
- py_heideltime/Heideltime

Please cite the following work when using Time-Matters:

Campos, R., Dias, G., Jorge, A. and Nunes, C. (2017). Identifying Top Relevant Dates for Implicit Time Sensitive Queries. In Information Retrieval Journal. Springer, Vol 20(4), pp 363-398

Strötgen, J. and Gertz, M. (2013). Multilingual and Cross-domain Temporal Tagging. In Language Resources and Evaluation.
