Information Retrieval

In this project, Information Retrieval techniques were applied to speeches from the Greek Parliament as part of the university course “Information Retrieval”. The dataset contains 1,280,918 speeches delivered between 1989 and 2020 and is 2.3 GB in size. It is available in this GitHub repository: https://github.com/iMEdD-Lab/Greek_Parliament_Proceedings.

The goal of this project was to organize and process the data in order to extract useful information from those speeches.

More specifically, this project focused on:

a) Keyword extraction. Keywords can be extracted for each speech, for each politician, and for each political party, along with how they change from year to year. Extracting the top-15 keywords (and their yearly changes) from 378,000 speeches took 98 sec. Extracting the top-15 keywords from 23,569 speeches took 32 sec.

b) Top-k pairs of politicians whose speeches have the highest similarity can be found (using the TF-IDF method). Finding the top-k pairs takes 65 sec on average.

c) Top-k concepts across all the speeches can be found (using the Latent Semantic Analysis method). This process takes 250 sec on average.

A web-based application was created so that this information can be accessed more easily and in a more user-friendly way.
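As an illustration of tasks (a) and (b) above, the following is a minimal sketch in plain Python, not the project's actual code. It assumes each speaker's speeches have already been tokenized and merged into one list of words. The speaker names, the toy English tokens, and the helper names `tf_idf`, `top_keywords`, `cosine` and `top_pairs` are all hypothetical. Preprocessing of the Greek text (such as stop-word removal) is left out.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Map each document name to a {term: TF-IDF weight} dict.

    `docs` maps a name to a list of tokens (assumed already preprocessed).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return vectors


def top_keywords(vector, k):
    """Return the k terms with the highest weight (ties broken alphabetically)."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_pairs(vectors, k):
    """Return the k most similar pairs as ((name_a, name_b), similarity)."""
    scored = [
        ((a, b), cosine(vectors[a], vectors[b]))
        for a, b in combinations(sorted(vectors), 2)
    ]
    return sorted(scored, key=lambda pair: -pair[1])[:k]


# Toy data: hypothetical speakers with hypothetical, pre-tokenized speeches.
speeches = {
    "speaker_a": "tax reform budget tax economy".split(),
    "speaker_b": "budget economy tax deficit".split(),
    "speaker_c": "school teachers education school".split(),
}

vectors = tf_idf(speeches)
for name, vector in vectors.items():
    print(name, top_keywords(vector, 2))
print(top_pairs(vectors, 1))
```

Comparing every pair of speakers grows quadratically with the number of speakers, which may be one reason the pair search takes noticeable time on the full dataset. Grouping the same tokens by year or by party instead of by speaker would give the per-year and per-party keyword views.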
