Skip to content

melinazik/InformationRetrieval

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Information Retrieval

In this project Information Retrieval techniques were applied on speeches of the Greek Parliament for the University Course “Information Retrieval”. The dataset that was used contains 1.280.918 speeches, is 2.3 GB and the speeches are from 1989 till 2020. The dataset can be found on this Github repository https://github.com/iMEdD-Lab/Greek_Parliament_Proceedings.

The goal of this project was the organization and the process of the data to extract useful information out of those speeches.

More specifically this project focused on:

a) Keywords extraction. Keywords can be found for each speech, for each politician, and for each politician party and how those change every year. When top-15 keywords were searched (and how they change every year) from 378.000 speeches, keywords were found after 98 sec. Also, when top-15 keywords were searched from 23.569 speeches, they were found after 32 sec.

b) Top-k politician’s pairs that have the highest similarity on their speeches can be found (TF-IDF method). The average execution time that is needed in order for top-k pairs to be found is 65 sec.

c) Top-k concepts of all the speeches can be found (using Latest Semantic Analysis method). The average execution time of this process is 250 sec. A web-based applicaton was created so that the extraction of these information can be much more easily accesible and user friendly.

About

University Project on Information Retrieval

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages