IR_SearchEngineWithElasticSearch

This is the final project of the Information Retrieval course.
It uses ElasticSearch as the backend, and all data comes from the Hamshahri Corpus.
ATTENTION! To protect the property rights of the corpus, this repository includes only 7 sample docs.

Professor: Dr. Saeed Rahmani
February 2022

About the project

The project searches the documents in the corpus and returns the matching results.
It handles Persian and Arabic text, since the corpus is in Persian.
There is currently no GUI; a simple command-line interface is provided instead.

Before you start, install ElasticSearch and the project dependencies. Then:
1- Run setup.py to create the index.
2- Run index.py to index the docs and insert them into ElasticSearch.
3- Run main.py to start the search engine.
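Step 2 reads the corpus XML files and pushes each document into ElasticSearch. The sketch below is not the project's index.py; it only shows one way to do that step, assuming each file is well-formed XML in which every document is a <DOC> element with <DOCID>, <CAT>, <TITLE> and <TEXT> children (the field names come from the mapping table; the exact file layout is an assumption). The function names parse_docs and bulk_actions are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List

# Field names taken from the index mapping table; the <DOC> layout is assumed.
FIELDS = ("DOCID", "CAT", "TITLE", "TEXT")


def parse_docs(xml_text: str) -> List[Dict[str, str]]:
    """Return one dict per <DOC> element, with missing fields set to ""."""
    root = ET.fromstring(xml_text)
    docs = []
    for doc in root.iter("DOC"):
        record = {}
        for field in FIELDS:
            node = doc.find(field)  # first matching child only
            record[field] = (node.text or "").strip() if node is not None else ""
        docs.append(record)
    return docs


def bulk_actions(docs: List[Dict[str, str]], index_name: str) -> Iterator[dict]:
    """Yield actions in the dict shape accepted by elasticsearch.helpers.bulk."""
    for doc in docs:
        yield {"_index": index_name, "_id": doc["DOCID"], "_source": doc}
```

With a connected client `es`, the actions could then be sent with `elasticsearch.helpers.bulk(es, bulk_actions(parse_docs(text), "hamshahri"))`, where "hamshahri" is a placeholder index name. Using DOCID as `_id` makes re-running the indexing step overwrite documents instead of duplicating them.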

The project structure flow

The program first connects to the ElasticSearch server and creates a new index.
It then asks you for a query and searches the index for it.
It also suggests completions for the query.
The query is parsed and its words are tokenized.
Misspelled words are corrected.
The TF-IDF weighting is applied to the query words.
The documents that contain those words are ranked by their TF-IDF scores.
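The query pipeline above (tokenize, correct spelling, rank by TF-IDF) can be illustrated without ElasticSearch. This is a minimal, self-contained sketch, not the project's code: in the project the scoring is done by ElasticSearch. Here spelling correction is assumed to be nearest-word Levenshtein distance against the corpus vocabulary, the score is the sum of raw tf × log(N/df) over query terms, and the small stop-word set stands in for the project's stop_words.txt.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical stop-word sample ("and", "in", "to", "from", "that");
# the project loads its own list from stop_words.txt.
STOP_WORDS = {"و", "در", "به", "از", "که"}


def tokenize(text: str) -> List[str]:
    """Split on non-word characters (Unicode-aware) and drop stop words."""
    return [t for t in re.findall(r"\w+", text) if t not in STOP_WORDS]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def correct(token: str, vocabulary: Set[str]) -> str:
    """Replace an out-of-vocabulary token with its closest vocabulary word."""
    if token in vocabulary or not vocabulary:
        return token
    # sorted() makes ties resolve deterministically (alphabetically first wins)
    return min(sorted(vocabulary), key=lambda w: edit_distance(token, w))


def rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs with score > 0, highest score first."""
    doc_tokens = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    vocabulary = set().union(*doc_tokens.values())
    n_docs = len(docs)
    terms = [correct(t, vocabulary) for t in tokenize(query)]
    scores = {}
    for doc_id, counts in doc_tokens.items():
        score = 0.0
        for term in terms:
            df = sum(1 for c in doc_tokens.values() if term in c)
            if df and counts[term]:
                score += counts[term] * math.log(n_docs / df)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with three documents "football iran", "basketball iran" and "economy", the misspelled query "footbal" is corrected to "football" and only the first document is returned, scored log(3). A term that appears in every document gets idf log(N/N) = 0 and contributes nothing, which is the intended TF-IDF behaviour for uninformative words.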

Index mapping table

Data	Type	Index	Analyzer	Similarity
DOCID	text	False	None	None
CAT	text	True	Persian	None
TITLE	text	True	Persian	`text_similarity`
TEXT	text	True	Persian	None
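The table can be read as an ElasticSearch index definition. The dict below is a hedged sketch of what setup.py might send, not the project's actual settings: "persian" is ElasticSearch's built-in Persian language analyzer, "index": False keeps DOCID stored but unsearchable, and fields with Similarity "None" fall back to the default similarity. The definition of text_similarity is an assumption; it is adapted from the scripted TF-IDF example in the ElasticSearch reference, and the project's own version may differ.

```python
import json

# Assumed definition of "text_similarity": a scripted TF-IDF similarity adapted
# from the Elasticsearch reference example (the project's version is unknown).
TFIDF_SCRIPT = (
    "double tf = Math.sqrt(doc.freq); "
    "double idf = Math.log((field.docCount+1.0)/(term.docFreq+1.0)) + 1.0; "
    "double norm = 1/Math.sqrt(doc.length); "
    "return query.boost * tf * idf * norm;"
)

INDEX_BODY = {
    "settings": {
        "index": {
            "similarity": {
                "text_similarity": {"type": "scripted", "script": {"source": TFIDF_SCRIPT}}
            }
        }
    },
    "mappings": {
        "properties": {
            # Stored with the document but not searchable (Index = False).
            "DOCID": {"type": "text", "index": False},
            "CAT": {"type": "text", "analyzer": "persian"},
            "TITLE": {"type": "text", "analyzer": "persian", "similarity": "text_similarity"},
            "TEXT": {"type": "text", "analyzer": "persian"},
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(INDEX_BODY, ensure_ascii=False, indent=2))
```

With the 7.x `elasticsearch` Python client, this body could be passed as `es.indices.create(index="hamshahri", body=INDEX_BODY)`; the index name is a placeholder.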

Repository: jestemAria/IR_SearchEngineWithElasticSearch
