TREC LTR Thesis

This repository contains the code for my thesis on Learning to Rank (LTR) with the TREC Deep Learning Track dataset. The project consists of Jupyter notebooks, Python scripts, and helper functions that download, preprocess, and index the TREC dataset and then rank it with several LTR models in Apache Solr.

Folder Structure

The repository has the following folder structure:

/loggedFeatures: contains the feature vectors generated during feature engineering for each query-document pair, stored as CSV files.

/ltr: contains helper functions used throughout the project, covering data loading and configuration information.

/solr: contains the scripts for interacting with Apache Solr.

/submission: contains the final result files generated by the various LTR models, together with all the data used to create those models: the Apache Solr configsets, the training and test data, and the models in JSON format.
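Solr's LTR plugin stores models as JSON documents. The sketch below shows roughly what a linear model definition looks like, built from a dict of feature weights, with an optional MinMaxNormalizer per feature so Solr rescales raw feature values at query time. The model name, store name, feature names, weights and bounds are hypothetical and are not taken from the files in /submission.

```python
import json


def linear_model_json(name, store, weights, min_max=None):
    """Build a Solr LTR LinearModel definition as a plain dict.

    weights: feature name -> weight (e.g. the coefficients of a linear ranker).
    min_max: optional feature name -> (min, max); adds a MinMaxNormalizer so
             Solr rescales the raw feature value the same way as in training.
    """
    features = []
    for feature_name in weights:
        entry = {"name": feature_name}
        if min_max and feature_name in min_max:
            lo, hi = min_max[feature_name]
            entry["norm"] = {
                "class": "org.apache.solr.ltr.norm.MinMaxNormalizer",
                "params": {"min": str(lo), "max": str(hi)},
            }
        features.append(entry)
    return {
        "store": store,
        "name": name,
        "class": "org.apache.solr.ltr.model.LinearModel",
        "features": features,
        "params": {"weights": dict(weights)},
    }


# Hypothetical feature names, weights and bounds, for illustration only.
model = linear_model_json(
    "example_linear_model",
    "example_store",
    {"title_bm25": 0.7, "body_bm25": 0.3},
    min_max={"title_bm25": (0.0, 25.0)},
)
print(json.dumps(model, indent=2))
```

A definition like this is uploaded to Solr with an HTTP PUT to `/solr/<collection>/schema/model-store`. The feature names must match features that already exist in the referenced feature store.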

Notebooks

The notebooks and scripts are numbered in the order in which they should be run. Here is a brief explanation of each one:

1_dataDownloader.ipynb: downloads the TREC dataset and the relevance judgments.

2_preprocessing.py: cleans and preprocesses the TREC text dataset.

3_preprocessingJudgments.py: preprocesses the relevance judgments.

4_indexingDataInSolr.py: creates the index schema and indexes the preprocessed TREC dataset into Solr.

5_addStopWords.py: adds stop words to the Solr configuration to improve search quality.

6_featureEngineering.py: creates the feature store in Apache Solr.

7_featureLogging.py: logs the feature vectors to disk for later use.

8_RankSVM_min_max.ipynb: trains the RankSVM model using min-max normalization.

9_Ranknet_Keras_min_max.ipynb: trains the RankNet model using Keras and min-max normalization.

10_Evaluation_BM25.ipynb: evaluates the plain BM25 ranking as a baseline.

11_Evaluation_svm.ipynb: evaluates the RankSVM model.

12_Evaluation_ranknet.ipynb: evaluates the RankNet model.
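Step 3 (3_preprocessingJudgments.py) preprocesses the relevance judgments. TREC distributes judgments as qrels files with four whitespace-separated columns: query ID, an iteration column that is usually 0, document ID, and relevance grade. The sketch below shows one way to load such lines into a nested dictionary. It is not the code in the script, and the sample lines are made up.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse TREC-style qrels lines of the form '<qid> <iteration> <docid> <relevance>'.

    The iteration column is ignored. Returns {qid: {docid: relevance}}.
    """
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, docid, relevance = parts
        qrels[qid][docid] = int(relevance)
    return dict(qrels)


# Made-up lines, for illustration only.
sample = ["q1 0 d1 2", "q1 0 d2 0", "q2 0 d7 3", ""]
qrels = parse_qrels(sample)
print(qrels)  # {'q1': {'d1': 2, 'd2': 0}, 'q2': {'d7': 3}}
```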
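Step 6 (6_featureEngineering.py) creates the feature store in Apache Solr. In Solr's LTR plugin, a feature store is a named list of JSON feature definitions. A common feature type is SolrFeature, whose value is the score of a Solr query run against each candidate document. The sketch below builds such a list. The store name, field names, and the `user_query` parameter are hypothetical, and the actual features used in the thesis may differ.

```python
import json

STORE = "example_store"  # hypothetical feature store name


def solr_feature(name, query):
    """A SolrFeature: its value is the score of `query` for each candidate document."""
    return {
        "store": STORE,
        "name": name,
        "class": "org.apache.solr.ltr.feature.SolrFeature",
        "params": {"q": query},
    }


features = [
    # The score of the original (e.g. BM25) query that produced the candidates.
    {"store": STORE, "name": "original_score",
     "class": "org.apache.solr.ltr.feature.OriginalScoreFeature", "params": {}},
    # ${user_query} is an external feature value supplied at request time
    # (efi.user_query=...); "title" and "body" are hypothetical field names.
    solr_feature("title_score", "{!dismax qf=title}${user_query}"),
    solr_feature("body_score", "{!dismax qf=body}${user_query}"),
]

payload = json.dumps(features, indent=2)
print(payload)
```

The resulting JSON would be sent with an HTTP PUT to `/solr/<collection>/schema/feature-store`.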
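Step 7 (7_featureLogging.py) logs the feature vectors to disk. Solr returns feature values through the `[features]` document transformer in the `fl` parameter, for example `fl=id,score,[features store=example_store efi.user_query=...]`. By default it returns them as a comma-separated `name=value` string. The sketch below parses that string and writes one CSV row per query-document pair. The column layout and feature names are assumptions and are not necessarily the format of the files in /loggedFeatures.

```python
import csv
import io


def parse_feature_vector(logged):
    """Parse Solr's default '[features]' output, e.g. 'a=1.0,b=2.5', into a dict."""
    vector = {}
    for pair in logged.split(","):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        vector[name] = float(value)
    return vector


def write_feature_csv(rows, feature_names, out):
    """rows: iterable of (qid, docid, relevance, logged_string) tuples."""
    writer = csv.writer(out)
    writer.writerow(["qid", "docid", "relevance"] + feature_names)
    for qid, docid, relevance, logged in rows:
        vector = parse_feature_vector(logged)
        # Missing features default to 0.0 so every row has the same width.
        writer.writerow([qid, docid, relevance] + [vector.get(n, 0.0) for n in feature_names])


buf = io.StringIO()
write_feature_csv(
    [("q1", "d1", 2, "original_score=7.5,title_score=3.25")],  # made-up values
    ["original_score", "title_score"],
    buf,
)
print(buf.getvalue())
```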
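Steps 8 and 9 train their models on min-max normalized features. Min-max normalization maps each feature column to x' = (x - min) / (max - min). The min and max come from the training data only and are then reused on the test data, so test values can fall slightly outside [0, 1]. The minimal sketch below uses plain Python lists and a hypothetical rule of mapping constant columns to 0.0. The notebooks may instead use a library scaler.

```python
def fit_min_max(rows):
    """Per-column (min, max) computed on training rows only."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, bounds):
    """Rescale each value with the (min, max) of its column."""
    out = []
    for row in rows:
        scaled = []
        for x, (lo, hi) in zip(row, bounds):
            # Constant feature: avoid division by zero by mapping it to 0.0.
            scaled.append(0.0 if hi == lo else (x - lo) / (hi - lo))
        out.append(scaled)
    return out


train = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
bounds = fit_min_max(train)
print(apply_min_max(train, bounds))  # [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
```

The same (min, max) pairs are what a Solr MinMaxNormalizer needs, so the model deployed in Solr can reproduce the scaling applied during training.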
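Step 8 trains RankSVM. RankSVM is usually implemented by turning ranking into binary classification over document pairs. For two documents of the same query with different relevance, the difference of their feature vectors becomes one training example. Its label is +1 if the first document is more relevant and -1 otherwise. A linear SVM trained on these differences yields a weight vector w, and documents are then ranked by w · x. The sketch below shows only the pairwise transform, with made-up data. It is one standard formulation, not necessarily the exact one used in the notebook.

```python
from itertools import combinations


def pairwise_transform(samples):
    """samples: list of (qid, relevance, feature_vector).

    Returns (X_diff, y): each row of X_diff is x_i - x_j for two documents of the
    same query with different relevance; y is +1 if i is more relevant, else -1.
    """
    by_query = {}
    for qid, rel, x in samples:
        by_query.setdefault(qid, []).append((rel, x))
    X_diff, y = [], []
    for docs in by_query.values():
        for (rel_i, x_i), (rel_j, x_j) in combinations(docs, 2):
            if rel_i == rel_j:
                continue  # ties express no preference
            X_diff.append([a - b for a, b in zip(x_i, x_j)])
            y.append(1 if rel_i > rel_j else -1)
    return X_diff, y


# Made-up data: three documents for q1, one for q2 (no pairs possible).
samples = [("q1", 2, [3, 1]), ("q1", 0, [1, 0]), ("q1", 1, [2, 2]), ("q2", 1, [5, 5])]
X_diff, y = pairwise_transform(samples)
print(X_diff, y)  # [[2, 1], [1, -1], [-1, -2]] [1, 1, -1]
```

The pairs can then be fed to a linear classifier, for example scikit-learn's `LinearSVC` with `fit_intercept=False`, because a difference vector carries no meaningful offset.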
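Step 9 trains RankNet with Keras. RankNet scores each document with a network f and models the probability that document i should rank above document j as P = sigmoid(s_i - s_j), where s = f(x). Training minimizes the cross-entropy between P and a target probability: 1 if i is more relevant, 0 if less, and 0.5 for ties. The sketch below computes this pairwise loss for one pair in a numerically stable form. It illustrates the loss only. In Keras, a common setup is a shared scoring network applied to both documents, followed by a subtraction, a sigmoid, and binary cross-entropy, but the notebook's architecture is not reproduced here.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ranknet_loss(s_i, s_j, target):
    """RankNet cross-entropy loss for one document pair (sigma = 1).

    target is the desired probability that document i ranks above j:
    1.0 if i is more relevant, 0.0 if less relevant, 0.5 for a tie.
    With P = sigmoid(s_i - s_j), the loss
        -target * log(P) - (1 - target) * log(1 - P)
    simplifies to softplus(-(s_i - s_j)) + (1 - target) * (s_i - s_j).
    """
    diff = s_i - s_j
    return softplus(-diff) + (1.0 - target) * diff


print(ranknet_loss(0.0, 0.0, 1.0))  # log(2), about 0.693
print(ranknet_loss(2.0, 0.0, 1.0))  # small loss: pair already ordered correctly
print(ranknet_loss(0.0, 2.0, 1.0))  # large loss: pair ordered the wrong way
```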
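Steps 10 to 12 evaluate the rankings. The primary metric of the TREC Deep Learning Track is NDCG@10. The sketch below computes it for a single query with linear gains, the discount log2(rank + 1), and unjudged documents treated as non-relevant. This is an assumption about how the notebooks evaluate. Exponential gains (2^rel - 1) are a common alternative, and official numbers are normally produced with trec_eval.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (ranks are 1-based)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_docids, judgments, k=10):
    """NDCG@k for one query.

    ranked_docids: document IDs in the order the system ranked them.
    judgments: {docid: relevance} for this query; unjudged documents count as 0.
    """
    gains = [judgments.get(d, 0) for d in ranked_docids]
    ideal = sorted(judgments.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


judgments = {"d1": 3, "d2": 1}  # made-up judgments
print(ndcg_at_k(["d1", "d2"], judgments))  # 1.0: ideal order
print(ndcg_at_k(["d2", "d1"], judgments))  # below 1.0: most relevant document is second
```

A system-level score is the mean of the per-query values over all judged queries.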

ThierryGirod/trec-ltr-thesis
