This repository contains the code to reproduce the two submissions of the POLINKS team.
The first one, called Optimized Constant, achieved sixth place in the final leaderboard. An overview of the model is shown in the following image:
This algorithm searches for the single constant prediction that gives the best score on the two metrics of the challenge.
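The idea can be sketched as a direct search for the constant that scores best on the labels. Below is a minimal, hypothetical sketch, assuming binary 0/1 engagement labels and using log loss as a stand-in for the actual challenge metrics; the function and variable names are illustrative and not taken from the script:

```python
import numpy as np

def best_constant(labels, grid=None):
    """Grid-search the constant prediction that minimizes log loss.

    For log loss the optimum is the positive rate (the CTR itself),
    so the search mainly illustrates the idea.
    """
    labels = np.asarray(labels, dtype=float)
    if grid is None:
        grid = np.linspace(1e-4, 1 - 1e-4, 1000)
    # Log loss of a constant prediction p over binary labels y
    losses = [-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))
              for p in grid]
    return grid[int(np.argmin(losses))]

# Example: with 3 positives out of 10, the best constant is ~0.3
print(best_constant([1, 0, 0, 1, 0, 0, 1, 0, 0, 0]))
```

For log loss the optimal constant is simply the positive rate, i.e. the CTR, which is consistent with the CTR recomputation option described below.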
Install the requirements:
pip install -r Constant/requirements.txt
To run the algorithm, place the files val.tsv and competition_test.tsv in the same folder as the script and run:
python Constant/constant_solution.py --trainingpath mypath/training.tsv --validationpath mypath/val.tsv --testpath mypath/test.tsv
If you want to recompute the value of the CTR, use the following option:
python Constant/constant_solution.py --computectr
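Conceptually, recomputing the CTR amounts to measuring the positive rate of an engagement over the training file. A minimal sketch, assuming a headerless TSV in which an engagement column is non-empty when the engagement occurred; the column index is hypothetical and the real file layout may differ:

```python
import csv

def compute_ctr(path, label_col, delimiter="\t"):
    """Fraction of rows where the engagement column is non-empty.

    Assumes (illustratively) a headerless TSV where `label_col`
    holds a value when the engagement happened, empty otherwise.
    """
    positives = total = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter=delimiter):
            total += 1
            positives += bool(row[label_col].strip())
    return positives / total if total else 0.0

# e.g. CTR of the engagement stored in (hypothetical) column 20
# print(compute_ctr("training.tsv", label_col=20))
```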
The second submission is based on Gradient Boosting. Due to the size of the dataset, the code is divided into several steps, covering dataset generation and model training.
The steps to reproduce the solution are listed below.
- Install the requirements:
pip install -r GradientBoosting/requirements.txt
- Check that all the files (training.tsv, val.tsv, competition_test.tsv) are in the directory of the script, or specify the correct paths with the parameters:
python GradientBoosting/full_solution.py --trainingpath mypath/training.tsv --validationpath mypath/val.tsv
- Run the script that executes all the operations:
python GradientBoosting/full_solution.py
An overview of the model is outlined in the image below:
The script full_solution.py will perform the following logical steps:
- Generate the .csv with the features of the engaging users
- Generate the .csv with the features related to the author of the tweet
- Generate the .csv listing all the languages spoken by each user in the dataset. A language is added to this list if the user has previously interacted with a tweet written in that specific language (see the first sketch after this list).
- Generate 4 .csv files, one for each type of engagement, each collecting, for every user, all the authors they have engaged with.
- Generate the training data for the Gradient Boosting model: the full dataset, joined with the previously generated features, has to be written to disk because training uses the XGBoost external-memory implementation, due to the large size of the dataset (see the second sketch after this list).
- Train the four models and write them to file.
- Generate the submission for the validation/test file.
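As a rough illustration of the language step above, the following pandas sketch collects, for each user, the set of languages of the tweets they interacted with; the column names are hypothetical and the real dataset headers may differ:

```python
import pandas as pd

# Hypothetical column names; the real dataset uses different headers.
df = pd.DataFrame({
    "engaging_user_id": ["u1", "u1", "u2", "u1"],
    "language": ["en", "it", "en", "en"],
})

# One row per user with the set of languages of the tweets they
# interacted with, then written out as a feature file.
spoken = (df.groupby("engaging_user_id")["language"]
            .agg(lambda s: sorted(set(s)))
            .reset_index(name="spoken_languages"))
spoken.to_csv("user_languages.csv", index=False)
```

For the external-memory step, the sketch below shows how XGBoost can train from LIBSVM files on disk instead of an in-memory matrix; file names, hyper-parameters, and the four engagement names are illustrative, and the exact cache URI syntax depends on the XGBoost version:

```python
import xgboost as xgb

# One binary model per engagement type (names are illustrative).
ENGAGEMENTS = ["like", "reply", "retweet", "quote"]

params = {
    "objective": "binary:logistic",
    "eval_metric": "logloss",
    "eta": 0.1,
    "max_depth": 8,  # illustrative hyper-parameters
}

for engagement in ENGAGEMENTS:
    # Appending '#<name>.cache' to a LIBSVM path asks XGBoost to stream
    # the file from disk through an external-memory cache instead of
    # loading it whole (URI syntax varies across XGBoost versions).
    dtrain = xgb.DMatrix(f"train_{engagement}.libsvm#{engagement}.cache")
    booster = xgb.train(params, dtrain, num_boost_round=200)
    booster.save_model(f"model_{engagement}.xgb")
```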