GmanB3398/lightFM_quickstart

Recommender Template

By Griffin Barich

This project contains tools to get a LightFM recommender system up and running quickly from a source dataset.

Dataset Creation and Training

Dataset creation follows the example in the LightFM documentation. We fit a sparse dataset of user-item interactions, adding rows and columns for the user and item features. The module then uses a splitting function from LightFM's cross-validation utilities to split the data randomly into a training set (60%), a validation set (20%), and a test set (20%). We also split the data into a first-visit set and a return-visit set.
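To make the 60/20/20 split concrete, here is a minimal pure-Python sketch. It is an illustration only, not this module's code: the module relies on LightFM's own splitting function on a sparse matrix, whereas the function name random_three_way_split, the seed, and the toy interaction pairs below are all assumptions made for the example.

```python
import random


def random_three_way_split(rows, fractions=(0.6, 0.2, 0.2), seed=42):
    """Shuffle rows and cut them into train/validation/test lists.

    Hypothetical helper for illustration; not part of lightFM_quickstart.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    train_end = round(n * fractions[0])
    val_end = train_end + round(n * fractions[1])
    # Whatever remains after the first two cuts becomes the test set.
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]


# Made-up (user_id, item_id) interaction pairs.
interactions = [(user, item) for user in range(10) for item in range(10)]
train, val, test = random_three_way_split(interactions)
print(len(train), len(val), len(test))
```

Every interaction lands in exactly one of the three sets, so the validation and test sets stay unseen during training.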

We then run a hyperparameter optimization using a grid search based on the constants list in constants.py. For each combination in the grid, a model is trained on the training set and scored on the validation set with the precision at k metric (default: k=10); the combination with the best score is kept. We then retrain with those hyperparameters on the training and validation sets together, report that model's precision on the test set, and finally train a model on the full data. You can also pass in your own hyperparameter grid.
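The grid search and the metric can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the module's implementation: expand_grid, precision_at_k, grid_search, and toy_score are hypothetical names, and toy_score stands in for "train on the training set, then measure precision at k on the validation set".

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per hyperparameter combination in a list-of-dicts grid."""
    for grid in param_grid:
        names = sorted(grid)
        for values in product(*(grid[name] for name in names)):
            yield dict(zip(names, values))


def precision_at_k(ranked_items, relevant_items, k=10):
    """Fraction of the first k ranked items that are known positives."""
    top_k = ranked_items[:k]
    return sum(item in relevant_items for item in top_k) / k


def grid_search(param_grid, score_fn):
    """Return the (params, score) pair with the highest validation score."""
    scored = ((params, score_fn(params)) for params in expand_grid(param_grid))
    return max(scored, key=lambda pair: pair[1])


# Toy stand-in for training a model and scoring it on the validation set.
def toy_score(params):
    return params["no_components"] * 0.01 + params["epochs"] * 0.001


grid = [{"no_components": [6, 12], "epochs": [1, 10]}]
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The number of models to train is the product of the list lengths, so a grid with five parameters of two values each, like the one under How to Run, expands to 32 combinations.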

How to Run

To run the training pipeline, install the lightFM_quickstart module with pip:

pip install git+https://github.com/GmanB3398/lightFM_quickstart.git

Then train a model from Python:

from lightFM_quickstart.quickstart import lightfmModel, train_model_random_split
import pandas as pd

df = pd.read_csv('data/example_data.csv.gz')

hyper_param_grid = [{
    "no_components": [6, 12],
    "learning_schedule": ["adagrad", "adadelta"],
    "max_sampled": [5, 15],
    "item_alpha": [0, 1e-6],
    "epochs": [1, 10]
}]

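# Split the data, run the grid search, and train the final model
model = train_model_random_split(df, 'user_id', 'order_id', 'item_id', 'order_date',
                                 user_attribute_cols=['device'], k=10,
                                 hyper_param_grid=hyper_param_grid)

# Predict for a single user; the first user_id in the data is used here as an example
user_id = df['user_id'].iloc[0]
results = model.predict_user(user_id)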

Training can take a long time, so be sure to cache the trained model in a pickle file or similar.
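One way to do this caching is sketched below with the standard pickle module. The helper load_or_train and the file name model.pkl are hypothetical, and the sketch assumes the object returned by train_model_random_split can be pickled, which this README does not confirm.

```python
import os
import pickle


def load_or_train(cache_path, train_fn):
    """Load a cached model if the file exists; otherwise train and cache it.

    Hypothetical helper for illustration; not part of lightFM_quickstart.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    model = train_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(model, f)
    return model


# Possible usage (assumes the trained model object is picklable):
# model = load_or_train(
#     "model.pkl",
#     lambda: train_model_random_split(df, 'user_id', 'order_id', 'item_id',
#                                      'order_date', k=10),
# )
```

Only load pickle files you created yourself, since unpickling runs arbitrary code, and delete the cache file whenever the data or the hyperparameter grid changes.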

Known Tech Debt

  • Add split by time
  • Add evaluation functions
  • The LightFM version used does not support Python 3.9; Python 3.9 requires installing LightFM from GitHub.

References

.gitignore from equinor's Data Science Template
