# Python-recommender-101

Building on Kaggle's recommendation system tutorial, this is an overview of a working Python recommendation system based on news sharing and user interactions.

## Recommender Techniques

Three types of recommender techniques were built and tested:

  1. Content-based filtering
  2. Collaborative filtering
  3. Hybrid approach

## Baseline

The baseline was created using a popularity recommender, but it has no personalization.
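
A minimal sketch of such a popularity baseline, assuming an interactions DataFrame with hypothetical `user_id`, `item_id`, and `event_strength` columns (not necessarily the column names the tutorial uses):

```python
import pandas as pd

def popularity_recommend(interactions: pd.DataFrame, user_id, top_n=10):
    """Recommend globally popular items the user has not interacted with yet."""
    # Global popularity: total interaction strength per item
    popularity = (interactions.groupby("item_id")["event_strength"]
                              .sum()
                              .sort_values(ascending=False))
    # Items this user has already interacted with
    seen = set(interactions.loc[interactions["user_id"] == user_id, "item_id"])
    # Same ranked list for every user, minus the items they have already seen
    return [item for item in popularity.index if item not in seen][:top_n]
```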

## Recommender Evaluation

Common recommender evaluation criteria include (NDCG@N and MAP@N are sketched below):

  1. Top-N accuracy metrics
  2. NDCG@N
  3. MAP@N
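
For reference, NDCG@N and average precision at N can be written as small plain-Python helpers over a ranked list of 0/1 relevance flags; these follow the standard definitions and are not taken from the tutorial's code:

```python
import math

def ndcg_at_n(relevance, n):
    """relevance: 0/1 flags for the ranked recommendation list (1 = relevant item)."""
    rel = relevance[:n]
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rel))
    ideal = sorted(relevance, reverse=True)[:n]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def ap_at_n(relevance, n):
    """Average precision at N for one user; MAP@N is the mean of this over all users."""
    rel = relevance[:n]
    hits, precisions = 0, []
    for i, r in enumerate(rel):
        if r:
            hits += 1
            precisions.append(hits / (i + 1))
    denom = min(sum(relevance), n)
    return sum(precisions) / denom if denom else 0.0
```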

## Cross Validation

The tutorial covers a simple 80/20 split between train and test. However, to model what the recommender would produce in production, a timestamp-based split should be used so that the evaluation reflects what the recommender would have served on a particular date.
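
A minimal sketch of such a timestamp-based split with pandas, assuming the interactions DataFrame has a `timestamp` column (the column name is an assumption):

```python
import pandas as pd

def split_by_time(interactions: pd.DataFrame, cutoff: str):
    """Train on everything before the cutoff date and test on everything after it,
    mimicking what the recommender would have known in production on that date."""
    ts = pd.to_datetime(interactions["timestamp"])
    train = interactions[ts < pd.Timestamp(cutoff)]
    test = interactions[ts >= pd.Timestamp(cutoff)]
    return train, test

# e.g. train_df, test_df = split_by_time(interactions_df, "2017-01-01")  # hypothetical date
```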

## Details of content-based filtering

  1. Build user profiles
  2. Recommend based on user/item profile similarity (see the sketch below)
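
A minimal sketch of those two steps using TF-IDF item vectors and cosine similarity; the toy data, variable names, and weighting scheme are illustrative assumptions rather than the tutorial's exact code:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for the shared-news article corpus
articles = pd.DataFrame({"text": [
    "python tips for data science",
    "news about machine learning recommenders",
    "sports news and match results",
]})

# Item profiles: one TF-IDF row vector per article
vectorizer = TfidfVectorizer(stop_words="english")
item_profiles = vectorizer.fit_transform(articles["text"])     # sparse, n_items x n_terms

def build_user_profile(item_indices, strengths):
    """User profile = interaction-strength-weighted average of the profiles of the
    items the user interacted with."""
    weights = np.asarray(strengths, dtype=float).reshape(-1, 1)
    weighted_sum = item_profiles[item_indices].multiply(weights).sum(axis=0)
    return np.asarray(weighted_sum) / (weights.sum() + 1e-9)   # 1 x n_terms dense vector

def recommend_cb(user_profile, top_n=2):
    """Rank all items by cosine similarity between the user profile and the item profiles."""
    scores = cosine_similarity(user_profile, item_profiles).ravel()
    return np.argsort(-scores)[:top_n]

# A user who read items 0 and 1 with different interaction strengths
profile = build_user_profile([0, 1], [1.0, 2.5])
print(recommend_cb(profile))
```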

## Details of collaborative filtering

  1. Memory-based: using past interaction activity, compute items that are similar based on the users who interacted with them, or compute users that are similar based on the items they have interacted with
  2. Model-based: matrix factorization (e.g. SVD), deep recommenders, reinforcement learning (see the sketch below)
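
A minimal sketch of the model-based flavor using truncated SVD on the user-item interaction matrix; the toy data, column names, and number of latent factors are assumptions:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy interaction strengths (user_id, item_id, event_strength are assumed column names)
interactions = pd.DataFrame({
    "user_id":        [1, 1, 2, 2, 3, 3, 3],
    "item_id":        ["a", "b", "a", "c", "b", "c", "d"],
    "event_strength": [1.0, 2.0, 1.0, 3.0, 2.0, 1.0, 4.0],
})

# Pivot into a user x item matrix; missing entries are treated as zero
ratings = interactions.pivot_table(index="user_id", columns="item_id",
                                   values="event_strength", fill_value=0)
R = csr_matrix(ratings.values, dtype=float)

# Truncated SVD into k latent factors (k must be smaller than both dimensions)
k = 2
U, sigma, Vt = svds(R, k=k)
predicted = U @ np.diag(sigma) @ Vt          # predicted strength for every user/item pair

def recommend_cf(user_pos, top_n=2):
    """Return the item ids with the highest predicted strength for this user row.
    In practice, items the user has already seen would also be filtered out."""
    scores = predicted[user_pos]
    top_cols = np.argsort(-scores)[:top_n]
    return ratings.columns[top_cols].tolist()

print(recommend_cf(0))
```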

## Techniques that I need to double-click on to understand further

  1. Top-N accuracy scores
  2. How exactly content-based filtering works (how user profiles are created, how item profiles are created, how recommendations are generated). Look at the
  3. How does the TF-IDF technique work for information retrieval? The tutorial notes that it is used to transform unstructured text into vectorized form
  4. How does collaborative filtering work? Look in more detail at what the implementation looks like.
  5. How exactly does the hybrid approach combine content-based filtering and collaborative filtering? (See the sketch after this list.)
  6. Check out how the vector space model works and why it matters for content-based filtering
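
On item 5, one common way to build a hybrid is a weighted blend of the normalized content-based and collaborative scores; whether the tutorial uses exactly this weighting still needs to be verified, so the sketch below is only an assumption:

```python
def hybrid_scores(cb_scores, cf_scores, cb_weight=1.0, cf_weight=1.0):
    """Blend per-item scores from the two recommenders into one combined ranking."""
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        return {i: (s - lo) / (hi - lo + 1e-9) for i, s in scores.items()}
    cb, cf = normalize(cb_scores), normalize(cf_scores)
    items = set(cb) | set(cf)
    return {i: cb_weight * cb.get(i, 0.0) + cf_weight * cf.get(i, 0.0) for i in items}

# Items scored highly by either recommender rise to the top of the blended ranking
print(hybrid_scores({"a": 0.9, "b": 0.1}, {"b": 5.0, "c": 3.0}, cb_weight=1.0, cf_weight=2.0))
```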

## Technical details and questions

  1. What are the scipy sparse matrix and the vectorizer function, and what do they do? In the code, these were used to compute the TF-IDF scores from the item details prior to creating the item profiles
  2. What exactly does an item profile look like? It is at the per-item level, where every word within the item is given a TF-IDF score.
  3. What does the TfidfVectorizer function do? What does the output look like? Where does the corpus for the IDF calculation come from? (A small illustrative example follows below.)
  4. What do the item profile and user profile look like?
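
As a starting point for questions 1 and 3: `TfidfVectorizer` learns its IDF statistics from whatever corpus is passed to `fit_transform` (presumably the item details mentioned above), and returns a scipy sparse matrix with one row per document and one column per vocabulary term. A tiny illustrative example:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "news about python recommenders",
    "python tips and tricks",
    "news about sports",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)     # scipy CSR sparse matrix, shape (3, n_terms)

print(type(tfidf))                           # the scipy sparse matrix type
print(vectorizer.get_feature_names_out())    # learned vocabulary, one column per term
print(tfidf.toarray().round(2))              # dense view: TF-IDF weight per (document, term)
```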