# Python-recommender-101

Building on Kaggle's recommendation system tutorial, this is an overview of a working Python recommendation system based on news sharing and user interactions.

## Recommender Techniques

Three types of recommender techniques were built and tested:

  1. Content-based filtering
  2. Collaborative filtering
  3. Hybrid approach

## Baseline

The baseline was created using a popularity recommender, but it has no personalization.
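
A minimal sketch of such a popularity baseline, assuming an interactions DataFrame with hypothetical `user_id`, `item_id`, and `event_strength` columns (not necessarily the column names the tutorial uses):

```python
import pandas as pd

def popularity_recommend(interactions: pd.DataFrame, user_id, top_n=10):
    """Recommend globally popular items the user has not interacted with yet."""
    # Global popularity: total interaction strength per item
    popularity = (interactions.groupby("item_id")["event_strength"]
                              .sum()
                              .sort_values(ascending=False))
    # Items this user has already interacted with
    seen = set(interactions.loc[interactions["user_id"] == user_id, "item_id"])
    # Same ranked list for every user, minus the items they have already seen
    return [item for item in popularity.index if item not in seen][:top_n]
```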

## Recommender Evaluation

Common recommender evaluation criteria include (NDCG@N and MAP@N are sketched below):

  1. Top-N accuracy metrics
  2. NDCG@N
  3. MAP@N
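
For reference, NDCG@N and average precision at N can be written as small plain-Python helpers over a ranked list of 0/1 relevance flags; these follow the standard definitions and are not taken from the tutorial's code:

```python
import math

def ndcg_at_n(relevance, n):
    """relevance: 0/1 flags for the ranked recommendation list (1 = relevant item)."""
    rel = relevance[:n]
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rel))
    ideal = sorted(relevance, reverse=True)[:n]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def ap_at_n(relevance, n):
    """Average precision at N for one user; MAP@N is the mean of this over all users."""
    rel = relevance[:n]
    hits, precisions = 0, []
    for i, r in enumerate(rel):
        if r:
            hits += 1
            precisions.append(hits / (i + 1))
    denom = min(sum(relevance), n)
    return sum(precisions) / denom if denom else 0.0
```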

## Cross Validation

The tutorial covers a simple 80/20 split between train and test. However, to model what the recommender would produce in production, a timestamp-based split should be used so that the evaluation reflects what the recommender would have served on a particular date.
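
A minimal sketch of such a timestamp-based split with pandas, assuming the interactions DataFrame has a `timestamp` column (the column name is an assumption):

```python
import pandas as pd

def split_by_time(interactions: pd.DataFrame, cutoff: str):
    """Train on everything before the cutoff date and test on everything after it,
    mimicking what the recommender would have known in production on that date."""
    ts = pd.to_datetime(interactions["timestamp"])
    train = interactions[ts < pd.Timestamp(cutoff)]
    test = interactions[ts >= pd.Timestamp(cutoff)]
    return train, test

# e.g. train_df, test_df = split_by_time(interactions_df, "2017-01-01")  # hypothetical date
```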

## Details of content-based filtering

  1. Build user profiles
  2. Recommend based on user/item profile similarity (see the sketch below)
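
A minimal sketch of those two steps using TF-IDF item vectors and cosine similarity; the toy data, variable names, and weighting scheme are illustrative assumptions rather than the tutorial's exact code:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for the shared-news article corpus
articles = pd.DataFrame({"text": [
    "python tips for data science",
    "news about machine learning recommenders",
    "sports news and match results",
]})

# Item profiles: one TF-IDF row vector per article
vectorizer = TfidfVectorizer(stop_words="english")
item_profiles = vectorizer.fit_transform(articles["text"])     # sparse, n_items x n_terms

def build_user_profile(item_indices, strengths):
    """User profile = interaction-strength-weighted average of the profiles of the
    items the user interacted with."""
    weights = np.asarray(strengths, dtype=float).reshape(-1, 1)
    weighted_sum = item_profiles[item_indices].multiply(weights).sum(axis=0)
    return np.asarray(weighted_sum) / (weights.sum() + 1e-9)   # 1 x n_terms dense vector

def recommend_cb(user_profile, top_n=2):
    """Rank all items by cosine similarity between the user profile and the item profiles."""
    scores = cosine_similarity(user_profile, item_profiles).ravel()
    return np.argsort(-scores)[:top_n]

# A user who read items 0 and 1 with different interaction strengths
profile = build_user_profile([0, 1], [1.0, 2.5])
print(recommend_cb(profile))
```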

## Details of collaborative filtering

  1. Memory-based: using past interaction activity, compute items that are similar based on the users who interacted with them, or compute users that are similar based on the items they have interacted with
  2. Model-based: matrix factorization (e.g. SVD), deep recommenders, reinforcement learning (see the sketch below)
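
A minimal sketch of the model-based flavor using truncated SVD on the user-item interaction matrix; the toy data, column names, and number of latent factors are assumptions:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy interaction strengths (user_id, item_id, event_strength are assumed column names)
interactions = pd.DataFrame({
    "user_id":        [1, 1, 2, 2, 3, 3, 3],
    "item_id":        ["a", "b", "a", "c", "b", "c", "d"],
    "event_strength": [1.0, 2.0, 1.0, 3.0, 2.0, 1.0, 4.0],
})

# Pivot into a user x item matrix; missing entries are treated as zero
ratings = interactions.pivot_table(index="user_id", columns="item_id",
                                   values="event_strength", fill_value=0)
R = csr_matrix(ratings.values, dtype=float)

# Truncated SVD into k latent factors (k must be smaller than both dimensions)
k = 2
U, sigma, Vt = svds(R, k=k)
predicted = U @ np.diag(sigma) @ Vt          # predicted strength for every user/item pair

def recommend_cf(user_pos, top_n=2):
    """Return the item ids with the highest predicted strength for this user row.
    In practice, items the user has already seen would also be filtered out."""
    scores = predicted[user_pos]
    top_cols = np.argsort(-scores)[:top_n]
    return ratings.columns[top_cols].tolist()

print(recommend_cf(0))
```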

## Techniques that I need to double-click on to understand further

  1. Top-N accuracy scores
  2. How exactly content-based filtering works (how user profiles are created, how item profiles are created, how recommendations are generated). Look at the
  3. How does the TF-IDF technique work for information retrieval? The tutorial notes that it is used to transform unstructured text into vectorized form
  4. How does collaborative filtering work? Look in more detail at what the implementation looks like.
  5. How exactly does the hybrid approach combine content-based filtering and collaborative filtering? (See the sketch after this list.)
  6. Check out how the vector space model works and why it matters for content-based filtering
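
On item 5, one common way to build a hybrid is a weighted blend of the normalized content-based and collaborative scores; whether the tutorial uses exactly this weighting still needs to be verified, so the sketch below is only an assumption:

```python
def hybrid_scores(cb_scores, cf_scores, cb_weight=1.0, cf_weight=1.0):
    """Blend per-item scores from the two recommenders into one combined ranking."""
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        return {i: (s - lo) / (hi - lo + 1e-9) for i, s in scores.items()}
    cb, cf = normalize(cb_scores), normalize(cf_scores)
    items = set(cb) | set(cf)
    return {i: cb_weight * cb.get(i, 0.0) + cf_weight * cf.get(i, 0.0) for i in items}

# Items scored highly by either recommender rise to the top of the blended ranking
print(hybrid_scores({"a": 0.9, "b": 0.1}, {"b": 5.0, "c": 3.0}, cb_weight=1.0, cf_weight=2.0))
```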

## Technical details and questions

  1. What are the scipy sparse matrix and the vectorizer function, and what do they do? In the code, these were used to compute the TF-IDF scores from the item details prior to creating the item profiles
  2. What exactly does an item profile look like? It is at the per-item level, where every word within the item is given a TF-IDF score.
  3. What does the TfidfVectorizer function do? What does the output look like? Where does the corpus for the IDF calculation come from? (A small illustrative example follows below.)
  4. What do the item profile and user profile look like?
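
As a starting point for questions 1 and 3: `TfidfVectorizer` learns its IDF statistics from whatever corpus is passed to `fit_transform` (presumably the item details mentioned above), and returns a scipy sparse matrix with one row per document and one column per vocabulary term. A tiny illustrative example:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "news about python recommenders",
    "python tips and tricks",
    "news about sports",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)     # scipy CSR sparse matrix, shape (3, n_terms)

print(type(tfidf))                           # the scipy sparse matrix type
print(vectorizer.get_feature_names_out())    # learned vocabulary, one column per term
print(tfidf.toarray().round(2))              # dense view: TF-IDF weight per (document, term)
```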