This is the recommendation engine for the Lunchbox App, a platform for ordering food and keeping track of user expenditure and canteen sales. Whether or not it is actually deployed in all the canteens of IIT Kanpur (given the potential for fraud and cyber-attacks), I will still complete the platform.
Also, I will open-source the app so that any campus can implement a cashless, integrated system for ordering food across the whole campus. After all, what good are IITs if our canteens still keep track of student accounts on paper registers!
git clone https://github.com/gsunit/Food-Recommendation-System-Pyhton.git
- Run the Jupyter Notebook
src.ipynb
This approach suggests items that are popular and well received among users. The most-ordered items and the items with the best ratings rise to the top and are shortlisted for recommendation.
import pandas as pd
import numpy as np
# Importing db of food items across all canteens registered on the platform
df1=pd.read_csv('./db/food.csv')
df1.columns = ['food_id','title','canteen_id','price', 'num_orders', 'category', 'avg_rating', 'num_rating', 'tags']
df1
| | food_id | title | canteen_id | price | num_orders | category | avg_rating | num_rating | tags |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Lala Maggi | 1 | 30 | 35 | maggi | 3.9 | 10 | veg, spicy |
| 1 | 2 | Cheese Maggi | 1 | 25 | 40 | maggi | 3.8 | 15 | veg |
| 2 | 3 | Masala Maggi | 1 | 25 | 10 | maggi | 3.0 | 10 | veg, spicy |
| 3 | 4 | Veg Maggi | 1 | 30 | 25 | maggi | 2.5 | 5 | veg, healthy |
| 4 | 5 | Paneer Tikka | 1 | 60 | 50 | Punjabi | 4.6 | 30 | veg, healthy |
| 5 | 6 | Chicken Tikka | 1 | 80 | 40 | Punjabi | 4.2 | 28 | nonveg, healthy, spicy |
top_rated_items[['title', 'num_rating', 'avg_rating', 'score']].head()
pop_items[['title', 'num_orders']].head()
| | title | num_rating | avg_rating | score |
|---|---|---|---|---|
| 4 | Paneer Tikka | 30 | 4.6 | 4.288889 |
| 5 | Chicken Tikka | 28 | 4.2 | 4.013953 |
| 1 | Cheese Maggi | 15 | 3.8 | 3.733333 |

| | title | num_orders |
|---|---|---|
| 4 | Paneer Tikka | 50 |
| 1 | Cheese Maggi | 40 |
| 5 | Chicken Tikka | 40 |
| 0 | Lala Maggi | 35 |
| 3 | Veg Maggi | 25 |
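
The code that builds `top_rated_items` and `pop_items` is not shown here, so the following is only a minimal sketch of one way to get the same numbers. The `score` values above are consistent with an IMDB-style weighted rating, `(v / (v + m)) * R + (m / (v + m)) * C`, where `v` is `num_rating`, `R` is `avg_rating`, `C` is the mean rating across all items, and `m` is a minimum-ratings cutoff. The choice of `m` as the 0.6 quantile of `num_rating` is an assumption inferred from the output, not something the notebook confirms.

```python
import pandas as pd

# Sample rows copied from the food table shown earlier.
df = pd.DataFrame({
    "title": ["Lala Maggi", "Cheese Maggi", "Masala Maggi",
              "Veg Maggi", "Paneer Tikka", "Chicken Tikka"],
    "num_orders": [35, 40, 10, 25, 50, 40],
    "avg_rating": [3.9, 3.8, 3.0, 2.5, 4.6, 4.2],
    "num_rating": [10, 15, 10, 5, 30, 28],
})

C = df["avg_rating"].mean()          # mean rating across all items
m = df["num_rating"].quantile(0.6)   # ASSUMED cutoff: minimum ratings needed to qualify


def weighted_rating(row, m=m, C=C):
    """Pull an item's average rating towards the global mean when it has few ratings."""
    v, R = row["num_rating"], row["avg_rating"]
    return (v / (v + m)) * R + (m / (v + m)) * C


# Keep only items with enough ratings, then rank them by the weighted score.
top_rated_items = df[df["num_rating"] >= m].copy()
top_rated_items["score"] = top_rated_items.apply(weighted_rating, axis=1)
top_rated_items = top_rated_items.sort_values("score", ascending=False)

# Popularity is a plain sort on order count; a stable sort keeps ties in table order.
pop_items = df.sort_values("num_orders", ascending=False, kind="mergesort")

print(top_rated_items[["title", "num_rating", "avg_rating", "score"]].head())
print(pop_items[["title", "num_orders"]].head())
```

The weighting matters for items with few ratings: a dish rated 5.0 by two people should not outrank one rated 4.6 by thirty.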
This approach is a bit more personalised. We analyse the user's past orders and suggest items similar to the ones they have already ordered.
Also, since each person has a "home canteen", the user should be notified of any new items the vendor adds to the menu.
We will use Count Vectorizer from Scikit-Learn to find the similarity between items based on their title, category and tags. To bring all these properties of each item together, we create a "soup" of tags. The "soup" is a processed string corresponding to each item, formed from the constituents of its tags, title and category.
| | food_id | title | canteen_id | price | num_orders | category | avg_rating | num_rating | tags | soup |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Lala Maggi | 1 | 30 | 35 | maggi | 3.9 | 10 | veg, spicy | veg spicy lala maggi |
| 1 | 2 | Cheese Maggi | 1 | 25 | 40 | maggi | 3.8 | 15 | veg | veg cheese maggi |
| 2 | 3 | Masala Maggi | 1 | 25 | 10 | maggi | 3.0 | 10 | veg, spicy | veg spicy masala maggi |
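
The function that builds the soup is not shown here. As a rough sketch, the soups in the table are consistent with lowercasing the words from tags, title and category (in that order) and dropping repeated words. The name `create_soup` and its signature are hypothetical.

```python
def create_soup(title, category, tags):
    """Hypothetical helper: join tags, title and category into one lowercase
    string, keeping only the first occurrence of each word."""
    words = tags.replace(",", " ").split() + title.split() + category.split()
    seen, soup = set(), []
    for word in words:
        word = word.lower()
        if word not in seen:
            seen.add(word)
            soup.append(word)
    return " ".join(soup)


print(create_soup("Lala Maggi", "maggi", "veg, spicy"))    # veg spicy lala maggi
print(create_soup("Veg Maggi", "maggi", "veg, healthy"))   # veg healthy maggi
```

On the full table this could be applied row by row, for example `df1['soup'] = df1.apply(lambda r: create_soup(r['title'], r['category'], r['tags']), axis=1)`.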
# Import CountVectorizer and create the count matrix
from sklearn.feature_extraction.text import CountVectorizer
count = CountVectorizer(stop_words='english')
# df1['soup']
count_matrix = count.fit_transform(df1['soup'])
# Compute the Cosine Similarity matrix based on the count_matrix
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = cosine_similarity(count_matrix, count_matrix)
df1.loc[get_recommendations(title="Paneer Tikka")]
| | food_id | title | canteen_id | price | num_orders | category | avg_rating | num_rating | tags | soup |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 6 | Chicken Tikka | 1 | 80 | 40 | Punjabi | 4.2 | 28 | nonveg, healthy, spicy | nonveg healthy spicy chicken tikka punjabi |
| 3 | 4 | Veg Maggi | 1 | 30 | 25 | maggi | 2.5 | 5 | veg, healthy | veg healthy maggi |
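
The body of `get_recommendations` is not shown here. A minimal sketch of how it might work: look up the item's row in the cosine similarity matrix, sort the other items by similarity, and return the indices of the closest ones. The `top_n` parameter and the `indices` lookup are assumptions, and the soup for Paneer Tikka is derived from its tags, title and category because it does not appear in the tables above.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

df = pd.DataFrame({
    "title": ["Lala Maggi", "Cheese Maggi", "Masala Maggi",
              "Veg Maggi", "Paneer Tikka", "Chicken Tikka"],
    "soup": [
        "veg spicy lala maggi",
        "veg cheese maggi",
        "veg spicy masala maggi",
        "veg healthy maggi",
        "veg healthy paneer tikka punjabi",   # ASSUMED: not shown in the source tables
        "nonveg healthy spicy chicken tikka punjabi",
    ],
})

# Same two steps as in the notebook: word counts, then pairwise cosine similarity.
count_matrix = CountVectorizer(stop_words="english").fit_transform(df["soup"])
cosine_sim = cosine_similarity(count_matrix, count_matrix)

# Map each title to its row position in the similarity matrix.
indices = pd.Series(df.index, index=df["title"])


def get_recommendations(title, top_n=2):
    """Return the row indices of the top_n items most similar to `title`."""
    idx = indices[title]
    scores = sorted(enumerate(cosine_sim[idx]), key=lambda pair: pair[1], reverse=True)
    # Skip the item itself, then keep the closest top_n items.
    return [i for i, _ in scores if i != idx][:top_n]


print(df.loc[get_recommendations("Paneer Tikka"), "title"].tolist())
# ['Chicken Tikka', 'Veg Maggi']
```

Chicken Tikka shares three words with Paneer Tikka (healthy, tikka, punjabi) and Veg Maggi shares two (veg, healthy), which is why they rank first and second.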
personalised_recomms(orders, df1, current_user, columns)
get_new_and_specials_recomms(new_and_specials, users, df1, current_canteen, columns)
get_top_rated_items(top_rated_items, df1, columns)
get_popular_items(pop_items, df1, columns).head(3)
| | title | canteen_id | price | comment |
|---|---|---|---|---|
| 0 | Veg Maggi | 1 | 30 | based on your past orders |
| 1 | Paneer Tikka | 1 | 60 | based on your past orders |
| 2 | Chicken Tikka | 1 | 80 | based on your past orders |

| | title | canteen_id | price | comment |
|---|---|---|---|---|
| 0 | Cheese Maggi | 1 | 25 | new/today's special item in your home canteen |

| | title | canteen_id | price | comment |
|---|---|---|---|---|
| 0 | Paneer Tikka | 1 | 60 | top rated items across canteens |
| 1 | Chicken Tikka | 1 | 80 | top rated items across canteens |
| 2 | Cheese Maggi | 1 | 25 | top rated items across canteens |

| | title | canteen_id | price | comment |
|---|---|---|---|---|
| 0 | Paneer Tikka | 1 | 60 | most popular items across canteens |
| 1 | Cheese Maggi | 1 | 25 | most popular items across canteens |
| 2 | Chicken Tikka | 1 | 80 | most popular items across canteens |
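
All four lists share the same shape: a few display columns plus a `comment` explaining why the item was suggested. The helper below is a hypothetical sketch of that last step, shown for the popularity list; `with_comment` is not a function from the notebook, and the real `get_popular_items` may work differently.

```python
import pandas as pd

# Sample rows copied from the food table shown earlier.
df = pd.DataFrame({
    "title": ["Lala Maggi", "Cheese Maggi", "Masala Maggi",
              "Veg Maggi", "Paneer Tikka", "Chicken Tikka"],
    "canteen_id": [1, 1, 1, 1, 1, 1],
    "price": [30, 25, 25, 30, 60, 80],
    "num_orders": [35, 40, 10, 25, 50, 40],
})
columns = ["title", "canteen_id", "price", "comment"]


def with_comment(items, comment, columns=columns):
    """Hypothetical helper: tag a ranked list with the reason it was recommended."""
    tagged = items.copy()
    tagged["comment"] = comment
    # Keep only the display columns and renumber the rows from 0.
    return tagged[columns].reset_index(drop=True)


pop_items = df.sort_values("num_orders", ascending=False, kind="mergesort")
popular = with_comment(pop_items, "most popular items across canteens").head(3)
print(popular)
```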
These are just simple algorithms for making personalised and general recommendations to users. We could use collaborative filtering or incorporate neural networks to make the predictions even better, but those methods are more computationally intensive. Kinda overkill, IMO! Let's build the app first, then move on to other features!