This project implements a video recommendation system using the KuaiRec 2.0 dataset, which contains user-video interaction data from a short video platform. The goal was to create an effective recommendation system that considers both user engagement and content quality.
The KuaiRec 2.0 dataset includes:
- User-video interactions (views, watch duration)
- Video metadata (duration, features)
- Temporal information (timestamps)
- Social network data
Key metrics available:
- Watch ratio (play_duration / video_duration)
- Interaction timestamps
- Video features and categories
The first iteration of the vibe score system attempted to incorporate multiple factors:
# Initial weights
df["vibe_score"] = (
(df["quality_score"] * df["watch_ratio"] * df["time_weight"]) * 0.4 # Temporal and quality
+ (df["engagement_score"]) * 0.3 # Engagement
+ (df["content_relevance"]) * 0.2 # Content relevance
+ (df["social_score"]) * 0.1 # Social influence
)Components included:
- Time decay factor (recency of interactions)
- Engagement score (watch ratio)
- Content relevance (user preferences)
- Social influence (friend preferences)
- Quality scoring with multiple categories
Issues Identified:
- Too many components led to noisy scores
- Complex weighting system made results hard to interpret
- Social influence component added unnecessary complexity
- Quality scoring was too granular
The system was simplified to focus on three core components:
# Final weights
df["vibe_score"] = (
df["popularity_score"] * 0.5 # Popularity
+ df["watch_ratio_norm"] * 0.3 # Engagement
+ df["peak_bonus_norm"] * 0.2 # Temporal patterns
)Key improvements:
-
Popularity (50%)
- Based on number of views
- Normalized across all videos
- Provides a strong baseline for recommendations
-
Engagement (30%)
- Watch ratio based
- Log-scaled and normalized
- Better handles outliers
-
Peak Hours (20%)
- Identified peak hours: [0, 1, 20, 21, 22, 23]
- Binary bonus for peak hour views
- Captures temporal patterns without complexity
Benefits of Simplification:
- More stable and interpretable scores
- Better correlation with user engagement
- Easier to maintain and tune
- Clearer relationship between components
After testing multiple approaches (PySpark ALS, LightFM), Alternating Least Squares with Implicit feedback was chosen for several reasons:
-
Matrix Factorization Approach
- Effective for sparse interaction data
- Captures latent user-item relationships
- Scales well with dataset size
-
Confidence-Based Weighting
- Uses vibe scores as confidence weights
- Better handles implicit feedback
- More robust to noise
-
Technical Advantages
- GPU support for faster training
- Efficient memory usage
- Well-documented and maintained library
-
Implementation Benefits
- Simple to implement and maintain
- Good balance of performance and complexity
- Easy to tune and optimize
-
Interaction Patterns
- Most users have few interactions (long-tail distribution)
- Active users show consistent engagement
- Clear patterns in video duration preferences
-
Peak Usage Times
- Highest activity during late evening (20:00-23:00)
- Secondary peak in early morning (00:00-01:00)
- Lower activity during work hours
-
Engagement Metrics
- Average watch ratio: 0.8
- Most videos watched between 30% and 100%
- Strong correlation between watch ratio and vibe score
The final model achieved the following metrics:
- Precision@10: 0.3812
- Hit Rate@10: 0.9943
- NDCG@10: 0.3802
- MAP@10: 0.5110
-
Precision@10 (0.3812)
- Measures the proportion of relevant items in the top-10 recommendations
- A score of 0.3812 means that approximately 38% of recommended videos were relevant to users
- This is a good score for a video recommendation system, considering:
- The large number of possible videos
- The sparsity of user interactions
- The implicit nature of feedback
-
Hit Rate@10 (0.9943)
- Indicates the proportion of users who received at least one relevant recommendation
- The extremely high hit rate (99.43%) shows that the system successfully provides at least one relevant video to almost every user
- This is particularly important for user retention and engagement
- The high hit rate combined with moderate precision suggests the system is good at finding at least one relevant item but could improve in ranking
-
NDCG@10 (0.3802)
- Normalized Discounted Cumulative Gain considers the position of relevant items
- The score of 0.3802 indicates that:
- Relevant items are being found
- There's room for improvement in ranking order
- This metric is particularly useful for video recommendations as it:
- Rewards systems that place relevant items higher in the list
- Accounts for the fact that users are more likely to watch top-ranked videos
-
MAP@10 (0.5110)
- Mean Average Precision considers the order of relevant items
- The score of 0.5110 is higher than precision, indicating that:
- When the system finds relevant items, it tends to rank them well
- The model is better at ranking than simple precision suggests
- This is important for video recommendations as it:
- Reflects the quality of the entire ranking
- Considers the position of all relevant items
-
Why Not Recall@K?
- Recall was removed from the evaluation because:
- It measures the proportion of all relevant items that were recommended
- In video recommendations, the total number of relevant items is often unknown
- Users typically only watch a small subset of available videos
- The goal is to find the most relevant videos, not all possible relevant videos
- Recall was removed from the evaluation because:
-
Metric Selection Rationale
- Precision@10: Measures the immediate relevance of recommendations
- Hit Rate@10: Ensures users find at least one relevant video
- NDCG@10: Evaluates the ranking quality of recommendations
- MAP@10: Provides a comprehensive view of ranking performance
These results show:
- The system is highly effective at finding at least one relevant video for users
- There's a good balance between precision and ranking quality
- The model performs well in terms of user engagement
- The high hit rate suggests strong user satisfaction potential
-
Vibe Score Impact
- Simplified scoring led to 15% improvement in precision
- Better correlation with user engagement
- More stable recommendations
-
Temporal Patterns
- Peak hours significantly influence user engagement
- Time-based features improve recommendation quality
- Clear patterns in user activity
-
User Engagement
- Watch ratio is a strong predictor of user satisfaction
- Popularity provides a good baseline
- Combined metrics capture user preferences effectively
-
For the future
- The studied features, in behaviour for example, can be used to implement a better and more fine tuned Deep Neural Network model