feat: Add recency preference and genre diversification to recommendation logic #44
Summary of Changes
Hello @TimilsinaBimal, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a comprehensive overhaul of the recommendation system, focusing on delivering more personalized, diverse, and high-quality content. Key changes include dynamic recency adjustments based on user profiles, advanced genre diversification techniques, and a shift to a more interpretable overlap-based similarity scoring. Additionally, mechanisms for injecting fresh, trending content and refining metadata processing have been implemented to enhance the overall user experience.
Code Review
This pull request introduces significant enhancements to the recommendation logic, including a preference for recent content and diversification of genres in the results. The changes are extensive, especially in recommendation_service.py, where the ranking and candidate selection pipeline has been completely overhauled. New features like 'topics' based on text tokenization, freshness injection, and sophisticated diversification strategies have been added.
My review focuses on improving code maintainability by addressing significant code duplication and removing dead code. Several logical blocks, such as freshness injection and genre whitelist creation, are repeated across multiple methods and should be refactored into helpers. There are also several functions and methods that appear to be unused after the refactoring and should be removed. Overall, the new recommendation logic is much more advanced, and these changes will help keep the codebase clean and manageable.
    # Build per-user top-genre whitelist
    try:
        top_gen_pairs = user_profile.get_top_genres(limit=TOP_GENRE_WHITELIST_LIMIT)
        top_genre_whitelist: set[int] = {int(gid) for gid, _ in top_gen_pairs}
    except Exception:
        top_genre_whitelist = set()

    def _passes_top_genre(item_genre_ids: list[int] | None) -> bool:
        if not top_genre_whitelist:
            return True
        gids = set(item_genre_ids or [])
        if not gids:
            return True
        if 16 in gids and 16 not in top_genre_whitelist:
            return False
        return bool(gids & top_genre_whitelist)
The logic for building a user profile to create a top_genre_whitelist and the _passes_top_genre helper function is duplicated across get_recommendations, get_recommendations_for_item, and get_recommendations_for_theme. This should be refactored into one or more private helper methods to improve maintainability and reduce code duplication.
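One possible shape for that refactor is a small factory that builds the whitelist once and returns the predicate, so all three methods can share it. This is a minimal sketch, not the project's code: `build_top_genre_filter` and `ANIMATION_GENRE_ID` are hypothetical names, and the input is assumed to be the `(genre_id, score)` pairs that `get_top_genres` returns in the quoted snippet.

```python
from typing import Callable, Iterable, Optional

# TMDB genre id 16 is Animation; the quoted code uses the bare literal.
ANIMATION_GENRE_ID = 16


def build_top_genre_filter(
    top_genre_pairs: Iterable[tuple[int, float]],
) -> Callable[[Optional[list[int]]], bool]:
    """Return a predicate equivalent to the quoted _passes_top_genre closure."""
    whitelist = {int(gid) for gid, _ in top_genre_pairs}

    def passes(item_genre_ids: Optional[list[int]]) -> bool:
        if not whitelist:
            return True  # no profile signal: do not filter
        gids = set(item_genre_ids or [])
        if not gids:
            return True  # item has no genre data: do not filter
        # Animation must be explicitly preferred before it is let through.
        if ANIMATION_GENRE_ID in gids and ANIMATION_GENRE_ID not in whitelist:
            return False
        return bool(gids & whitelist)

    return passes
```

Each call site would then reduce to building the filter once, keeping the existing `try`/`except` fallback to an empty whitelist around the profile lookup.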
    # Freshness injection: trending/highly rated items to broaden taste
    try:
        fresh_added = 0
        from collections import defaultdict

        fresh_genre_counts = defaultdict(int)
        cap_injection = max(1, int(max_results * PER_GENRE_MAX_SHARE))
        mtype = "tv" if content_type in ("tv", "series") else "movie"
        trending_resp = await self.tmdb_service.get_trending(mtype, time_window="week")
        trending = trending_resp.get("results", []) if trending_resp else []
        # Mix in top-rated
        top_rated_resp = await self.tmdb_service.get_top_rated(mtype)
        top_rated = top_rated_resp.get("results", []) if top_rated_resp else []
        fresh_pool = []
        fresh_pool.extend(trending[:40])
        fresh_pool.extend(top_rated[:40])
        # Filter by excluded genres and quality threshold
        for it in fresh_pool:
            tid = it.get("id")
            if not tid or tid in candidate_pool:
                continue
            # Exclude already watched by TMDB id
            if tid in watched_tmdb_ids:
                continue
            # Excluded genres
            gids = it.get("genre_ids") or []
            if excluded_ids and excluded_ids.intersection(set(gids)):
                continue
            # Respect top-genre whitelist
            if not _passes_top_genre(gids):
                continue
            # Quality: prefer strong audience signal
            va = float(it.get("vote_average") or 0.0)
            vc = int(it.get("vote_count") or 0)
            if vc < 300 or va < 7.0:
                continue
            # Genre diversity inside freshness injection
            if gids and any(fresh_genre_counts[g] >= cap_injection for g in gids):
                continue
            # Mark as freshness candidate
            it["_fresh_boost"] = True
            candidate_pool[tid] = it
            for g in gids:
                fresh_genre_counts[g] += 1
            fresh_added += 1
            if fresh_added >= max_results * 2:
                break
        if fresh_added:
            logger.info(f"Freshness injection added {fresh_added} trending/top-rated candidates")
    except Exception as e:
        logger.warning(f"Freshness injection failed: {e}")
This 'Freshness injection' logic is duplicated in three places in this file (get_recommendations, get_recommendations_for_item, get_recommendations_for_theme). This significant code duplication makes the code harder to maintain and prone to inconsistencies. Please refactor this into a single, reusable private method.
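As a rough illustration of the suggested refactor, the filtering loop could be pulled into one function that takes the already-fetched trending and top-rated items. This is a sketch under assumptions: `inject_fresh_candidates` and its parameter names are hypothetical, the TMDB calls are left to the caller, and the `300` / `7.0` thresholds are simply lifted from the quoted code into defaults.

```python
from collections import defaultdict
from typing import Callable


def inject_fresh_candidates(
    candidate_pool: dict[int, dict],
    fresh_pool: list[dict],
    watched_tmdb_ids: set[int],
    excluded_ids: set[int],
    passes_top_genre: Callable[[list[int]], bool],
    max_results: int,
    per_genre_max_share: float,
    min_votes: int = 300,
    min_rating: float = 7.0,
) -> int:
    """Add qualifying fresh items to candidate_pool in place; return how many were added."""
    cap = max(1, int(max_results * per_genre_max_share))
    genre_counts: dict[int, int] = defaultdict(int)
    added = 0
    for item in fresh_pool:
        tid = item.get("id")
        # Skip items without an id, already pooled, or already watched.
        if not tid or tid in candidate_pool or tid in watched_tmdb_ids:
            continue
        gids = item.get("genre_ids") or []
        if excluded_ids.intersection(gids):
            continue
        if not passes_top_genre(gids):
            continue
        # Quality gate: require a strong audience signal.
        votes = int(item.get("vote_count") or 0)
        rating = float(item.get("vote_average") or 0.0)
        if votes < min_votes or rating < min_rating:
            continue
        # Per-genre cap keeps the injected items diverse.
        if any(genre_counts[g] >= cap for g in gids):
            continue
        item["_fresh_boost"] = True
        candidate_pool[tid] = item
        for g in gids:
            genre_counts[g] += 1
        added += 1
        if added >= max_results * 2:
            break
    return added
```

Keeping the network calls outside the helper makes the filtering testable without a TMDB client; each of the three methods would fetch its pool, call the helper inside its existing `try`/`except`, and log the returned count.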
    @staticmethod
    def _normalize(value: float, min_v: float = 0.0, max_v: float = 10.0) -> float:
        if max_v == min_v:
            return 0.0
        return max(0.0, min(1.0, (value - min_v) / (max_v - min_v)))
    @staticmethod
    def _recency_multiplier(year: int | None) -> float:
        """Prefer recent titles. Softly dampen very old titles."""
        if not year:
            return 1.0
        try:
            y = int(year)
        except Exception:
            return 1.0
        if y >= 2021:
            return 1.12
        if y >= 2015:
            return 1.06
        if y >= 2010:
            return 1.00
        if y >= 2000:
            return 0.92
        if y >= 1990:
            return 0.82
        return 0.70
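The same year brackets could also be expressed as a lookup table, which keeps the thresholds in one place if they need tuning. This is only a sketch of an alternative: `RECENCY_BRACKETS`, `OLDEST_MULTIPLIER`, and the free function `recency_multiplier` are hypothetical names, the values are copied from the method above, and the `except` clause is narrowed to the two errors `int()` is expected to raise here.

```python
# (minimum_year, multiplier), newest bracket first; values copied from the quoted method.
RECENCY_BRACKETS = (
    (2021, 1.12),
    (2015, 1.06),
    (2010, 1.00),
    (2000, 0.92),
    (1990, 0.82),
)
OLDEST_MULTIPLIER = 0.70  # applied to anything before 1990


def recency_multiplier(year) -> float:
    """Return a score multiplier that favors recent titles; neutral when the year is unknown."""
    if not year:
        return 1.0
    try:
        y = int(year)
    except (TypeError, ValueError):
        return 1.0
    # First bracket whose minimum year is met wins.
    for min_year, multiplier in RECENCY_BRACKETS:
        if y >= min_year:
            return multiplier
    return OLDEST_MULTIPLIER
```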
    fresh_pool = []
    fresh_pool.extend(trending[:40])
    fresh_pool.extend(top_rated[:40])
    from collections import defaultdict
    def calculate_similarity(self, profile: UserTasteProfile, item_meta: dict) -> float:
        """
        Final improved similarity scoring function.
        Uses normalized sparse matching + rarity boosting + non-linear emphasis.
        Simplified similarity: linear weighted sum across core dimensions.
        """
        item_vec = self._vectorize_item(item_meta)

        # Linear weighted sum across selected dimensions
        # For each dimension we average per-feature match to avoid bias from many features
        def avg_pref(features, mapping):
            if not features:
                return 0.0
            s = 0.0
            for f in features:
                s += mapping.get(f, 0.0)
            return s / max(1, len(features))

        g_score = avg_pref(item_vec.get("genres", []), profile.genres.values) * GENRES_WEIGHT
        k_score = avg_pref(item_vec.get("keywords", []), profile.keywords.values) * KEYWORDS_WEIGHT
        c_score = avg_pref(item_vec.get("cast", []), profile.cast.values) * CAST_WEIGHT
        t_score = avg_pref(item_vec.get("topics", []), profile.topics.values) * TOPICS_WEIGHT

        # Optional extras with small weights
        crew_score = avg_pref(item_vec.get("crew", []), profile.crew.values) * CREW_WEIGHT
        country_score = avg_pref(item_vec.get("countries", []), profile.countries.values) * COUNTRIES_WEIGHT
        year_val = item_vec.get("year")
        year_score = 0.0
        if year_val is not None:
            year_score = profile.years.values.get(year_val, 0.0) * YEAR_WEIGHT

        score = g_score + k_score + c_score + t_score + crew_score + country_score + year_score

        return float(score)
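To make the arithmetic of the linear weighted sum concrete, here is a small standalone sketch. The names `linear_similarity`, the plain-dict profile, and the weight values are all hypothetical stand-ins for `UserTasteProfile` and the `*_WEIGHT` constants, whose real values are not shown in this diff; the single-valued year lookup is omitted for brevity.

```python
def avg_pref(features, mapping):
    """Average the profile preference over an item's features; unknown features count as 0."""
    if not features:
        return 0.0
    return sum(mapping.get(f, 0.0) for f in features) / len(features)


def linear_similarity(item_vec, profile, weights):
    """Weighted sum of per-dimension average preferences."""
    return sum(
        avg_pref(item_vec.get(dim, []), profile.get(dim, {})) * weight
        for dim, weight in weights.items()
    )


# Hypothetical profile, item, and weights, for illustration only.
profile = {"genres": {28: 0.8, 18: 0.4}, "keywords": {101: 1.0}}
item_vec = {"genres": [28, 35], "keywords": [101]}
weights = {"genres": 0.5, "keywords": 0.25}

# genres:   (0.8 + 0.0) / 2 = 0.4, times 0.5  -> 0.2
# keywords: 1.0 / 1 = 1.0,         times 0.25 -> 0.25
score = linear_similarity(item_vec, profile, weights)
```

Because each dimension is averaged before weighting, an item with many genres or keywords gains no advantage over one with few, which matches the intent stated in the inline comment of the quoted method.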
…with call counting
…ion logic (#44)
* feat: Enhance TMDBService with trending and top-rated content retrieval
* feat: Add recency preference and genre diversification to recommendation logic
* feat: Refactor TMDBService usage to support language preference across services
* feat: Refactor library item fetching and caching for improved performance and consistency
* feat: Implement caching for language retrieval and refactor auth key encryption
* feat: Add middleware for Redis call tracking and enhance token store with call counting
* chore: bump version to v1.1.0
No description provided.