[feat]: add user profile summarizing and generation of embeddings #91
Walkthrough
The changes introduce advanced user profile summarization and semantic search capabilities using vector embeddings and LLM-generated summaries. The Weaviate database integration gains new methods for vector-based and keyword-based contributor search and profile retrieval, plus explicit vectorization configuration. The Supabase vector DB service is removed, and the embedding service now orchestrates profile summarization, embedding, and similarity search.
Sequence Diagram(s)
sequenceDiagram
participant User
participant ProfilingService as User Profiling
participant EmbeddingService
participant WeaviateDB
User->>ProfilingService: profile_user_from_github(user_id, github_username)
ProfilingService->>ProfilingService: build_user_profile(...)
ProfilingService->>EmbeddingService: process_user_profile(profile)
EmbeddingService->>EmbeddingService: summarize_user_profile(profile)
EmbeddingService->>EmbeddingService: get_embedding(summary)
EmbeddingService-->>ProfilingService: (profile, embedding_vector)
ProfilingService->>WeaviateDB: store_user_profile(profile, embedding_vector)
WeaviateDB-->>ProfilingService: success/failure
ProfilingService-->>User: True/False
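The summarize-then-embed step in the first diagram can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: UserProfile, its profile_summary field, and the toy hash-based get_embedding are hypothetical stand-ins, not the project's WeaviateUserProfile model, LLM call, or embedding model.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class UserProfile:
    # Hypothetical, simplified stand-in for WeaviateUserProfile.
    github_username: str
    languages: List[str] = field(default_factory=list)
    topics: List[str] = field(default_factory=list)
    profile_summary: str = ""


def summarize_user_profile(profile: UserProfile) -> str:
    # Stand-in for the LLM call: builds a keyword-style summary deterministically.
    parts = [profile.github_username] + profile.languages + profile.topics
    return " ".join(parts)


def get_embedding(text: str, dim: int = 8) -> List[float]:
    # Toy embedding: hashes each lowercased token into one of `dim` buckets,
    # then L2-normalizes the counts. Not a real semantic embedding.
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def process_user_profile(profile: UserProfile) -> Tuple[UserProfile, List[float]]:
    # Mirrors the diagram: summarize, embed the summary, return both.
    summary = summarize_user_profile(profile)
    profile.profile_summary = summary
    return profile, get_embedding(summary)
```

Calling process_user_profile(UserProfile("octocat", ["Python"], ["weaviate"])) returns the profile with its summary filled in plus a unit-length vector, which is the (profile, embedding_vector) pair the diagram hands to store_user_profile.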
sequenceDiagram
participant User
participant EmbeddingService
participant WeaviateDB
User->>EmbeddingService: search_similar_profiles(query_text)
EmbeddingService->>EmbeddingService: get_embedding(query_text)
EmbeddingService->>WeaviateDB: search_similar_contributors(query_embedding)
WeaviateDB-->>EmbeddingService: List of similar profiles
EmbeddingService-->>User: List of similar profiles
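The search flow reduces to embedding the query and ranking stored vectors by similarity. The sketch below assumes cosine similarity and does a brute-force scan over a plain dict; the function and argument names are hypothetical, and Weaviate would answer the same query from its vector index rather than a linear scan.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def search_similar_profiles(
    query_embedding: List[float],
    stored: Dict[str, List[float]],
    limit: int = 3,
) -> List[Tuple[str, float]]:
    # Brute-force nearest-neighbour search: score every stored vector, keep the top `limit`.
    scored = [(user, cosine_similarity(query_embedding, vec)) for user, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

With stored = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [0.7, 0.7]} and the query [1.0, 0.1], the top two results are alice and then carol. Note that Weaviate reports cosine distance (1 - similarity) rather than similarity, so smaller is closer there.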
Assessment against linked issues: Out-of-scope changes
No out-of-scope changes detected.
Actionable comments posted: 4
🧹 Nitpick comments (3)
backend/app/services/embedding_service/service.py (2)
190-191: Consider moving the import to module level.
While importing inside the method avoids circular imports, module-level imports are the better practice when no circular dependency prevents them.
Move the import to the top of the file:
+from app.database.weaviate.operations import search_similar_contributors
 from app.models.database.weaviate import WeaviateUserProfile
Then remove the import from lines 190-191.
222-223: Move the gc import to module level.
Standard practice is to import modules at the top of the file.
Move import gc to the top of the file with the other imports.
backend/app/database/weaviate/operations.py (1)
216-217: Valid TODO: document the limitation clearly.
The comment correctly notes that Weaviate's built-in hybrid search doesn't support custom vectors. Consider opening an issue to track this enhancement.
Would you like me to create an issue to track the implementation of a custom hybrid search solution?
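One possible way to build the custom hybrid search mentioned in the TODO is to run the BM25 query and the custom-vector query separately, then merge the two ranked ID lists client-side with reciprocal rank fusion (RRF). This is a swapped-in technique for illustration, not what the PR implements; the function name is hypothetical, and k=60 is the constant commonly used with RRF.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    # Each ranking is an ordered list of IDs (best first).
    # RRF gives every ID a score of sum(1 / (k + rank)) across all rankings.
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a BM25 ranking ["alice", "bob", "carol"] with a vector ranking ["carol", "alice", "dave"] yields ["alice", "carol", "bob", "dave"]: IDs that rank well in both lists rise to the top, and no score normalization between BM25 and vector scores is needed.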
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
backend/app/database/weaviate/__init__.py (1 hunk)
backend/app/database/weaviate/operations.py (9 hunks)
backend/app/database/weaviate/scripts/create_schemas.py (1 hunk)
backend/app/services/embedding_service/profile_summarization/prompts/summarization_prompt.py (1 hunk)
backend/app/services/embedding_service/service.py (4 hunks)
backend/app/services/user/profiling.py (3 hunks)
backend/app/services/vector_db/service.py (0 hunks)
💤 Files with no reviewable changes (1)
- backend/app/services/vector_db/service.py
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#87
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T23:15:13.374Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer test updates and fixes (like missing imports after module reorganization) to separate PRs rather than expanding the scope of module update/chore PRs to include comprehensive test refactoring.
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#90
File: backend/app/agents/devrel/nodes/react_supervisor.py:97-101
Timestamp: 2025-07-05T04:33:39.840Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer code deduplication refactoring (like extracting duplicate functions to shared utilities) until there are more common functionalities present among tools/workflow. With only two files using the same function, they consider it not a problem currently and prefer to "align later in a more better way" once more patterns emerge.
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#85
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T14:45:55.244Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer comprehensive test refactoring to separate PRs/efforts when doing major backend restructuring, rather than expanding the scope of the current refactoring PR to include test updates.
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#85
File: backend/app/services/auth/management.py:32-33
Timestamp: 2025-06-28T14:44:36.819Z
Learning: In the Devr.AI project, smokeyScraper prefers using machine timezone (IST) for datetime operations during development and testing for easier debugging, with plans to switch to UTC for deployment later.
🧬 Code Graph Analysis (3)
backend/app/database/weaviate/__init__.py (2)
backend/app/database/weaviate/operations.py (8)
store_user_profile(281-286)
search_similar_contributors(115-164)
search_similar_contributors(288-293)
search_contributors_by_keywords(166-214)
search_contributors_by_keywords(295-300)
get_contributor_profile(218-260)
get_contributor_profile(302-305)
WeaviateUserOperations(13-278)
backend/app/database/weaviate/client.py (1)
get_weaviate_client(19-32)
backend/app/services/user/profiling.py (2)
backend/app/services/embedding_service/service.py (3)
EmbeddingService(29-226)
process_user_profile(159-176)
clear_cache(213-226)
backend/app/database/weaviate/operations.py (1)
store_user_profile(281-286)
backend/app/services/embedding_service/service.py (1)
backend/app/models/database/weaviate.py (1)
WeaviateUserProfile(32-129)
🔇 Additional comments (9)
backend/app/database/weaviate/scripts/create_schemas.py (1)
9-9: LGTM!
Disabling automatic vectorization is the correct approach, since embeddings are generated externally by the embedding service.
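For reference, this is roughly what a class definition with vectorization disabled looks like in Weaviate's REST schema format, written as a Python dict. The class name and property list are hypothetical; with the v4 Python client the same effect is usually expressed through the collection's vectorizer configuration (e.g. Configure.Vectorizer.none()) rather than a raw dict.

```python
# Hypothetical class definition in Weaviate's REST schema format.
# "vectorizer": "none" tells Weaviate not to compute vectors itself,
# so every object must be inserted together with an externally generated vector.
user_profile_class = {
    "class": "UserProfile",
    "vectorizer": "none",
    # Distance metric used by the vector index for near-vector queries.
    "vectorIndexConfig": {"distance": "cosine"},
    "properties": [
        {"name": "user_id", "dataType": ["text"]},
        {"name": "github_username", "dataType": ["text"]},
        {"name": "profile_summary", "dataType": ["text"]},
    ],
}
```

The design point is that once the vectorizer is off, whatever writes objects (here, the embedding service via store_user_profile) is the single source of vectors, so query embeddings must come from the same model.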
backend/app/database/weaviate/__init__.py (1)
1-18: Well-structured package initialization.
The exports provide a clean public API for the Weaviate operations module.
backend/app/services/embedding_service/profile_summarization/prompts/summarization_prompt.py (1)
1-24: Well-crafted prompt template for profile summarization.
The prompt provides clear instructions for generating keyword-rich summaries optimized for semantic search and contributor matching. The structure and guidelines are comprehensive.
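A prompt template of this kind is typically a format string filled from profile fields. The sketch below is hypothetical: the template wording, placeholder names, and the build_summary_prompt helper are assumptions and do not reproduce summarization_prompt.py.

```python
from typing import List

# Hypothetical template; the project's actual prompt text is not reproduced here.
PROFILE_SUMMARY_PROMPT = """You are summarizing a GitHub contributor for semantic search.
Write a concise, keyword-rich summary covering languages, topics, and notable repositories.

GitHub username: {github_username}
Languages: {languages}
Topics: {topics}
Top repositories: {repositories}

Summary:"""


def build_summary_prompt(
    github_username: str,
    languages: List[str],
    topics: List[str],
    repositories: List[str],
) -> str:
    # Join list fields into comma-separated strings; empty lists get a placeholder
    # so the LLM never sees a blank field.
    return PROFILE_SUMMARY_PROMPT.format(
        github_username=github_username,
        languages=", ".join(languages) or "none listed",
        topics=", ".join(topics) or "none listed",
        repositories=", ".join(repositories) or "none listed",
    )
```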
backend/app/services/user/profiling.py (2)
9-9: Correct import for embedding service integration.
303-326: Excellent error handling and resource management.
The implementation correctly:
- Checks for None profile before processing
- Handles embedding service exceptions
- Clears the embedding service cache in the finally block to prevent memory leaks
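The control flow described above (None check, exception handling, cache cleanup in finally) can be sketched as follows. StubEmbeddingService and profile_and_store are hypothetical stand-ins for EmbeddingService and the profiling function; only the method names process_user_profile and clear_cache come from the PR.

```python
from typing import Callable, List, Optional, Tuple


class StubEmbeddingService:
    # Hypothetical stand-in exposing the two methods the profiling code relies on.
    def __init__(self) -> None:
        self.cache_cleared = False

    def process_user_profile(self, profile: dict) -> Tuple[dict, List[float]]:
        if not profile.get("github_username"):
            raise ValueError("profile has no github_username")
        return profile, [0.1, 0.2, 0.3]

    def clear_cache(self) -> None:
        self.cache_cleared = True


def profile_and_store(
    profile: Optional[dict],
    service: StubEmbeddingService,
    store: Callable[[dict, List[float]], bool],
) -> bool:
    # 1. Bail out early if profile building failed.
    if profile is None:
        return False
    try:
        # 2. Summarize and embed; any exception is reported as failure.
        profile, vector = service.process_user_profile(profile)
        return store(profile, vector)
    except Exception:
        return False
    finally:
        # 3. Always release cached state, whether storing succeeded or not.
        service.clear_cache()
```

Because the cleanup lives in finally, an embedding failure still frees the cache, which is the memory-leak protection the review highlights.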
backend/app/services/embedding_service/service.py (1)
54-68: Good choice of temperature for consistent summarization.
Using temperature=0.3 for the LLM keeps profile summaries relatively consistent across runs (a lower temperature reduces sampling variance, though the output is not fully deterministic), which suits this use case.
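Why a lower temperature yields more consistent output can be seen from the standard temperature-scaled softmax, in which logits are divided by T before normalizing. This is the textbook formulation, shown here as an assumption about how the model samples; individual LLM providers may differ in the details.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    # Divide logits by T before softmax; lower T sharpens the distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With logits [2.0, 1.0, 0.5], the top token's probability is roughly 0.63 at T = 1.0 and roughly 0.96 at T = 0.3, so sampling at 0.3 picks the same token far more often while remaining non-deterministic.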
backend/app/database/weaviate/operations.py (3)
30-30: Good fix: corrected the parameter name to match the Weaviate API.
The change from where to filters aligns with Weaviate's query API requirements.
166-215: Well-implemented keyword search functionality.
The BM25 search implementation is correct, with proper error handling and result formatting.
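For readers unfamiliar with BM25, the scoring it performs can be sketched in pure Python. This sketch uses the Lucene-style IDF and naive whitespace tokenization; k1=1.2 and b=0.75 are common defaults, and the function is a hypothetical illustration, not Weaviate's implementation.

```python
import math
from collections import Counter
from typing import Dict


def bm25_scores(query: str, docs: Dict[str, str], k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    # Naive whitespace tokenization; real engines use proper analyzers.
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df: Counter = Counter()
    for toks in tokenized.values():
        df.update(set(toks))
    scores: Dict[str, float] = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF, always non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(toks) / avgdl))
        scores[doc_id] = score
    return scores
```

Rare terms get a higher IDF, so a query term that appears in only one contributor profile contributes more to that profile's score than a term shared by many.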
47-47: All embedding_vector callers are up to date; no further changes needed.
I've verified that:
- There are no external calls to create_user_profile, update_user_profile, or upsert_user_profile outside of operations.py itself.
- The only convenience entry point, store_user_profile, is called from backend/app/services/user/profiling.py, and that call already passes the new embedding_vector argument.
- The test helper update_user_profile in tests/test_weaviate.py is a locally defined function and does not reference the updated method signature.
No action is required on existing callers.
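The signature change amounts to threading an explicit embedding_vector through create, update, and upsert. The in-memory class below is a hypothetical illustration that borrows the method names from operations.py; the real methods write to Weaviate, and their exact parameters may differ.

```python
from typing import Dict, List, Optional


class InMemoryProfileStore:
    # Hypothetical stand-in; the real operations write to Weaviate.
    def __init__(self) -> None:
        self._objects: Dict[str, dict] = {}

    def create_user_profile(self, profile: dict, embedding_vector: List[float]) -> bool:
        # Refuse to create a duplicate.
        if profile["user_id"] in self._objects:
            return False
        self._objects[profile["user_id"]] = {"properties": profile, "vector": embedding_vector}
        return True

    def update_user_profile(self, profile: dict, embedding_vector: List[float]) -> bool:
        # Refuse to update a profile that does not exist.
        if profile["user_id"] not in self._objects:
            return False
        self._objects[profile["user_id"]] = {"properties": profile, "vector": embedding_vector}
        return True

    def upsert_user_profile(self, profile: dict, embedding_vector: List[float]) -> bool:
        # Update when the user already exists, otherwise create.
        if profile["user_id"] in self._objects:
            return self.update_user_profile(profile, embedding_vector)
        return self.create_user_profile(profile, embedding_vector)

    def get_vector(self, user_id: str) -> Optional[List[float]]:
        obj = self._objects.get(user_id)
        return obj["vector"] if obj else None
```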
closes #67
closes #77
Summary by CodeRabbit
New Features
Enhancements
Removals