-
-
Notifications
You must be signed in to change notification settings - Fork 271
event chunks created #1715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
event chunks created #1715
Conversation
Summary by CodeRabbit
Summary by CodeRabbit
WalkthroughA new Django management command for creating event-based text chunks and embeddings was added, along with a Makefile target to invoke it. The chunk and embedding creation logic was centralized into a shared utility function, and existing chapter and Slack message chunking commands were refactored to use this utility, simplifying their implementations. Changes
Assessment against linked issues
Assessment against linked issues: Out-of-scope changesNo out-of-scope changes found. Suggested labels
Suggested reviewers
📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (5)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
backend/apps/ai/Makefile(1 hunks)backend/apps/ai/management/commands/ai_create_event_chunks.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
backend/apps/ai/management/commands/ai_create_event_chunks.py
168-168: No newline at end of file
Add trailing newline
(W292)
🪛 Flake8 (7.2.0)
backend/apps/ai/management/commands/ai_create_event_chunks.py
[error] 64-64: whitespace before ':'
(E203)
[warning] 168-168: no newline at end of file
(W292)
🪛 Pylint (3.3.7)
backend/apps/ai/management/commands/ai_create_event_chunks.py
[convention] 168-168: Final newline missing
(C0304)
[error] 7-7: Unable to import 'openai'
(E0401)
[error] 8-8: Unable to import 'django.core.management.base'
(E0401)
[error] 10-14: Unable to import 'apps.ai.common.constants'
(E0401)
[error] 15-15: Unable to import 'apps.ai.models.chunk'
(E0401)
[error] 16-16: Unable to import 'apps.owasp.models.event'
(E0401)
[convention] 19-19: Missing class docstring
(C0115)
[convention] 22-22: Missing function or method docstring
(C0116)
[convention] 40-40: Missing function or method docstring
(C0116)
[warning] 40-40: Unused argument 'args'
(W0613)
[warning] 47-47: Attribute 'openai_client' defined outside init
(W0201)
[warning] 106-106: Attribute 'last_request_time' defined outside init
(W0201)
🔇 Additional comments (2)
backend/apps/ai/management/commands/ai_create_event_chunks.py (2)
1-39: Well-structured command setup with clear arguments.The imports are organized properly and the command arguments provide good flexibility for processing specific events, all events, or upcoming events by default.
86-86: Split_text implementation verified
- The
split_textmethod is defined inbackend/apps/ai/models/chunk.pyand returns alist[str]viaRecursiveCharacterTextSplitter.- Calling it as
Chunk.split_text(prose_content)is valid and aligns with its signature.No changes required here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arkid15r
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure CI/CD is green
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
♻️ Duplicate comments (2)
backend/apps/ai/management/commands/ai_create_event_chunks.py (2)
59-59: Fix whitespace formatting issue.Remove the extra space before the colon in the slice notation to follow Python style guidelines.
- batch_events = queryset[offset : offset + batch_size] + batch_events = queryset[offset:offset + batch_size]
134-134: Add missing newline at end of file.Add a trailing newline to comply with standard file formatting conventions.
return ( DELIMITER.join(filter(None, prose_parts)), DELIMITER.join(filter(None, metadata_parts)), ) +
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
Makefile(1 hunks)backend/apps/ai/common/create_chunks_and_embeddings.py(1 hunks)backend/apps/ai/management/commands/ai_create_chapter_chunks.py(3 hunks)backend/apps/ai/management/commands/ai_create_event_chunks.py(1 hunks)backend/apps/ai/management/commands/ai_create_slack_message_chunks.py(3 hunks)backend/apps/github/constants.py(1 hunks)backend/apps/github/models/mixins/user.py(2 hunks)backend/apps/github/models/user.py(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
backend/apps/ai/common/create_chunks_and_embeddings.py (1)
backend/apps/ai/models/chunk.py (1)
Chunk(13-88)
backend/apps/github/models/mixins/user.py (3)
backend/apps/github/models/user.py (1)
get_non_indexable_logins(95-107)backend/apps/github/models/repository_contributor.py (1)
RepositoryContributor(14-207)backend/apps/github/graphql/queries/user.py (1)
user(40-56)
🪛 Ruff (0.11.9)
backend/apps/ai/common/create_chunks_and_embeddings.py
1-1: Missing docstring in public module
(D100)
15-25: Multi-line docstring summary should start at the first line
Remove whitespace after opening quotes
(D212)
15-25: First line of docstring should be in imperative mood: "Creates chunks and embeddings from given texts using OpenAI embeddings."
(D401)
23-23: Missing blank line after last section ("Returns")
Add blank line after "Returns"
(D413)
27-27: Line too long (102 > 99)
(E501)
54-54: Do not catch blind exception: Exception
(BLE001)
55-55: print found
Remove print
(T201)
🔇 Additional comments (19)
backend/apps/github/constants.py (1)
5-5: LGTM! Consistent constant addition.The new
GITHUB_ACTIONS_USER_LOGINconstant follows the established naming pattern and serves the same purpose as other GitHub user constants in filtering out automated users from indexing.Makefile (1)
7-7: Good addition for cleaner build output.The
--no-print-directoryflag suppresses directory change messages, which improves the readability of make command output. This is particularly beneficial when running nested make commands or targets that involve multiple directory changes.backend/apps/github/models/user.py (2)
8-12: LGTM! Consistent import update.The import statement correctly includes the new
GITHUB_ACTIONS_USER_LOGINconstant, maintaining consistency with the existing import pattern.
103-103: LGTM! Proper exclusion of GitHub Actions users.Adding
GITHUB_ACTIONS_USER_LOGINto the set of non-indexable logins is consistent with the existing pattern of excluding automated/system users from indexing. This ensures GitHub Actions users are properly filtered out alongside ghost users and foundation accounts.backend/apps/github/models/mixins/user.py (2)
14-18: Good enhancement to bot filtering.The additional check for login suffixes "Bot" and "-bot" is a smart improvement that catches bot accounts that might not be flagged by the
is_botfield. This provides more comprehensive filtering of automated accounts from indexing.
105-108: LGTM! Proper focus on OWASP-related content.Filtering contributions by
is_owasp_related_organization=Truealigns with the project's focus on OWASP-related content. This ensures that user contribution data in the index is relevant to the project's scope.backend/apps/ai/management/commands/ai_create_chapter_chunks.py (4)
8-9: LGTM! Good refactoring to use shared helper function.The import of the shared
create_chunks_and_embeddingsfunction properly centralizes the embedding generation logic, reducing code duplication across commands.
63-63: LGTM! Consistent method naming.The method name change from
create_chunkstohandle_chunksbetter reflects its role in processing chunks rather than just creating them.
71-71: LGTM! Clean method signature.The method signature remains unchanged except for the name, maintaining backward compatibility in the interface.
87-91: LGTM! Simplified implementation using shared helper.The delegation to
create_chunks_and_embeddingsproperly removes the inline embedding logic, making the code cleaner and more maintainable.backend/apps/ai/management/commands/ai_create_slack_message_chunks.py (4)
8-8: LGTM! Consistent refactoring pattern.The import of the shared helper function follows the same pattern as the chapter chunks command, promoting consistency across the codebase.
34-34: LGTM! Consistent method naming.The method name change to
handle_chunksaligns with the naming convention used in other refactored commands.
40-40: LGTM! Clean method signature.The method signature maintains consistency with the chapter chunks implementation.
52-56: LGTM! Simplified implementation using shared helper.The delegation to
create_chunks_and_embeddingsremoves duplicate code and centralizes the embedding logic, improving maintainability.backend/apps/ai/management/commands/ai_create_event_chunks.py (5)
1-12: LGTM! Consistent structure and imports.The file follows the established pattern of other chunk creation commands with appropriate imports and structure.
17-33: LGTM! Well-structured command arguments.The argument parser provides good flexibility with options for single event processing, all events, or default filtering, plus configurable batch size.
35-50: LGTM! Proper API key handling and event filtering.The OpenAI API key validation and event queryset logic are implemented correctly, with appropriate defaults to
Event.upcoming_events().
87-91: LGTM! Consistent use of shared helper function.The delegation to
create_chunks_and_embeddingsfollows the same pattern as the refactored chapter and Slack message commands, promoting code consistency.
93-133: LGTM! Comprehensive event content extraction.The
extract_event_contentmethod properly separates prose content from metadata and handles all relevant event fields appropriately.
arkid15r
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem you run make check locally. Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
backend/apps/ai/management/commands/ai_create_event_chunks.py (2)
59-59: Fix whitespace before colon in slice notation.- batch_events = queryset[offset : offset + batch_size] + batch_events = queryset[offset:offset + batch_size]
133-133: Add missing newline at end of file.return ( DELIMITER.join(filter(None, prose_parts)), DELIMITER.join(filter(None, metadata_parts)), ) +
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
backend/apps/ai/common/utils.py(1 hunks)backend/apps/ai/management/commands/ai_create_chapter_chunks.py(3 hunks)backend/apps/ai/management/commands/ai_create_event_chunks.py(1 hunks)backend/apps/ai/management/commands/ai_create_slack_message_chunks.py(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
backend/apps/ai/common/utils.py (1)
backend/apps/ai/models/chunk.py (1)
Chunk(13-88)
backend/apps/ai/management/commands/ai_create_slack_message_chunks.py (5)
backend/apps/ai/common/utils.py (1)
create_chunks_and_embeddings(16-64)backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1)
handle_chunks(71-91)backend/apps/ai/management/commands/ai_create_event_chunks.py (1)
handle_chunks(71-91)backend/apps/slack/models/message.py (1)
Message(15-160)backend/apps/ai/models/chunk.py (1)
Chunk(13-88)
🔇 Additional comments (7)
backend/apps/ai/common/utils.py (1)
46-61: LGTM: Clean implementation of chunk creation logic.The chunk creation logic properly handles the embedding response and uses the walrus operator effectively in the list comprehension. The use of
strict=Truein the zip function provides good protection against mismatched data lengths.backend/apps/ai/management/commands/ai_create_chapter_chunks.py (2)
8-9: LGTM: Clean import of the new utility function.The import is properly structured and follows the project's import conventions.
87-91: LGTM: Excellent refactoring to use the centralized utility.The refactoring successfully delegates embedding creation to the utility function, removing duplicated code and improving maintainability. The parameter naming is consistent with the utility function signature.
backend/apps/ai/management/commands/ai_create_slack_message_chunks.py (2)
8-8: LGTM: Proper import of the centralized utility.The import follows the established pattern from the chapter chunks command.
52-56: LGTM: Consistent refactoring with other commands.The refactoring maintains consistency with the chapter chunks command and properly delegates embedding creation to the utility function.
backend/apps/ai/management/commands/ai_create_event_chunks.py (2)
87-91: LGTM: Consistent implementation using the centralized utility.The event chunk creation properly uses the new utility function and follows the same pattern established in the other commands. The parameter passing is correct and consistent.
93-133: LGTM: Comprehensive event content extraction.The event content extraction method properly separates prose content from metadata and handles various event fields appropriately. The use of
get_category_display()ensures proper category formatting.
|



Resolves #1683