-
Notifications
You must be signed in to change notification settings - Fork 39
fix(backend): few improvements for scaling branches #7752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis pull request updates schema management, git models, tasks, utils, and tests. Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
CodSpeed Performance ReportMerging #7752 will not alter performanceComparing Summary
|
4769e44 to
3c19703
Compare
3c19703 to
b87cf09
Compare
This would not allow branch creation to scale since we purge inactive branches on each create, thus processing each branch. 10x create branch speedup at 100 branches (1.5s vs 15s). Signed-off-by: Fatih Acar <fatih@opsmill.com>
The SDK get_list_repositories doesn't scale. A workaround is to use the get_repositories_commit_per_branch helper that is similar but using DB queries to get the data. Signed-off-by: Fatih Acar <fatih@opsmill.com>
In the recurring Sync Git Repositories task, only sync branches that are flagged with the sync_with_git flag. Signed-off-by: Fatih Acar <fatih@opsmill.com>
This reverts commit 23ab221. Not sure of the impact of this change (could break features related to repositories). Also, this change is not required for real world usage (syncing a lot of branches between two sync jobs).
b87cf09 to
0f6e2db
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/infrahub/git/tasks.py (1)
207-207: Type annotation assumesCoreRepositorybut union type is broader.The type annotation
repository: CoreRepository = repository_data.repositoryassumes all results areCoreRepository, butrepository_data.repositoryis typed asCoreRepository | CoreReadOnlyRepository | Node.Since
kind=InfrahubKind.REPOSITORYis passed toget_repositories_commit_per_branch, this assumption is likely safe in practice. However, for type safety, consider usingcast()or adding a runtime assertion:from typing import cast repository = cast(CoreRepository, repository_data.repository)Or validate explicitly:
repository = repository_data.repository assert hasattr(repository, 'default_branch'), f"Expected CoreRepository, got {type(repository)}"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
backend/infrahub/core/schema/manager.py(1 hunks)backend/infrahub/git/models.py(2 hunks)backend/infrahub/git/tasks.py(5 hunks)backend/infrahub/git/utils.py(3 hunks)backend/tests/unit/git/test_utils.py(3 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)
**/*.py: Always use triple quotes (""") for Python docstrings
Follow Google-style docstring format for Python docstrings
Include brief one-line description in Python docstrings when applicable
Include detailed description in Python docstrings when applicable
Include Args/Parameters section without typing in Python docstrings when applicable
Include Returns section in Python docstrings when applicable
Include Raises section in Python docstrings when applicable
Include Examples section in Python docstrings when applicable
**/*.py: Use type hints for all function parameters and return values in Python
Use Async whenever possible in Python
Useasync deffor asynchronous functions in Python
Useawaitfor asynchronous calls in Python
Use Pydantic models for dataclasses in Python
Use ruff and mypy for type checking and code validation in PythonUse ruff and mypy to validate and lint Python files
Files:
backend/infrahub/git/models.pybackend/infrahub/git/tasks.pybackend/infrahub/git/utils.pybackend/tests/unit/git/test_utils.pybackend/infrahub/core/schema/manager.py
backend/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use type hints for Python code in backend
Files:
backend/infrahub/git/models.pybackend/infrahub/git/tasks.pybackend/infrahub/git/utils.pybackend/tests/unit/git/test_utils.pybackend/infrahub/core/schema/manager.py
backend/infrahub/**/*.py
📄 CodeRabbit inference engine (backend/AGENTS.md)
backend/infrahub/**/*.py: Use async/await for all I/O operations to maintain async-first architecture
Type hint all function parameters and returns in Python code
Use Pydantic models for defining data structures instead of plain dictionaries
Use Query class pattern (extending infrahub.core.query.Query) for all database operations instead of unparameterized Cypher queries
Use Google-style docstrings with Args, Returns, and Raises sections for all functions
Use snake_case for function and variable names
Use PascalCase for class names
Use UPPER_SNAKE_CASE for constant definitions
Do not use unparameterized Cypher queries; always use parameterized queries to prevent injection
Do not block the event loop with synchronous I/O operations
Files:
backend/infrahub/git/models.pybackend/infrahub/git/tasks.pybackend/infrahub/git/utils.pybackend/infrahub/core/schema/manager.py
backend/tests/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Run backend tests with
pytestor viainvoketasksName test files as
test_<module>.pyand mirror source structure in tests directory
Files:
backend/tests/unit/git/test_utils.py
🧠 Learnings (1)
📚 Learning: 2025-12-01T22:16:13.668Z
Learnt from: CR
Repo: opsmill/infrahub PR: 0
File: backend/AGENTS.md:0-0
Timestamp: 2025-12-01T22:16:13.668Z
Learning: Applies to backend/infrahub/**/*.py : Use Pydantic models for defining data structures instead of plain dictionaries
Applied to files:
backend/infrahub/git/models.py
🧬 Code graph analysis (3)
backend/infrahub/git/tasks.py (2)
backend/infrahub/git/utils.py (1)
get_repositories_commit_per_branch(27-64)backend/infrahub/workers/dependencies.py (2)
get_database(70-71)get_client(50-51)
backend/infrahub/git/utils.py (2)
backend/infrahub/core/manager.py (1)
NodeManager(78-1416)backend/infrahub/git/models.py (2)
RepositoryBranchInfo(201-202)RepositoryData(205-224)
backend/infrahub/core/schema/manager.py (2)
backend/infrahub/core/schema/schema_branch.py (3)
get(324-357)get_all(449-456)duplicate(292-301)backend/infrahub/core/models.py (2)
nodes(62-64)duplicate(501-503)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (10)
backend/infrahub/core/schema/manager.py (1)
777-785: Optimization to avoid redundant schema hash collection looks correct.The logic correctly skips branches that share an already-processed schema hash, reducing redundant work when multiple branches point to the same schema. The conditional on line 780 properly handles both cases: when a branch has no hash yet (new/untracked) and when the hash hasn't been seen before.
One edge case to verify: if
active_branchis in_branch_hash_by_namebut not in_branches, line 781-782 would add the hash tobranch_processed, but line 783'sget()would returnNone, skipping hash collection. This seems intentional (stale entry cleanup), but worth confirming this scenario is expected.backend/tests/unit/git/test_utils.py (2)
34-47: Test assertions correctly adapted for the newrepositoryfield.The two-step verification pattern (identity check on
repository.id, thenmodel_dump(exclude=["repository"])for remaining fields) is the right approach for testing a Pydantic model containing a non-serializable object reference.
66-99: LGTM!The multi-branch test correctly verifies repository identity and field values across different branches with modified commits.
backend/infrahub/git/models.py (2)
206-212: Model configuration and repository field look good.The
arbitrary_types_allowed=Trueis necessary sinceNodeis not a Pydantic-native type. The union type properly accommodates both repository protocol types and the genericNodefallback.
220-224: Consider edge case: multiple branches with staging status.The
get_staging_branch()method returns the first branch withinternal_status == "staging". If multiple staging branches are possible (even temporarily), this would return an arbitrary one based on dict iteration order.If only one staging branch is ever valid, this is fine. Otherwise, consider returning a list or documenting the single-staging-branch assumption.
backend/infrahub/git/utils.py (2)
41-57: Repository object stored from first-encountered branch.The
repositoryobject stored inRepositoryDatacomes from whichever branch is iterated first (line 55 only executes whenrepo_name not in repositories). Subsequent branch iterations updatebranchesandbranch_infobut not therepositoryreference.This means
repository.commit.value,repository.internal_status.value, etc. reflect one specific branch's state, not necessarily the default branch or any specific branch. If downstream code expects the repository object to represent the default branch's state, this could cause subtle issues.Consider explicitly tracking which branch the stored repository object came from, or ensuring the default branch is processed first.
27-30: Newkindparameter adds useful flexibility.The parameterized
kindallows callers to filter by repository schema type. The defaultInfrahubKind.GENERICREPOSITORYmaintains backward compatibility while enabling the newInfrahubKind.REPOSITORYusage intasks.py.backend/infrahub/git/tasks.py (3)
198-204: Good refactor: DB-backed repository fetching reduces API overhead.Moving from
client.get_list_repositoriesto the database-backedget_repositories_commit_per_brancheliminates unnecessary network round-trips for repository discovery. The session scope is properly managed withasync with db.start_session().
221-228: LGTM: Repository field access is consistent.The access patterns (
repository.id,repository.name.value,repository.location.value,repository.default_branch.value) are consistent withCoreRepositoryprotocol and properly extract values from the repository node.
263-271: Theget_kind()method is correctly available on the repository object.Line 268 calls
repository.get_kind()where the repository is typed asCoreRepository(line 211).CoreRepositoryinherits fromCoreNodethrough multiple parent classes (LineageOwner,LineageSource,CoreGenericRepository,CoreTaskTarget), andCoreNodecorresponds toInfrahubNode, which implements theget_kind()method (backend/infrahub/core/node/init.py:112-114). The method is safe to call and will returnself._schema.kind.
Signed-off-by: Fatih Acar <fatih@opsmill.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
changelog/+d9659fb5.fixed.md (1)
1-1: Consider more concise phrasing for the changelog entry.The phrase "having a lot of" can be tightened for clarity. Consider rewording to something like "with many branches" or "at scale."
Apply this diff to improve conciseness:
-Improve branch creation and repository sync performance when having a lot of branches. +Improve branch creation and repository sync performance when scaling with many branches.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
changelog/+d9659fb5.fixed.md(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
**/*.md: Use-for unordered lists in markdown files
Add blank line before/after headings, code blocks, and lists in markdown files
Use fenced code blocks with language identifier in markdown files
No trailing spaces or multiple consecutive blank lines in markdown files
No bare URLs in markdown files - use[text](url)format
Files:
changelog/+d9659fb5.fixed.md
🪛 LanguageTool
changelog/+d9659fb5.fixed.md
[style] ~1-~1: Consider using a synonym to be more concise.
Context: ...repository sync performance when having a lot of branches.
(A_LOT_OF)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: E2E-testing-version-upgrade / From 1.3.0
- GitHub Check: E2E-testing-playwright
- GitHub Check: backend-benchmark
- GitHub Check: E2E-testing-invoke-demo-start
- GitHub Check: documentation
- GitHub Check: backend-docker-integration
- GitHub Check: backend-tests-integration
- GitHub Check: backend-tests-functional
- GitHub Check: backend-tests-unit
- GitHub Check: Cloudflare Pages
Summary by CodeRabbit
Refactor
Tests
Changelog
✏️ Tip: You can customize this high-level summary in your review settings.