@fatih-acar
@fatih-acar fatih-acar commented Nov 28, 2025

Summary by CodeRabbit

  • Refactor

    • Prevented duplicate branch-hash processing to speed up branch handling and repository sync.
    • Improved repository data handling (supports additional repository types) and added staging-branch detection.
    • Repository sync flow now uses database-backed queries for more efficient synchronization.
  • Tests

    • Updated assertions to accommodate the new repository field handling.
  • Changelog

    • Noted performance improvements for branch creation and repository synchronization.


@github-actions github-actions bot added the group/backend Issue related to the backend (API Server, Git Agent) label Nov 28, 2025
@coderabbitai
coderabbitai bot commented Nov 28, 2025

Walkthrough

This pull request updates schema management, git models, tasks, utils, and tests. purge_inactive_branches now tracks processed branch hashes to avoid re-processing. RepositoryData gains model_config, a repository field (CoreRepository|CoreReadOnlyRepository|Node), and get_staging_branch(). sync_remote_repositories was refactored to use a database session with get_repositories_commit_per_branch instead of a client list. get_repositories_commit_per_branch adds a kind parameter, stores full repository objects on RepositoryData, and returns a mapping by repository. Tests now assert the repository identity separately and compare dumped data excluding repository. A changelog entry was added.

Pre-merge checks

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(backend): few improvements for scaling branches' accurately describes the main changes in the PR, which focus on performance optimizations for handling large numbers of branches through improved tracking mechanisms in the schema manager and repository utilities. |

@codspeed-hq
codspeed-hq bot commented Nov 28, 2025

CodSpeed Performance Report

Merging #7752 will not alter performance

Comparing fac-branch-scale-IFC-2059 (a9eea39) with stable (c964c21)

Summary

✅ 12 untouched

@fatih-acar fatih-acar force-pushed the fac-branch-scale-IFC-2059 branch 2 times, most recently from 4769e44 to 3c19703 Compare November 28, 2025 14:35
@fatih-acar fatih-acar changed the title from "fix(backend): few improvements for scaling" to "fix(backend): few improvements for scaling branches" Nov 28, 2025
@fatih-acar fatih-acar force-pushed the fac-branch-scale-IFC-2059 branch from 3c19703 to b87cf09 Compare December 1, 2025 13:28
This would not allow branch creation to scale since we purge inactive
branches on each create, thus processing each branch.

10x create branch speedup at 100 branches (1.5s vs 15s).

Signed-off-by: Fatih Acar <fatih@opsmill.com>
The SDK get_list_repositories doesn't scale.
As a workaround, use the get_repositories_commit_per_branch helper,
which is similar but uses DB queries to get the data.

Signed-off-by: Fatih Acar <fatih@opsmill.com>
In the recurring Sync Git Repositories task, only sync branches that are
flagged with the sync_with_git flag.

Signed-off-by: Fatih Acar <fatih@opsmill.com>
This reverts commit 23ab221.

Not sure of the impact of this change (could break features related to
repositories). Also, this change is not required for real world usage
(syncing a lot of branches between two sync jobs).
@fatih-acar fatih-acar force-pushed the fac-branch-scale-IFC-2059 branch from b87cf09 to 0f6e2db Compare December 2, 2025 09:44
@fatih-acar fatih-acar marked this pull request as ready for review December 2, 2025 12:01
@fatih-acar fatih-acar requested a review from a team as a code owner December 2, 2025 12:01
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/infrahub/git/tasks.py (1)

207-207: Type annotation assumes CoreRepository but union type is broader.

The type annotation repository: CoreRepository = repository_data.repository assumes all results are CoreRepository, but repository_data.repository is typed as CoreRepository | CoreReadOnlyRepository | Node.

Since kind=InfrahubKind.REPOSITORY is passed to get_repositories_commit_per_branch, this assumption is likely safe in practice. However, for type safety, consider using cast() or adding a runtime assertion:

from typing import cast
repository = cast(CoreRepository, repository_data.repository)

Or validate explicitly:

repository = repository_data.repository
assert hasattr(repository, 'default_branch'), f"Expected CoreRepository, got {type(repository)}"
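
To make the trade-off between the two suggestions concrete, here is a minimal, self-contained sketch. The classes `CoreRepository` and `CoreReadOnlyRepository` below are empty stand-ins rather than the real Infrahub protocol classes, and `narrow_repository` is a hypothetical helper. The point it illustrates is that `typing.cast` only informs the type checker and does nothing at runtime, while an explicit `isinstance` check fails immediately on an unexpected type. Whether `isinstance` is usable against the real classes depends on how they are defined, which this sketch does not address.

```python
from typing import Union, cast


class CoreRepository:
    """Stand-in for the real protocol class (assumption for illustration)."""

    default_branch = "main"


class CoreReadOnlyRepository:
    """Stand-in for the read-only variant; it has no default_branch here."""


RepositoryLike = Union[CoreRepository, CoreReadOnlyRepository]


def narrow_repository(repository: RepositoryLike) -> CoreRepository:
    """Hypothetical helper: validate at runtime instead of trusting a cast."""
    if not isinstance(repository, CoreRepository):
        raise TypeError(f"Expected CoreRepository, got {type(repository).__name__}")
    return repository


read_only = CoreReadOnlyRepository()

# cast() is a no-op at runtime: the same object comes back, wrong type included.
casted = cast(CoreRepository, read_only)
cast_returns_same_object = casted is read_only

# The explicit check rejects the wrong type right away.
try:
    narrow_repository(read_only)
    check_rejected = False
except TypeError:
    check_rejected = True
```

With `cast`, a mismatch would only surface later as an `AttributeError` at the first access to a missing attribute; the runtime check moves that failure to the point of assignment.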
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43241fc and 0f6e2db.

📒 Files selected for processing (5)
  • backend/infrahub/core/schema/manager.py (1 hunks)
  • backend/infrahub/git/models.py (2 hunks)
  • backend/infrahub/git/tasks.py (5 hunks)
  • backend/infrahub/git/utils.py (3 hunks)
  • backend/tests/unit/git/test_utils.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Always use triple quotes (""") for Python docstrings
Follow Google-style docstring format for Python docstrings
Include brief one-line description in Python docstrings when applicable
Include detailed description in Python docstrings when applicable
Include Args/Parameters section without typing in Python docstrings when applicable
Include Returns section in Python docstrings when applicable
Include Raises section in Python docstrings when applicable
Include Examples section in Python docstrings when applicable

**/*.py: Use type hints for all function parameters and return values in Python
Use Async whenever possible in Python
Use async def for asynchronous functions in Python
Use await for asynchronous calls in Python
Use Pydantic models for dataclasses in Python
Use ruff and mypy for type checking and code validation in Python

Use ruff and mypy to validate and lint Python files

Files:

  • backend/infrahub/git/models.py
  • backend/infrahub/git/tasks.py
  • backend/infrahub/git/utils.py
  • backend/tests/unit/git/test_utils.py
  • backend/infrahub/core/schema/manager.py
backend/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use type hints for Python code in backend

Files:

  • backend/infrahub/git/models.py
  • backend/infrahub/git/tasks.py
  • backend/infrahub/git/utils.py
  • backend/tests/unit/git/test_utils.py
  • backend/infrahub/core/schema/manager.py
backend/infrahub/**/*.py

📄 CodeRabbit inference engine (backend/AGENTS.md)

backend/infrahub/**/*.py: Use async/await for all I/O operations to maintain async-first architecture
Type hint all function parameters and returns in Python code
Use Pydantic models for defining data structures instead of plain dictionaries
Use Query class pattern (extending infrahub.core.query.Query) for all database operations instead of unparameterized Cypher queries
Use Google-style docstrings with Args, Returns, and Raises sections for all functions
Use snake_case for function and variable names
Use PascalCase for class names
Use UPPER_SNAKE_CASE for constant definitions
Do not use unparameterized Cypher queries; always use parameterized queries to prevent injection
Do not block the event loop with synchronous I/O operations

Files:

  • backend/infrahub/git/models.py
  • backend/infrahub/git/tasks.py
  • backend/infrahub/git/utils.py
  • backend/infrahub/core/schema/manager.py
backend/tests/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Name test files as test_<module>.py and mirror source structure in tests directory

Files:

  • backend/tests/unit/git/test_utils.py
🧠 Learnings (1)
📚 Learning: 2025-12-01T22:16:13.668Z
Learnt from: CR
Repo: opsmill/infrahub PR: 0
File: backend/AGENTS.md:0-0
Timestamp: 2025-12-01T22:16:13.668Z
Learning: Applies to backend/infrahub/**/*.py : Use Pydantic models for defining data structures instead of plain dictionaries

Applied to files:

  • backend/infrahub/git/models.py
🧬 Code graph analysis (3)
backend/infrahub/git/tasks.py (2)
backend/infrahub/git/utils.py (1)
  • get_repositories_commit_per_branch (27-64)
backend/infrahub/workers/dependencies.py (2)
  • get_database (70-71)
  • get_client (50-51)
backend/infrahub/git/utils.py (2)
backend/infrahub/core/manager.py (1)
  • NodeManager (78-1416)
backend/infrahub/git/models.py (2)
  • RepositoryBranchInfo (201-202)
  • RepositoryData (205-224)
backend/infrahub/core/schema/manager.py (2)
backend/infrahub/core/schema/schema_branch.py (3)
  • get (324-357)
  • get_all (449-456)
  • duplicate (292-301)
backend/infrahub/core/models.py (2)
  • nodes (62-64)
  • duplicate (501-503)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (10)
backend/infrahub/core/schema/manager.py (1)

777-785: Optimization to avoid redundant schema hash collection looks correct.

The logic correctly skips branches that share an already-processed schema hash, reducing redundant work when multiple branches point to the same schema. The conditional on line 780 properly handles both cases: when a branch has no hash yet (new/untracked) and when the hash hasn't been seen before.

One edge case to verify: if active_branch is in _branch_hash_by_name but not in _branches, lines 781-782 would add the hash to branch_processed, but line 783's get() would return None, skipping hash collection. This seems intentional (stale entry cleanup), but worth confirming this scenario is expected.
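
The skip logic described above can be sketched roughly as follows. This is not the code from manager.py: `collect_hashes_to_keep`, its parameters, and the returned scan counter are hypothetical names chosen for illustration, and plain dictionaries stand in for the schema manager's internal state.

```python
def collect_hashes_to_keep(
    active_branches: list[str],
    branch_hash_by_name: dict[str, str],
    node_hashes_by_branch: dict[str, set[str]],
) -> tuple[set[str], int]:
    """Collect node hashes once per distinct branch schema hash.

    Returns the hashes to keep and how many branches were actually scanned.
    """
    hashes_to_keep: set[str] = set()
    processed_branch_hashes: set[str] = set()
    scanned = 0

    for name in active_branches:
        branch_hash = branch_hash_by_name.get(name)
        if branch_hash is not None:
            if branch_hash in processed_branch_hashes:
                continue  # an identical schema was already scanned
            processed_branch_hashes.add(branch_hash)

        node_hashes = node_hashes_by_branch.get(name)
        if node_hashes is None:
            continue  # stale entry: hash is tracked but the branch is not loaded

        scanned += 1
        hashes_to_keep |= node_hashes

    return hashes_to_keep, scanned


# 100 branches sharing one schema hash, one branch that differs, one stale entry.
branches = [f"branch-{i}" for i in range(100)]
branch_hash_by_name = {name: "schema-a" for name in branches}
branch_hash_by_name["feature"] = "schema-b"
branch_hash_by_name["stale"] = "schema-c"
node_hashes_by_branch = {name: {"n1", "n2"} for name in branches}
node_hashes_by_branch["feature"] = {"n1", "n3"}

kept, scanned = collect_hashes_to_keep(
    branches + ["feature", "stale"], branch_hash_by_name, node_hashes_by_branch
)
```

In this toy input only two of the 102 names are scanned, which is the kind of saving the commit message attributes to the change. The stale entry follows the edge case noted above: its hash is recorded as processed, but no node hashes are collected for it.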

backend/tests/unit/git/test_utils.py (2)

34-47: Test assertions correctly adapted for the new repository field.

The two-step verification pattern (identity check on repository.id, then model_dump(exclude=["repository"]) for remaining fields) is the right approach for testing a Pydantic model containing a non-serializable object reference.
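
A rough illustration of that two-step pattern, using only the standard library so it runs without Pydantic. `FakeNode`, `RepoData`, and `dump_without` are hypothetical stand-ins; `dump_without` only mimics the spirit of excluding a field from a model dump and is not Pydantic's API.

```python
from dataclasses import dataclass, field


class FakeNode:
    """Stand-in for a database node that has no meaningful equality or dump."""

    def __init__(self, node_id: str) -> None:
        self.id = node_id


@dataclass
class RepoData:
    repository: FakeNode
    repository_id: str
    repository_name: str
    branches: dict[str, str] = field(default_factory=dict)


def dump_without(data: RepoData, excluded: str) -> dict:
    """Shallow dump of the instance that drops one named field."""
    return {key: value for key, value in vars(data).items() if key != excluded}


data = RepoData(
    repository=FakeNode("repo-1"),
    repository_id="repo-1",
    repository_name="demo",
    branches={"main": "abc123"},
)

# Step 1: check the object reference through a stable attribute.
id_matches = data.repository.id == "repo-1"

# Step 2: compare everything else as plain data.
remaining = dump_without(data, "repository")
```

Splitting the check this way keeps the plain-data comparison exact while avoiding any dependence on how the node object implements equality.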


66-99: LGTM!

The multi-branch test correctly verifies repository identity and field values across different branches with modified commits.

backend/infrahub/git/models.py (2)

206-212: Model configuration and repository field look good.

The arbitrary_types_allowed=True is necessary since Node is not a Pydantic-native type. The union type properly accommodates both repository protocol types and the generic Node fallback.


220-224: Consider edge case: multiple branches with staging status.

The get_staging_branch() method returns the first branch with internal_status == "staging". If multiple staging branches are possible (even temporarily), this would return an arbitrary one based on dict iteration order.

If only one staging branch is ever valid, this is fine. Otherwise, consider returning a list or documenting the single-staging-branch assumption.
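
A small sketch of the two options, assuming a mapping of branch name to a status record. `BranchInfo` here is a simplified stand-in and `get_staging_branches` is a hypothetical alternative, not part of the PR. Note that Python dictionaries preserve insertion order, so "first" means the first branch inserted into the mapping rather than a random one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BranchInfo:
    """Simplified stand-in carrying only the status field."""

    internal_status: str


def get_staging_branch(branch_info: dict[str, BranchInfo]) -> Optional[str]:
    """Return the first branch whose status is 'staging', or None."""
    for name, info in branch_info.items():
        if info.internal_status == "staging":
            return name
    return None


def get_staging_branches(branch_info: dict[str, BranchInfo]) -> list[str]:
    """Hypothetical alternative: return every staging branch."""
    return [
        name for name, info in branch_info.items() if info.internal_status == "staging"
    ]


branch_info = {
    "main": BranchInfo("active"),
    "feature-b": BranchInfo("staging"),
    "feature-a": BranchInfo("staging"),
}

first_staging = get_staging_branch(branch_info)
all_staging = get_staging_branches(branch_info)
```

If the single-staging-branch assumption holds, the first-match version is sufficient; the list version makes a violation of that assumption visible to the caller.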

backend/infrahub/git/utils.py (2)

41-57: Repository object stored from first-encountered branch.

The repository object stored in RepositoryData comes from whichever branch is iterated first (line 55 only executes when repo_name not in repositories). Subsequent branch iterations update branches and branch_info but not the repository reference.

This means repository.commit.value, repository.internal_status.value, etc. reflect one specific branch's state, not necessarily the default branch or any specific branch. If downstream code expects the repository object to represent the default branch's state, this could cause subtle issues.

Consider explicitly tracking which branch the stored repository object came from, or ensuring the default branch is processed first.
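
The first-encountered behavior, and one possible way to make it deterministic, can be sketched as below. `RepoRecord`, `GroupedRepo`, and both grouping functions are hypothetical simplifications and do not reproduce the code in utils.py; in particular, the default branch name "main" is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class RepoRecord:
    """One repository node as seen on one branch (simplified stand-in)."""

    name: str
    branch: str
    commit: str


@dataclass
class GroupedRepo:
    repository: RepoRecord
    branches: dict[str, str] = field(default_factory=dict)


def group_by_repository(records: list[RepoRecord]) -> dict[str, GroupedRepo]:
    """Keep the first record seen per repository; merge commits from all branches."""
    grouped: dict[str, GroupedRepo] = {}
    for record in records:
        if record.name not in grouped:
            grouped[record.name] = GroupedRepo(repository=record)
        grouped[record.name].branches[record.branch] = record.commit
    return grouped


def group_default_first(
    records: list[RepoRecord], default_branch: str = "main"
) -> dict[str, GroupedRepo]:
    """Same grouping, but records from the default branch are processed first."""
    # sorted() is stable, so records keep their relative order within each group.
    ordered = sorted(records, key=lambda record: record.branch != default_branch)
    return group_by_repository(ordered)


records = [RepoRecord("demo", "feature", "bbb"), RepoRecord("demo", "main", "aaa")]
naive = group_by_repository(records)["demo"]
ordered = group_default_first(records)["demo"]
```

Both results carry the same per-branch commits; only the stored `repository` object differs, which is the subtle part the comment above warns about.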


27-30: New kind parameter adds useful flexibility.

The parameterized kind allows callers to filter by repository schema type. The default InfrahubKind.GENERICREPOSITORY maintains backward compatibility while enabling the new InfrahubKind.REPOSITORY usage in tasks.py.
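
As a rough illustration of why a defaulted `kind` parameter keeps existing callers working, consider the sketch below. The kind strings, the `INHERITS_FROM` table, and `list_repositories` are assumptions made up for this example; they are not the real `InfrahubKind` values or the `NodeManager` query.

```python
# Assumed kind names, for illustration only.
GENERIC_REPOSITORY = "CoreGenericRepository"
REPOSITORY = "CoreRepository"
READ_ONLY_REPOSITORY = "CoreReadOnlyRepository"

# Hypothetical inheritance table: both concrete kinds derive from the generic one.
INHERITS_FROM = {
    REPOSITORY: GENERIC_REPOSITORY,
    READ_ONLY_REPOSITORY: GENERIC_REPOSITORY,
}


def list_repositories(nodes: list[dict], kind: str = GENERIC_REPOSITORY) -> list[str]:
    """Return names of nodes whose kind matches, directly or through the generic."""
    return [
        node["name"]
        for node in nodes
        if node["kind"] == kind or INHERITS_FROM.get(node["kind"]) == kind
    ]


nodes = [
    {"name": "rw", "kind": REPOSITORY},
    {"name": "ro", "kind": READ_ONLY_REPOSITORY},
]

everything = list_repositories(nodes)  # existing callers: unchanged behavior
read_write_only = list_repositories(nodes, kind=REPOSITORY)  # narrowed call
```

Callers that omit `kind` still see both repository types, while the narrowed call returns only the read-write one.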

backend/infrahub/git/tasks.py (3)

198-204: Good refactor: DB-backed repository fetching reduces API overhead.

Moving from client.get_list_repositories to the database-backed get_repositories_commit_per_branch eliminates unnecessary network round-trips for repository discovery. The session scope is properly managed with async with db.start_session().
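
The session-scoped fetch pattern can be sketched with a fake database, as below. `FakeDatabase`, `fetch_repositories`, and `sync_all` are hypothetical; only the shape `async with db.start_session()` is taken from the comment above, and whether the real task does its follow-up work inside or outside the session is not shown in this thread.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator


class FakeDatabase:
    """Stand-in that counts open sessions and executed queries."""

    def __init__(self) -> None:
        self.open_sessions = 0
        self.queries = 0

    @asynccontextmanager
    async def start_session(self) -> AsyncIterator["FakeDatabase"]:
        self.open_sessions += 1
        try:
            yield self
        finally:
            self.open_sessions -= 1  # released even if the body raises

    async def fetch_repositories(self) -> dict[str, dict[str, str]]:
        self.queries += 1
        return {"demo": {"main": "aaa", "feature": "bbb"}}


async def sync_all(db: FakeDatabase) -> list[str]:
    # One query inside a scoped session replaces many per-repository API calls.
    async with db.start_session() as session:
        repositories = await session.fetch_repositories()

    # The session is closed here; the remaining work uses plain data.
    return [
        f"{name}@{branch}:{commit}"
        for name, branches in repositories.items()
        for branch, commit in branches.items()
    ]


db = FakeDatabase()
synced = asyncio.run(sync_all(db))
```

The context manager guarantees the session is released on both the success and error paths, which is what "properly managed" refers to.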


221-228: LGTM: Repository field access is consistent.

The access patterns (repository.id, repository.name.value, repository.location.value, repository.default_branch.value) are consistent with CoreRepository protocol and properly extract values from the repository node.


263-271: The get_kind() method is correctly available on the repository object.

Line 268 calls repository.get_kind() where the repository is typed as CoreRepository (line 211). CoreRepository inherits from CoreNode through multiple parent classes (LineageOwner, LineageSource, CoreGenericRepository, CoreTaskTarget), and CoreNode corresponds to InfrahubNode, which implements the get_kind() method (backend/infrahub/core/node/__init__.py:112-114). The method is safe to call and will return self._schema.kind.

Signed-off-by: Fatih Acar <fatih@opsmill.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
changelog/+d9659fb5.fixed.md (1)

1-1: Consider more concise phrasing for the changelog entry.

The phrase "having a lot of" can be tightened for clarity. Consider rewording to something like "with many branches" or "at scale."

Apply this diff to improve conciseness:

-Improve branch creation and repository sync performance when having a lot of branches.
+Improve branch creation and repository sync performance when scaling with many branches.
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 0f6e2db and a9eea39.

📒 Files selected for processing (1)
  • changelog/+d9659fb5.fixed.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

**/*.md: Use - for unordered lists in markdown files
Add blank line before/after headings, code blocks, and lists in markdown files
Use fenced code blocks with language identifier in markdown files
No trailing spaces or multiple consecutive blank lines in markdown files
No bare URLs in markdown files - use [text](url) format

Files:

  • changelog/+d9659fb5.fixed.md
🪛 LanguageTool
changelog/+d9659fb5.fixed.md

[style] ~1-~1: Consider using a synonym to be more concise.
Context: ...repository sync performance when having a lot of branches.

(A_LOT_OF)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: E2E-testing-version-upgrade / From 1.3.0
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: backend-benchmark
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: documentation
  • GitHub Check: backend-docker-integration
  • GitHub Check: backend-tests-integration
  • GitHub Check: backend-tests-functional
  • GitHub Check: backend-tests-unit
  • GitHub Check: Cloudflare Pages

@fatih-acar fatih-acar merged commit cd2ed7c into stable Dec 2, 2025
41 checks passed
@fatih-acar fatih-acar deleted the fac-branch-scale-IFC-2059 branch December 2, 2025 13:35