
@waynesun09
Contributor

@waynesun09 waynesun09 commented Nov 20, 2025

Summary

Add comprehensive temporal filtering capabilities to MCP tools with new search_commits tool, enhanced list_repos activity filtering, and improved search_code temporal support.

Changes

New Features

  • search_commits tool: Search Git commit history by actual commit time with time range, author, and query filtering
  • Comprehensive date parsing: Support ISO 8601 and relative formats ("30 days ago", "last week", "yesterday")
  • list_repos enhancements: Add activeAfter/activeBefore parameters for commit activity filtering
  • search_code enhancements: Add gitRevision, since/until, and includeDeletedFiles parameters
  • Date range validation: Prevent invalid date ranges (since > until)
  • Enhanced error handling: 30-second timeout for git operations, detailed error messages for common failures
  • Clear documentation: Distinction between index time and commit time filtering
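A minimal sketch of the since/until check, for illustration only. It is not the PR's dateUtils.ts: it assumes both bounds have already been resolved to strings that `Date.parse` accepts (such as ISO 8601). The function name mirrors the PR's validateDateRange, but the signature and the error-message return shape are assumptions.

```typescript
/**
 * Hypothetical sketch of a since/until range check.
 * Returns an error message when the range is invalid, or null when it is
 * acceptable. A missing bound is treated as open-ended.
 */
function validateDateRange(since?: string, until?: string): string | null {
    // An open-ended range (one or both bounds missing) is always valid.
    if (!since || !until) {
        return null;
    }

    const sinceMs = Date.parse(since);
    const untilMs = Date.parse(until);

    // Reject inputs that Date.parse cannot interpret.
    if (Number.isNaN(sinceMs)) {
        return `Invalid 'since' date: ${since}`;
    }
    if (Number.isNaN(untilMs)) {
        return `Invalid 'until' date: ${until}`;
    }

    // The core rule: the start of the range must not be after its end.
    if (sinceMs > untilMs) {
        return `Invalid date range: 'since' (${since}) is after 'until' (${until})`;
    }
    return null;
}
```

Treating a missing bound as open-ended keeps both parameters optional, which fits the backward-compatible design described in this PR.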

Bug Fixes

  • Fix list_repos pagination bug where take limit was applied before activity filtering, returning fewer results than requested

Testing & Documentation

  • Add 106 comprehensive unit tests (59 dateUtils, 24 gitApi, 23 searchApi)
  • Update MCP tool descriptions with temporal parameter examples
  • Add "Date Format Examples" section to README
  • Clarify that repositories are cloned on the Sourcebot server's disk
  • Update CHANGELOG with all improvements

Test Results

All 106 new tests pass:

  • ✅ dateUtils.test.ts: 59 tests
  • ✅ gitApi.test.ts: 24 tests
  • ✅ searchApi.test.ts: 23 tests

Breaking Changes

None. All changes are backward compatible with optional parameters.

Resolves

Closes #511

Checklist

  • CHANGELOG.md updated
  • Tests added and passing
  • Documentation updated
  • Commit message follows conventional commit format
  • Branch rebased on latest main

Summary by CodeRabbit

  • New Features

    • Added search_commits tool for searching repository commits with optional date range, author, and result count filters.
    • Added natural language date support (e.g., "yesterday", "30 days ago", "last week") across all temporal parameters.
    • Added temporal filtering to code search via since, until, and gitRevision parameters.
    • Added activity-based repository filtering via activeAfter and activeBefore parameters.
  • Bug Fixes

    • Fixed list_repos pagination to correctly apply activity filters before limiting results.
  • Improvements

    • Enhanced list_repos with improved pagination feedback and repository metadata.
    • Added 30-second timeout for git operations handling large repositories.


Add comprehensive temporal filtering capabilities with new search_commits
tool, enhanced list_repos activity filtering, and improved search_code
temporal support.

**New Features:**
- Add search_commits tool for Git commit history search with time range,
  author, and query filtering
- Add dateUtils.ts with comprehensive date parsing supporting ISO 8601
  and relative formats ("30 days ago", "last week", "yesterday")
- Add date range validation to prevent invalid ranges (since > until)
- Add activeAfter/activeBefore parameters to list_repos for commit
  activity filtering
- Add gitRevision, since/until, and includeDeletedFiles parameters to
  search_code
- Add 30-second timeout for git operations to handle large repositories
- Add detailed error messages for common git failures
- Move cache directory constants to @sourcebot/shared package

**Fixes:**
- Fix getRepos() pagination bug where take limit was applied before
  activity filtering, returning fewer results than requested

**Testing & Documentation:**
- Add comprehensive test coverage: 106 tests (59 dateUtils, 24 gitApi,
  23 searchApi)
- Clarify temporal filtering semantics: search_code filters by INDEX
  time (when Sourcebot indexed), while list_repos and search_commits
  filter by COMMIT time
- Clarify that repositories are cloned on the Sourcebot server's disk,
  not the user's local disk, and that cloning may not have finished
  when search_commits is called
- Update MCP tool descriptions with temporal parameter examples and
  date format documentation
- Add "Date Format Examples" section to README
- Update CHANGELOG with all improvements

All changes are backward compatible with optional parameters.

Resolves sourcebot-dev#511

Signed-off-by: Wayne Sun <gsun@redhat.com>
@coderabbitai

coderabbitai bot commented Nov 20, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Walkthrough

This PR implements comprehensive temporal filtering capabilities across MCP tools and backend APIs. It adds a new search_commits tool, extends search_code and list_repos with time-based parameters, introduces date parsing utilities, adds git-based commit search functionality, and relocates cache constants to the shared package.

Changes

Cohort / File(s) Summary
Constants relocation
packages/backend/src/constants.ts, packages/shared/src/constants.ts
Moved REPOS_CACHE_DIR and INDEX_CACHE_DIR from local backend constants to shared package exports; backend now re-exports from @sourcebot/shared.
MCP types and schemas
packages/mcp/src/types.ts, packages/mcp/src/schemas.ts
Added SearchCommitsRequest, SearchCommitsResponse, and Commit types; introduced corresponding schema definitions with commit metadata fields (hash, date, message, refs, body, author_name, author_email).
MCP client API
packages/mcp/src/client.ts
Updated listRepos signature to accept optional activeAfter/activeBefore parameters; added new searchCommits method that POSTs to /api/commits with request validation.
MCP tools and entry points
packages/mcp/src/index.ts
Introduced new search_commits tool; extended search_code with gitRevision, since, until, includeDeletedFiles; enhanced list_repos with activeAfter/activeBefore and improved pagination with total count and enhanced messaging.
MCP documentation
packages/mcp/CHANGELOG.md, packages/mcp/README.md
Documented comprehensive temporal filtering features including relative date support, new tool, enhanced parameters, date format examples, and distinction between index-time vs commit-time filtering.
Backend API routes
packages/web/src/app/api/(server)/repos/route.ts, packages/web/src/app/api/(server)/commits/route.ts
Enhanced GET /repos to accept and pass activeAfter/activeBefore query parameters; added new POST /commits route for commit search with schema validation.
Web server actions
packages/web/src/actions.ts
Updated getRepos to accept optional activeAfter/activeBefore filters; added activity-based filtering logic that queries repositories by commit time and applies pagination after filtering.
Search date utilities
packages/web/src/features/search/dateUtils.ts, packages/web/src/features/search/dateUtils.test.ts
New module for parsing and validating temporal strings; supports ISO 8601 and relative formats (e.g., "30 days ago", "yesterday", "last week"); comprehensive test coverage with 90+ test cases.
Search git integration
packages/web/src/features/search/gitApi.ts, packages/web/src/features/search/gitApi.test.ts
New module for git-based commit searching with repository validation, date range validation, git log options construction, and error normalization (timeout, invalid repository, ambiguous arguments); comprehensive test suite with mocked dependencies.
Search API enhancements
packages/web/src/features/search/searchApi.ts, packages/web/src/features/search/searchApi.test.ts, packages/web/src/features/search/schemas.ts
Extended search with gitRevision, since, until, includeDeletedFiles parameters; implemented repository filtering by indexedAt timestamps; added date utility imports and temporal filtering logic.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant MCPServer
    participant WebAPI
    participant GitAPI
    participant DateUtils
    participant Git

    Client->>MCPServer: search_commits(repoId, query, since, until, author)
    MCPServer->>WebAPI: POST /api/commits {repoId, query, since, until, author}
    WebAPI->>DateUtils: validateDateRange(since, until)
    DateUtils-->>WebAPI: validation result
    alt validation fails
        WebAPI-->>MCPServer: error response
        MCPServer-->>Client: error
    else validation succeeds
        WebAPI->>GitAPI: searchCommits({repoId, query, since, until, author})
        GitAPI->>DateUtils: toGitDate(since), toGitDate(until)
        DateUtils-->>GitAPI: git-compatible date strings
        GitAPI->>Git: git log --since --until --author --grep
        Git-->>GitAPI: commit objects
        GitAPI-->>WebAPI: Commit[] | ServiceError
        WebAPI-->>MCPServer: JSON response
        MCPServer-->>Client: formatted results
    end
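The `git log --since --until --author --grep` step in the diagram above could be assembled as an argument array like this. This is a hypothetical helper, not the PR's gitApi.ts (which configures simple-git); the `CommitSearchOptions` shape and the `--format` field layout are assumptions, while the flags themselves are standard `git log` options.

```typescript
// Hypothetical shape of the commit-search request, mirroring the fields in the diagram.
interface CommitSearchOptions {
    query?: string;
    since?: string; // assumed to be already converted to a git-compatible date string
    until?: string;
    author?: string;
    maxCount?: number;
}

/**
 * Builds an argument vector for `git log`. Each flag is a standard
 * git-log option; the field layout of the --format string is an assumption.
 */
function buildGitLogArgs(opts: CommitSearchOptions): string[] {
    // %H = full hash, %aI = author date (strict ISO 8601), %an/%ae = author name/email,
    // %s = subject; %x1f inserts a unit-separator byte between fields.
    const args = ["log", "--format=%H%x1f%aI%x1f%an%x1f%ae%x1f%s"];

    if (opts.since) args.push(`--since=${opts.since}`);
    if (opts.until) args.push(`--until=${opts.until}`);
    if (opts.author) args.push(`--author=${opts.author}`);
    if (opts.query) {
        // --grep matches against the commit message; --regexp-ignore-case makes it case-insensitive.
        args.push(`--grep=${opts.query}`, "--regexp-ignore-case");
    }
    if (opts.maxCount !== undefined) args.push(`--max-count=${opts.maxCount}`);

    return args;
}
```

Keeping the arguments as an array means they can be passed to something like Node's `child_process.execFile("git", args, { cwd: repoPath, timeout: 30_000 })` without shell quoting, which matters because `query` and `author` come from MCP clients.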
sequenceDiagram
    participant Client
    participant MCPServer
    participant WebAPI
    participant Database
    participant Git

    Client->>MCPServer: list_repos(activeAfter, activeBefore)
    MCPServer->>WebAPI: GET /repos?activeAfter=X&activeBefore=Y
    WebAPI->>Database: fetch all repos (no take limit)
    Database-->>WebAPI: repos[]
    loop for each repo
        WebAPI->>Git: searchCommits to check activity in range
        alt repo has activity
            Git-->>WebAPI: commits found
            WebAPI->>WebAPI: include in filtered list
        else no activity or not on disk
            WebAPI->>WebAPI: exclude or handle error
        end
    end
    WebAPI->>WebAPI: apply pagination after filtering
    WebAPI-->>MCPServer: paginated results with totalCount
    MCPServer-->>Client: formatted repo list with pagination info

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Key areas requiring attention:
    • packages/web/src/features/search/gitApi.ts: Git integration logic with error handling and timeout configuration; verify correct error categorization and git client setup
    • packages/web/src/features/search/dateUtils.ts: Date parsing and relative date logic; ensure comprehensive handling of edge cases and timezone considerations
    • packages/web/src/actions.ts: Repository filtering with activity-based logic; verify pagination correctness after activity filtering and error handling for repositories not on disk
    • packages/web/src/features/search/searchApi.ts: Temporal parameter integration and repository filtering by indexedAt; ensure filtering is applied before search transformation
    • packages/mcp/src/index.ts: New search_commits tool implementation and list_repos pagination changes; verify output formatting and total count calculation
    • Test coverage: Verify test suites for gitApi.test.ts and dateUtils.test.ts adequately cover edge cases (leap years, timezone boundaries, invalid dates)

Possibly related PRs

Suggested labels

enhancement, feature, temporal-querying, mcp-tools, search

Suggested reviewers

  • msukkari

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'feat: add temporal query parameters to MCP tools' accurately summarizes the main change: adding temporal/timeframe parameters to MCP tools for time-based querying. |
| Linked Issues check | ✅ Passed | The PR comprehensively implements all coding requirements from issue #511: search_commits tool, temporal parameters (since/until/activeAfter/activeBefore/gitRevision), date parsing (ISO 8601 and relative formats), date range validation, and error handling with git timeouts. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to implementing temporal query parameters and supporting infrastructure. The relocation of cache constants to shared package and pagination bug fix in list_repos are both necessary support changes directly tied to the PR objectives. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@brendan-kellam
Contributor

@coderabbitai review

@coderabbitai

coderabbitai bot commented Nov 20, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

query = `${query} branch:${gitRevision}`;
}

// If since or until are provided, filter repositories by indexedAt.
Contributor


I can see the value in being able to search by commit time, but what is the benefit of being able to filter search results by indexedAt?

Contributor Author


My initial plan was to add a temporal option to search_code, but it turned out to be better to provide a new search_commits tool specifically for finding the latest code changes by commit time.

The indexedAt option of search_code targets two scenarios:

  • New Repos: If you add a new repository to Sourcebot today, it will have an indexedAt timestamp of today. Searching with since="today" will immediately include code from this new repository, allowing you to answer "What code was just added to our knowledge base?"
  • Synced Repos: When Sourcebot syncs the latest changes for an existing repo and re-indexes it, the indexedAt timestamp is updated.
    Scenario: A developer pushes a critical security fix, and the repo is re-indexed.
    Result: A search with since="1 hour ago" will surface matches from that repo because it was just re-processed, effectively showing you "freshly indexed code."
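As a rough illustration of the two scenarios above, an in-memory indexedAt filter might look like this. It assumes each repo record carries a nullable `indexedAt` timestamp and that `since`/`until` have already been parsed to `Date`s; `IndexedRepo` and `filterByIndexedAt` are hypothetical names, and the real searchApi.ts more likely expresses this as a database query.

```typescript
// Minimal repo shape for this sketch; a real repo record has more fields.
interface IndexedRepo {
    id: number;
    name: string;
    indexedAt: Date | null; // null when the repo has never been indexed
}

/**
 * Keeps repos whose last index time falls inside [since, until].
 * Repos that were never indexed are excluded whenever a bound is given.
 */
function filterByIndexedAt(repos: IndexedRepo[], since?: Date, until?: Date): IndexedRepo[] {
    if (!since && !until) {
        return repos; // no temporal filter requested
    }
    return repos.filter((repo) => {
        if (!repo.indexedAt) return false;
        const t = repo.indexedAt.getTime();
        if (since && t < since.getTime()) return false;
        if (until && t > until.getTime()) return false;
        return true;
    });
}
```

For the "1 hour ago" case, `since` would be `new Date(Date.now() - 60 * 60 * 1000)`. This selects repos by when Sourcebot last indexed them, not by when the matching code was committed.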


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (10)
packages/web/src/features/search/searchApi.test.ts (1)

4-31: Unused mocks add unnecessary coupling to external modules

This file only uses dateUtils, but still mocks @/lib/serviceError, @/actions, and @/lib/auth. That adds extra surface area that can break these tests if those modules change, without providing value here.

You can either:

  • Remove these mocks from this file, or
  • Add actual searchApi integration tests that rely on them, so the mocks are justified.
packages/web/src/features/search/dateUtils.test.ts (1)

288-292: Clarify toGitDate behavior for unrecognized formats

The test name and comment say “convert unrecognized format to ISO”, but the assertion expects the original string to be preserved:

// For formats git doesn't natively understand, we convert to ISO
const result = toGitDate('some random string');
expect(result).toBe('some random string');

This is confusing for future readers. Either:

  • Update the expectation to actually check for an ISO conversion, if that’s the intended behavior, or
  • Rename the test and fix the comment to state that unknown formats are passed through unchanged to Git.
packages/web/src/app/api/(server)/repos/route.ts (1)

5-20: Temporal filters are correctly forwarded; consider normalizing empty query params

The new GET handler cleanly threads activeAfter / activeBefore through to getRepos and preserves the existing error handling pattern.

One minor robustness tweak: searchParams.get(...) ?? undefined will pass through an empty string when the URL contains ?activeAfter= or ?activeBefore=. If you prefer to treat “present but empty” the same as “unset”, you could normalize with:

const activeAfter = searchParams.get("activeAfter") || undefined;
const activeBefore = searchParams.get("activeBefore") || undefined;

This avoids feeding accidental empty strings into downstream date parsing.
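The difference between the two normalizations is easy to see with the standard `URLSearchParams` API. A small sketch; the helper names are hypothetical:

```typescript
// `??` only replaces null (parameter absent); an empty string passes through.
function readParamNullish(params: URLSearchParams, key: string): string | undefined {
    return params.get(key) ?? undefined;
}

// `||` also replaces "" (parameter present but empty), so `?activeAfter=` behaves like "unset".
function readParamNormalized(params: URLSearchParams, key: string): string | undefined {
    return params.get(key) || undefined;
}

const params = new URL("https://example.com/api/repos?activeAfter=&activeBefore=2025-01-01").searchParams;
console.log(JSON.stringify(readParamNullish(params, "activeAfter"))); // prints ""
console.log(readParamNormalized(params, "activeAfter"));              // prints undefined
```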

packages/web/src/app/api/(server)/commits/route.ts (1)

1-34: Solid validation flow; tighten type usage / schema reuse

The POST handler’s flow (JSON → zod safeParseAsync → schemaValidationError / serviceErrorResponse → searchCommits) looks good and matches the rest of the API.

Two minor nits:

  • SearchCommitsRequest is imported but never used. Either remove the import or use it to type the validated payload (e.g., const response = await searchCommits(parsed.data as SearchCommitsRequest);) so this route stays coupled to the shared request type.
  • If you already have a shared Zod schema for searchCommits (e.g., alongside other search schemas), consider reusing it here instead of defining a new one inline to avoid future drift between layers.
packages/mcp/src/schemas.ts (1)

195-214: Consider constraining maxCount to avoid pathological requests

The new searchCommitsRequestSchema / commitSchema pair matches the documented shape and looks consistent.

maxCount is currently z.number().optional(), which allows negative values or extremely large limits. Given this is likely passed through to git log (and could be user‑controlled via MCP), you may want to harden it, e.g.:

maxCount: z
  .number()
  .int()
  .positive()
  .max(500)
  .optional(),

(or whatever upper bound matches your performance expectations).

packages/web/src/features/search/dateUtils.ts (1)

6-26: Clarify parseTemporalDate contract and clean up unused state

The JSDoc and implementation for parseTemporalDate don’t currently line up:

  • Doc says it “Returns an ISO 8601 date string or null if invalid”, but the function:
    • Returns undefined when !dateStr.
    • Returns the original dateStr when parsing fails (so not necessarily ISO).
  • Callers like toDbDate and validateDateRange implicitly treat “unparseable” as undefined/Invalid Date, while toGitDate is fine with git-native strings.

This makes the effective contract subtle and easy to misread from the JSDoc.

Consider tightening this up by either:

  • Updating the JSDoc and return type to something like “ISO 8601 when parsed, otherwise the original string, or undefined when input is falsy”, or
  • Returning undefined for unparseable inputs and letting toGitDate own the “fallback to original for git” behavior.

Also, relativePatterns is currently unused and can be removed or wired into the parsing logic to avoid dead code.

Also applies to: 35-41, 99-102, 144-156
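A sketch of the second option, where parseTemporalDate returns ISO 8601 or undefined and toGitDate alone owns the pass-through fallback. This is illustrative, not the PR's dateUtils.ts: it handles only a small assumed set of phrases ("yesterday", "last week", "<N> <unit>s ago"), treats "yesterday" as exactly 24 hours before `now` (which may differ from the PR's semantics), and the injectable `now` parameter is an addition for testability.

```typescript
// Millisecond sizes for the relative units this sketch understands.
const UNIT_MS: Record<string, number> = {
    minute: 60_000,
    hour: 3_600_000,
    day: 86_400_000,
    week: 7 * 86_400_000,
};

/**
 * Returns an ISO 8601 string for recognised inputs, or undefined when the
 * input is empty or cannot be parsed. Never returns the raw input.
 */
function parseTemporalDate(dateStr: string | undefined, now: Date = new Date()): string | undefined {
    if (!dateStr) return undefined;
    const input = dateStr.trim().toLowerCase();

    // Relative keywords, resolved against `now`.
    if (input === "yesterday") return new Date(now.getTime() - UNIT_MS.day).toISOString();
    if (input === "last week") return new Date(now.getTime() - UNIT_MS.week).toISOString();

    // "<N> <unit>(s) ago", e.g. "30 days ago" or "1 hour ago".
    const match = /^(\d+)\s+(minute|hour|day|week)s?\s+ago$/.exec(input);
    if (match) {
        const amount = Number(match[1]);
        return new Date(now.getTime() - amount * UNIT_MS[match[2]]).toISOString();
    }

    // Fall back to the built-in parser for ISO 8601 dates and timestamps.
    const ms = Date.parse(dateStr);
    return Number.isNaN(ms) ? undefined : new Date(ms).toISOString();
}

/** Prefers the parsed ISO form, but passes unknown strings through for git to interpret. */
function toGitDate(dateStr: string | undefined, now: Date = new Date()): string | undefined {
    if (!dateStr) return undefined;
    return parseTemporalDate(dateStr, now) ?? dateStr;
}
```

With this split, toGitDate('some random string') returns the input unchanged while parseTemporalDate returns undefined, which is the pass-through behavior the dateUtils.test.ts nitpick asks to name explicitly.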

packages/web/src/features/search/gitApi.test.ts (1)

11-19: Align error-handling tests with production sew behavior

In tests you mock sew as a simple passthrough:

vi.mock('@/actions', () => ({
    sew: (fn: () => any) => fn(),
}));

and later assert that searchCommits rejects for “other Error instances” and non-Error exceptions. In production, searchCommits is wrapped by the real sew from actions.ts, which catches thrown errors and returns a ServiceError instead of letting the promise reject.

That means the “throws” behavior in these tests doesn’t match what callers will see at runtime. To avoid surprises, consider either:

  • Asserting on a ServiceError shape for these cases (matching real sew), or
  • Removing the sew dependency from searchCommits and letting routes/actions wrap it, if you explicitly want searchCommits itself to throw.

Either way, having tests reflect the actual public contract of searchCommits will make future changes safer.

Also applies to: 287-347
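To make the contract concrete, here is a minimal stand-in for sew that converts thrown values into a ServiceError instead of letting the promise reject. The ServiceError fields (`statusCode`, `errorCode`, `message`) and the `UNEXPECTED_ERROR` code are assumptions for illustration, not the real definitions in actions.ts.

```typescript
// Assumed ServiceError shape for this sketch.
interface ServiceError {
    statusCode: number;
    errorCode: string;
    message: string;
}

function isServiceError(value: unknown): value is ServiceError {
    return typeof value === "object" && value !== null
        && "errorCode" in value && "statusCode" in value;
}

/**
 * Hypothetical stand-in for `sew`: runs `fn` and converts any thrown value
 * into a ServiceError, so callers always receive a resolved value.
 */
async function sew<T>(fn: () => Promise<T>): Promise<T | ServiceError> {
    try {
        return await fn();
    } catch (e) {
        return {
            statusCode: 500,
            errorCode: "UNEXPECTED_ERROR",
            message: e instanceof Error ? e.message : String(e),
        };
    }
}

// Under this contract, a thrown error surfaces as a resolved ServiceError.
async function demo(): Promise<void> {
    const result = await sew(async () => { throw new Error("git exploded"); });
    console.log(isServiceError(result) ? result.message : "no error"); // prints "git exploded"
}
void demo();
```

A test against this contract would assert on the resolved value, for example with Vitest's `await expect(promise).resolves.toMatchObject({ message: '...' })`, rather than expecting a rejection.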

packages/web/src/actions.ts (1)

463-541: Handle ServiceError results from searchCommits explicitly in activity filtering

In the activity-based branch of getRepos:

const activityChecks = await Promise.all(repos.map(async (repo) => {
    try {
        const commits = await searchCommits({ ... });

        if (Array.isArray(commits) && commits.length > 0) {
            return repo;
        }
    } catch (e) {
        // ...
    }
    return null;
}));

searchCommits is already wrapped in sew (in gitApi.ts), so for git/log failures it resolves to a ServiceError rather than throwing. As a result:

  • The catch block here will rarely, if ever, run for git-related issues.
  • Cases like “repo not on disk”, “invalid date range”, or git timeouts will show up as a ServiceError value, get treated as “no activity”, and be silently filtered out with no logging.

If you want list_repos with activeAfter/activeBefore to surface date-range validation failures instead of just returning an empty list, and to log unexpected git errors, consider branching explicitly on the union:

const result = await searchCommits({
    repoId: repo.id,
    since: activeAfter,
    until: activeBefore,
    maxCount: 1,
});

if (Array.isArray(result)) {
    return result.length > 0 ? repo : null;
}

// `result` is a ServiceError here
const message = result.message ?? '';
if (!message.includes('does not exist')) {
    console.error(
        `Error checking activity for repo ${repo.id} (${repo.name}):`,
        result,
    );
}
return null;

This keeps “repo not yet cloned” behaving as “no activity” while making invalid ranges and other unexpected failures visible.

Separately, this wiring introduces a circular dependency: actions.ts imports searchCommits from ./features/search/gitApi, and gitApi.ts imports sew from @/actions. It works today because sew is only used at call time, but it’d be cleaner long term to extract sew to a small shared module (or inject it) to avoid subtle initialization issues in future refactors.

packages/web/src/features/search/gitApi.ts (1)

1-22: searchCommits implementation looks solid; consider small cleanups

Overall, this module cleanly encapsulates git-based commit search:

  • Validates the repo path and date range up front.
  • Uses toGitDate to preserve git-native formats when possible.
  • Configures simpleGit with a sensible timeout and concurrency.
  • Provides clear, user-facing messages for common git failures (invalid repo, ambiguous args, timeout), with a generic fallback.

A couple of minor nits you might consider:

  • You check existsSync(repoPath) both before calling createGitClientForPath and inside that helper. One of those checks could be dropped unless you specifically want the second check to catch races and produce a different message.
  • In the catch block you return unexpectedError(...) for known cases but throw for unknown ones, relying on sew to convert those into generic ServiceErrors. That’s fine, but it means callers always see ServiceError (never a thrown error) in production, so other code (like getRepos) should treat the result as Commit[] | ServiceError rather than depending on exceptions.

These are non-blocking; the core behavior and error mapping look good.

Also applies to: 58-150
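The error normalization described above can be sketched as substring matching on git's error text. The phrases "not a git repository" and "ambiguous argument" appear in git's standard fatal messages; the timeout check, the category names, and the user-facing wording are assumptions rather than the PR's actual mapping.

```typescript
// Hypothetical categories for normalising git failures into user-facing messages.
type GitFailureKind = "timeout" | "invalid-repository" | "ambiguous-argument" | "unknown";

/**
 * Maps a raw git error message to a category by case-insensitive substring
 * matching. The matched phrases are assumptions; the real gitApi.ts may differ.
 */
function classifyGitError(message: string): GitFailureKind {
    const text = message.toLowerCase();
    if (text.includes("timed out") || text.includes("timeout")) return "timeout";
    if (text.includes("not a git repository")) return "invalid-repository";
    if (text.includes("ambiguous argument")) return "ambiguous-argument";
    return "unknown";
}

// Illustrative wording only.
const FRIENDLY_MESSAGES: Record<GitFailureKind, string> = {
    "timeout": "Git operation timed out after 30 seconds. Try a narrower date range or a smaller maxCount.",
    "invalid-repository": "The repository is not a valid git repository on the Sourcebot server.",
    "ambiguous-argument": "Git could not interpret one of the arguments (for example, an unknown revision or date).",
    "unknown": "Unexpected error while searching commits.",
};
```

In the flow the review describes, the `unknown` category corresponds to the case where gitApi.ts rethrows and lets sew produce a generic ServiceError.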

packages/mcp/src/index.ts (1)

233-347: list_repos client‑side filtering/pagination looks good; confirm listRepos returns the full filtered set.

The flow of: call listRepos({ activeAfter, activeBefore }), then apply name filtering, sort, compute totalCount, and paginate on the MCP side cleanly fixes the “take before filter” bug, and the new empty/out‑of‑range page messages (including the temporal‑filtering note) are clear.

This does rely on listRepos returning the complete set of repositories for the given temporal filters (i.e., not already paginated). If the underlying API still pages results, totalCount, totalPages, and the “Try pageNumber between 1 and X” guidance will all be based only on that first backend page.

If listRepos is still paginated underneath, consider either:

  • Fetching all pages when temporal filters are used, or
  • Threading backend pagination parameters and totals through to this tool instead of recomputing them locally.
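The filter-then-paginate order that fixes the "take before filter" bug can be shown generically. This sketch assumes the complete filtered set fits in memory, which is exactly what the comment above asks to confirm; the `Page` shape and `filterThenPaginate` name are hypothetical.

```typescript
interface Page<T> {
    items: T[];
    totalCount: number;
    totalPages: number;
    pageNumber: number;
}

/**
 * Filter first, then paginate, so the page size never shrinks the candidate
 * set before the predicate runs. `pageNumber` is 1-based.
 */
function filterThenPaginate<T>(all: T[], keep: (item: T) => boolean, pageNumber: number, perPage: number): Page<T> {
    const filtered = all.filter(keep);
    const totalCount = filtered.length;
    const totalPages = Math.max(1, Math.ceil(totalCount / perPage));
    const start = (pageNumber - 1) * perPage;
    return {
        items: filtered.slice(start, start + perPage),
        totalCount,
        totalPages,
        pageNumber,
    };
}

const page = filterThenPaginate(Array.from({ length: 20 }, (_, i) => i + 1), (n) => n % 2 === 0, 2, 3);
console.log(page.items, page.totalCount, page.totalPages); // [ 8, 10, 12 ] 10 4
```

With the buggy order, taking the first 3 of the 20 numbers and then filtering for evens would leave only [2].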
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09507d3 and 8c82979.

📒 Files selected for processing (18)
  • packages/backend/src/constants.ts (2 hunks)
  • packages/mcp/CHANGELOG.md (1 hunks)
  • packages/mcp/README.md (4 hunks)
  • packages/mcp/src/client.ts (3 hunks)
  • packages/mcp/src/index.ts (6 hunks)
  • packages/mcp/src/schemas.ts (1 hunks)
  • packages/mcp/src/types.ts (2 hunks)
  • packages/shared/src/constants.ts (1 hunks)
  • packages/web/src/actions.ts (2 hunks)
  • packages/web/src/app/api/(server)/commits/route.ts (1 hunks)
  • packages/web/src/app/api/(server)/repos/route.ts (1 hunks)
  • packages/web/src/features/search/dateUtils.test.ts (1 hunks)
  • packages/web/src/features/search/dateUtils.ts (1 hunks)
  • packages/web/src/features/search/gitApi.test.ts (1 hunks)
  • packages/web/src/features/search/gitApi.ts (1 hunks)
  • packages/web/src/features/search/schemas.ts (1 hunks)
  • packages/web/src/features/search/searchApi.test.ts (1 hunks)
  • packages/web/src/features/search/searchApi.ts (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Test Web
packages/shared/src/constants.ts

[error] 37-37: TypeError: The "path" argument must be of type string. Received undefined

🔇 Additional comments (9)
packages/web/src/features/search/searchApi.test.ts (1)

33-323: Temporal coverage in this suite looks solid

The tests exercise ISO dates, relative phrases, missing parameters, repository filtering, and inverted ranges with fixed system time, which should give good confidence that temporal parameters are wired correctly into query construction and range validation (via dateUtils).

packages/web/src/features/search/dateUtils.test.ts (1)

9-378: Thorough temporal parsing and range validation tests

The suite covers a wide range of ISO/relative formats, boundary conditions (month/year/leap year, midnight/end-of-day), and integration scenarios across parseTemporalDate, validateDateRange, toDbDate, and toGitDate. That level of coverage should make regressions in date handling unlikely.

packages/backend/src/constants.ts (1)

11-11: Re-exporting cache-dir constants from shared module looks appropriate

Re-exporting REPOS_CACHE_DIR and INDEX_CACHE_DIR from @sourcebot/shared keeps the backend public surface unchanged while centralizing the actual definitions in one place, which should reduce divergence between services.

packages/mcp/src/types.ts (1)

13-15: Commit-search types are correctly wired into the public API

The new SearchCommitsRequest, SearchCommitsResponse, and Commit exports cleanly mirror the schemas and extend the MCP surface without breaking existing callers. Just keep this file in sync with the corresponding web types per the header note.

Also applies to: 36-38

packages/mcp/README.md (1)

169-205: Temporal docs align well with the new APIs

The additions for search_code (gitRevision, since, until, includeDeletedFiles), list_repos (activeAfter / activeBefore), and the new search_commits section clearly describe index‑time vs commit‑time semantics and supported date formats. These explanations match the corresponding schemas and server logic and should make the new temporal behavior much easier to understand for MCP clients.

Also applies to: 221-253, 243-253

packages/web/src/actions.ts (1)

315-321: API key creation guard for non-owner users looks correct

The new check against EXPERIMENT_DISABLE_API_KEY_CREATION_FOR_NON_ADMIN_USERS with an early FORBIDDEN ServiceError and clear log message is consistent with the rest of the file’s patterns and should be safe to ship.

packages/mcp/CHANGELOG.md (1)

10-29: Changelog entries clearly capture the new temporal MCP capabilities

The “Unreleased” section accurately reflects the added tools/parameters, temporal format support, error handling, and the list_repos pagination fix, and clearly distinguishes index time vs commit time. This is consistent with the PR’s behavior and keeps consumers informed.

packages/mcp/src/client.ts (1)

2-4: MCP client wiring for temporal repos and searchCommits is consistent and safe

The updated listRepos signature (optional params with activeAfter/activeBefore) and the new searchCommits client:

  • Follow the same fetch + isServiceError + schema-parse pattern as the existing methods.
  • Remain backwards compatible for existing listRepos() callers.
  • Correctly propagate org domain and optional API key headers.

No issues from the client side.

Also applies to: 24-47, 67-83

packages/mcp/src/index.ts (1)

52-67: Temporal params for search_code are plumbed correctly; just confirm index‑time semantics.

The new gitRevision, since, until, and includeDeletedFiles fields are optional, preserve existing query composition, and are passed through to search() without breaking current callers. Please double‑check that search({ since, until }) actually applies the index‑time filters described here; if it instead uses commit time, consider aligning the implementation or updating the descriptions so it doesn’t conflict conceptually with search_commits and list_repos.

Also applies to: 80-83, 103-106

import escapeStringRegexp from 'escape-string-regexp';
import { z } from 'zod';
-import { listRepos, search, getFileSource } from './client.js';
+import { listRepos, search, getFileSource, searchCommits } from './client.js';

@coderabbitai coderabbitai bot Nov 20, 2025


⚠️ Potential issue | 🟠 Major

search_commits.repoId is typed as a number, which likely conflicts with string‑typed repo IDs elsewhere.

Other tools appear to treat repository identifiers as strings (e.g., search_code.filterByRepoIds: string[], get_file_source.repoId: string, and list_repos printing id: ${repo.repoName}), but search_commits currently defines repoId as z.number(). That mismatch may make it awkward or error‑prone for clients to take the ID from list_repos and pass it directly into search_commits, and may not match what searchCommits expects.

I’d recommend:

-        repoId: z.number().describe(`The ID of the repository to search in. Obtain this by calling 'list_repos' first.`),
+        repoId: z.string().describe(`The Sourcebot repository ID to search in. Obtain this by calling 'list_repos' first.`),

and updating the searchCommits argument type to accept the same string ID shape used by the other tools.

As a small polish, if the real default for maxCount is 50, you could also encode that in the schema (e.g., .int().positive().default(50)) so validation and docs stay in sync.

Also applies to: 191-231

🤖 Prompt for AI Agents
In packages/mcp/src/index.ts around lines 8 and also 191-231, change the
search_commits schema and related types so repoId is a string (not z.number())
to match other APIs (e.g., search_code.filterByRepoIds and get_file_source),
update the searchCommits function/signature to accept the string ID shape used
elsewhere, and ensure callers pass the repo id from list_repos directly;
additionally, set the maxCount schema to .int().positive().default(50) (or the
real default) so validation/docs match the actual default behavior.


Contributor Author


Resolved in b30bbd3 with dual support for repo ID and name.

@brendan-kellam the current list_repos returns an id field that is actually the repo name. I don't want to change that field to the numeric ID and add a new repo name field, since that might cause inconsistencies.

What do you think?

} = {}) => sew(() =>
    withOptionalAuthV2(async ({ org, prisma }) => {
        // When filtering by activity, we need to fetch all repos first,
        // then apply pagination after filtering
Contributor

nit: we aren't really doing pagination here with take

Contributor Author

OK, thanks for clarifying that.
My implementation removes the limit and loads as many repos as possible, which might cause server-side performance issues. I'll update it to load repos in batches first, apply take at the end, and enforce a hard maximum on the total number of repos loaded.
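The plan described in that reply (load in batches, filter, apply `take` only after filtering, and stop at a hard cap) can be sketched as follows. `loadBatch`, `isActive`, `BATCH_SIZE`, and `MAX_SCANNED` are hypothetical stand-ins, not the PR's actual names or values; the real code would page through Prisma and check commit activity on disk.

```typescript
// Hypothetical sketch: filter-then-take with batched loading and a hard cap.
type Repo = { id: number; name: string };

const BATCH_SIZE = 100;   // assumed batch size
const MAX_SCANNED = 1000; // assumed hard cap on repos examined in total

async function getActiveRepos(
  loadBatch: (skip: number, size: number) => Promise<Repo[]>,
  isActive: (repo: Repo) => Promise<boolean>,
  take: number,
): Promise<Repo[]> {
  const results: Repo[] = [];
  let scanned = 0;
  while (results.length < take && scanned < MAX_SCANNED) {
    const size = Math.min(BATCH_SIZE, MAX_SCANNED - scanned);
    const batch = await loadBatch(scanned, size);
    if (batch.length === 0) break; // no more repos to load
    scanned += batch.length;
    for (const repo of batch) {
      if (await isActive(repo)) {
        results.push(repo);
      }
    }
  }
  // Apply `take` after filtering, so the limit counts matching repos only.
  return results.slice(0, take);
}
```

The key design point is that `take` is applied to the filtered list rather than to the database query, which is exactly the pagination bug the PR description mentions; the cap bounds the worst case when few repos match.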

- Remove unused mocks from searchApi.test.ts
- Fix test description in dateUtils.test.ts to match behavior
- Align error handling tests in gitApi.test.ts with production sew behavior
- Add comprehensive JSDoc to parseTemporalDate function
- Remove unused relativePatterns variable in dateUtils.ts
- Add validation to maxCount parameter (max 500) in schemas
- Normalize empty query params in repos route
- Extract searchCommitsRequestSchema to shared schemas file
- Use SearchCommitsRequest type for type safety in commits route
- Remove duplicate existsSync check in gitApi.ts
- Fix ServiceError handling in activity filtering logic in actions.ts
- Add explicit error logging for unexpected errors
- Remove unused includeDeletedFiles parameter (Brendan's feedback)

Signed-off-by: Wayne Sun <gsun@redhat.com>
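One bullet in this commit mentions normalizing empty query params in the repos route. A minimal sketch, assuming the route reads parameters such as `activeAfter` and `activeBefore` (names taken from the PR description) from `URLSearchParams`; the `optionalParam` helper is hypothetical. It treats missing, empty, or whitespace-only values as absent so downstream date parsing never receives `""`.

```typescript
// Hypothetical helper: map a missing, "", or whitespace-only param to undefined.
function optionalParam(params: URLSearchParams, name: string): string | undefined {
  const value = params.get(name);
  if (value === null) return undefined;
  const trimmed = value.trim();
  return trimmed === "" ? undefined : trimmed;
}

const params = new URLSearchParams("activeAfter=&activeBefore=2024-01-31");
console.log(optionalParam(params, "activeAfter"));  // undefined
console.log(optionalParam(params, "activeBefore")); // 2024-01-31
```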
- Implement concurrency-limited activity filtering in getRepos with safety cap
- Move activity filtering constants to @sourcebot/shared
- Fix pagination logic to ensure accurate results while protecting server
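"Concurrency-limited" filtering typically means running at most N activity checks at once instead of firing one per repo simultaneously. This is a generic sketch; `filterWithConcurrency` and the limit value are assumptions, and the PR's implementation may differ or use a library helper.

```typescript
// Hypothetical sketch: async filter that runs at most `limit` predicates concurrently
// and preserves the input order in the result.
async function filterWithConcurrency<T>(
  items: T[],
  predicate: (item: T) => Promise<boolean>,
  limit: number,
): Promise<T[]> {
  const keep: boolean[] = new Array(items.length).fill(false);
  let next = 0;

  // Each worker claims the next unprocessed index until the list is exhausted.
  // `next++` is safe because JavaScript runs this code on a single thread.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      keep[index] = await predicate(items[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return items.filter((_, i) => keep[i]);
}
```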
Add support for both numeric database IDs and string repository names
in the search_commits tool, allowing users to directly use repository
identifiers from list_repos output.

**Implementation:**
- Add resolveRepoId utility function to convert string repo names to
  numeric IDs via database lookup
- Update SearchCommitsRequest interface to accept repoId: string | number
- Wrap searchCommits in withOptionalAuthV2 for database access
- Update MCP schema and API route schema with z.union([z.number(), z.string()])

**Testing:**
- Add comprehensive unit tests for repository identifier resolution
- Test both numeric ID and string name inputs
- Test error handling for non-existent repositories
- Verify database query parameters
- All 29 tests passing

**Documentation:**
- Update README to document both identifier types
- Update CHANGELOG with new functionality
- Clear error messages guide users to use list_repos for valid identifiers

This enhancement improves the user experience by eliminating the need
to manually convert repository names to numeric IDs when using
search_commits after list_repos.

Signed-off-by: Wayne Sun <gsun@redhat.com>
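A minimal sketch of the string-or-number resolution this commit describes. To stay self-contained, it replaces the Prisma database lookup with a hypothetical `findRepoIdByName` callback, and the error text is illustrative only; treat everything except the `resolveRepoId` name and the `string | number` input type as an assumption.

```typescript
// Hypothetical sketch: accept either a numeric repo ID or a repo name string.
type RepoIdentifier = string | number;

async function resolveRepoId(
  identifier: RepoIdentifier,
  findRepoIdByName: (name: string) => Promise<number | undefined>, // stands in for a DB query
): Promise<number> {
  if (typeof identifier === "number") {
    return identifier; // already a database ID
  }
  const id = await findRepoIdByName(identifier);
  if (id === undefined) {
    // Point the caller back at list_repos, as the commit message describes.
    throw new Error(`Repository "${identifier}" not found. Use list_repos to get a valid identifier.`);
  }
  return id;
}
```

Numeric strings are deliberately not parsed as IDs here, so any string is always treated as a repo name.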
Add guards to REPOS_CACHE_DIR and INDEX_CACHE_DIR constants to prevent
module load errors when env.DATA_CACHE_DIR is undefined (e.g., when
SKIP_ENV_VALIDATION=1 during builds).

Use safe fallback paths (/tmp/sourcebot/*) that prevent path.join from
being called with undefined, fixing the Test Web pipeline error:
'TypeError: The "path" argument must be of type string. Received undefined'

Addresses PR review comment about guarding against undefined DATA_CACHE_DIR.

Signed-off-by: Wayne Sun <gsun@redhat.com>
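The guard can be sketched as a function that never passes `undefined` to `path.join`. The `cacheDirs` helper is hypothetical, and the subdirectory names `repos` and `index` are assumptions standing in for whatever the elided `/tmp/sourcebot/*` paths actually are.

```typescript
import * as path from "node:path";

// Hypothetical sketch of the guard; subdirectory names are assumptions.
function cacheDirs(dataCacheDir: string | undefined) {
  // Fall back to a /tmp location so path.join never receives undefined,
  // e.g. when SKIP_ENV_VALIDATION=1 leaves DATA_CACHE_DIR unset during builds.
  const base = dataCacheDir ?? "/tmp/sourcebot";
  return {
    REPOS_CACHE_DIR: path.join(base, "repos"),
    INDEX_CACHE_DIR: path.join(base, "index"),
  };
}
```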
Add missing optional fields to searchRequestSchema in MCP package to match
the web schema at packages/web/src/features/search/schemas.ts:
- gitRevision: git revision to search in
- since: filter by index date (after this date)
- until: filter by index date (before this date)

This resolves the schema synchronization violation noted in the code header
comment and addresses the PR review feedback about missing fields.

Keeps both schemas in sync as required by the @note comment at the top of
the file.

Signed-off-by: Wayne Sun <gsun@redhat.com>
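To illustrate the semantics of the three optional fields, here is a sketch of how `since`/`until` could filter results by index date. The field names come from the commit message; the `IndexedResult` type, its `indexedAt` property, the strict (exclusive) comparisons, and the `filterByIndexDate` helper are all assumptions.

```typescript
// Hypothetical sketch; only the option field names come from the commit message.
interface TemporalSearchOptions {
  gitRevision?: string; // git revision to search in (not used by this filter)
  since?: Date;         // keep results indexed after this date
  until?: Date;         // keep results indexed before this date
}

interface IndexedResult {
  path: string;
  indexedAt: Date;
}

function filterByIndexDate(results: IndexedResult[], { since, until }: TemporalSearchOptions): IndexedResult[] {
  // Reject inverted ranges up front instead of silently returning nothing.
  if (since && until && since.getTime() > until.getTime()) {
    throw new RangeError("since must not be later than until");
  }
  return results.filter((r) =>
    (since === undefined || r.indexedAt.getTime() > since.getTime()) &&
    (until === undefined || r.indexedAt.getTime() < until.getTime()),
  );
}
```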
**Server Code Leak Fix:**
- Move REPOS_CACHE_DIR and INDEX_CACHE_DIR to new constants.server.ts
- Remove env.server.js import from constants.ts to prevent server dependencies
  from leaking into client bundles through index.client.js
- Export server-only constants from index.server.ts but not index.client.ts
- This prevents Webpack from trying to bundle Node.js modules (node:https,
  node:http, node:fs, node:net) into client code

**Dynamic Import Fix:**
- Change './dateUtils.js' to './dateUtils' in searchApi.ts dynamic import
- Matches Next.js module resolution conventions used elsewhere in the codebase
- Fixes "Module not found: Can't resolve './dateUtils.js'" build error

**Root Cause:**
The guard fix for env.DATA_CACHE_DIR added server-side imports to constants.ts,
which was exported through index.client.ts, causing Google Cloud Secret Manager
dependencies to leak into client bundles and fail the Docker build.

This fix properly separates client-safe constants from server-only constants
while maintaining the guard against undefined env.DATA_CACHE_DIR.

Signed-off-by: Wayne Sun <gsun@redhat.com>
**ESLint/TypeScript Fixes:**
- Remove unused REPOS_CACHE_DIR import from actions.ts
- Replace `any` types in gitApi.test.ts with proper generic types:
  - Use generic function signatures for mock implementations
  - Use `unknown` for type guards (matches actual isServiceError signature)
  - Use `as unknown as vi.Mock` for mocked functions
  - Use explicit object types instead of `any` for type assertions
- Replace `any` types in gitApi.ts with proper types:
  - Import PrismaClient type from @sourcebot/db
  - Type prisma parameter as PrismaClient in resolveRepoId
  - Type logOptions as Record<string, string | number | null>

**Why These Changes:**
- Follows TypeScript best practices for type safety
- Matches the actual type signatures used in the codebase
- Satisfies @typescript-eslint/no-explicit-any linting rule
- Maintains type safety while allowing proper mocking in tests

Fixes the CI build failures from Next.js linting step.

Signed-off-by: Wayne Sun <gsun@redhat.com>
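The `unknown`-based type guard pattern mentioned in this commit might look like the following. The `ServiceError` shape (`statusCode`, `errorCode`, `message`) is an assumption for illustration, not necessarily the exact Sourcebot type.

```typescript
// Assumed ServiceError shape, for illustration only.
interface ServiceError {
  statusCode: number;
  errorCode: string;
  message: string;
}

// Type guard taking `unknown` instead of `any`: callers must narrow before
// touching fields, which satisfies @typescript-eslint/no-explicit-any.
function isServiceError(value: unknown): value is ServiceError {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.statusCode === "number" &&
    typeof candidate.errorCode === "string" &&
    typeof candidate.message === "string"
  );
}

const result: unknown = { statusCode: 404, errorCode: "NOT_FOUND", message: "Repository not found" };
if (isServiceError(result)) {
  console.log(result.statusCode); // 404
}
```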
Add Promise<Response> return type annotation to the POST function
in the commits API route to satisfy Next.js type checking requirements.

Next.js route handlers must return Promise<void | Response>, and the
explicit return type annotation ensures TypeScript understands that
all code paths (including those through withOptionalAuthV2) return
Response objects via serviceErrorResponse() or Response.json().

This fixes the type error:
'Promise<ServiceError | Response>' is not assignable to
'Promise<void | Response>'

Signed-off-by: Wayne Sun <gsun@redhat.com>
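A sketch of why the annotation helps: with an explicit `Promise<Response>` return type, TypeScript rejects any branch that would return a `ServiceError` directly. The `ServiceError` shape, the `serviceErrorResponse` signature, the error code, and the request validation below are assumptions; the real route delegates to `withOptionalAuthV2` and `searchCommits`, which are omitted here.

```typescript
// Assumed ServiceError shape, for illustration only.
interface ServiceError {
  statusCode: number;
  errorCode: string;
  message: string;
}

// Stand-in for the codebase's serviceErrorResponse() helper (assumed signature).
function serviceErrorResponse(error: ServiceError): Response {
  return new Response(JSON.stringify(error), {
    status: error.statusCode,
    headers: { "content-type": "application/json" },
  });
}

// The explicit return type forces every code path to produce a Response.
async function POST(request: Request): Promise<Response> {
  const body: unknown = await request.json().catch(() => undefined);
  if (typeof body !== "object" || body === null || !("repoId" in body)) {
    return serviceErrorResponse({ statusCode: 400, errorCode: "INVALID_REQUEST_BODY", message: "repoId is required" });
  }
  // The real handler would call searchCommits here and convert any ServiceError the same way.
  return new Response(JSON.stringify({ commits: [] }), {
    headers: { "content-type": "application/json" },
  });
}
```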

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FR] Add timeframe/temporal query parameters to MCP tools for trend analysis

2 participants