-
Notifications
You must be signed in to change notification settings - Fork 169
experiment: Self-serve repository indexing for public GitHub repositories #468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@brendan-kellam your pull request is missing a changelog! |
|
Caution Review failedThe pull request is closed. WalkthroughAdds an experimental self-serve flow to add public GitHub repositories by URL: a server action using Octokit/Prisma gated by new server env flags, a client AddRepositoryDialog and UI wiring (DataTable headerActions), and removal of the legacy AddRepoButton. Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant RT as RepositoryTable
participant D as AddRepositoryDialog
participant A as actions.experimental_addGithubRepositoryByUrl
participant GH as GitHub (Octokit)
participant DB as Prisma/DB
U->>RT: Click "Add repository"
RT->>D: Open dialog
U->>D: Submit repository URL
D->>A: experimental_addGithubRepositoryByUrl(url, domain)
A->>GH: Fetch repo metadata -> external_id
GH-->>A: Repo metadata / errors
A->>DB: Find existing by external_id/org/host
DB-->>A: Exists? yes/no
alt Not exists
A->>DB: Create GitHub connection (config, token ref optional)
DB-->>A: connectionId
A-->>D: { connectionId }
D->>RT: Close dialog, refresh
else Exists or error
A-->>D: ServiceError
D->>U: Show error toast
end
sequenceDiagram
participant P as Repos Page
participant T as RepositoryTable
participant DT as DataTable
P->>T: isAddReposButtonVisible = (env flag)
alt Flag true
T->>DT: headerActions = Add button
else Flag false
T->>DT: headerActions = null
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
Skipping since this is a experimental feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (8)
packages/web/src/components/ui/data-table.tsx (1)
74-79: Remove or update stale TODO referencing AddRepoButton.The comment block still mentions combining logic with AddRepoButton, which has been removed in this PR. Suggest cleaning it up to avoid confusion.
Apply this diff to remove the outdated comment:
- {/* - TODO(auth): Combine this logic with existing add repo button logic in AddRepoButton component - Show a button on the demo site that allows users to add new repositories - by updating the demo-site-config.json file and opening a PR. - */}packages/web/src/actions.ts (4)
810-817: Broaden accepted GitHub URL formats (www, SSH) and keep parity with clientCurrent patterns miss common forms like:
- https://www.github.com/owner/repo
- git@github.com:owner/repo(.git)
- ssh://git@github.com/owner/repo(.git)
They’re also slightly out of sync with the client validator. Consider expanding patterns and normalizing the host to reduce false negatives and avoid drift.
Example update:
const patterns = [ - // https://github.com/owner/repo or https://github.com/owner/repo.git - /^https?:\/\/github\.com\/([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?\/?$/, + // https://github.com/owner/repo or https?://www.github.com/owner/repo(.git) + /^https?:\/\/(?:www\.)?github\.com\/([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?\/?$/, // github.com/owner/repo - /^github\.com\/([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?\/?$/, + /^(?:www\.)?github\.com\/([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?\/?$/, // owner/repo - /^([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+)$/ + /^([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+)$/, + // SSH forms + /^git@github\.com:([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?$/, + /^ssh:\/\/git@github\.com\/([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?\/?$/ ];Longer-term, factor this into a single shared parser/validator (server + client) to avoid divergence.
856-874: Use more precise error codes for 404/403/500 pathsThe errorCode is set to INVALID_REQUEST_BODY for 404/403/500 cases. Elsewhere in this file, you use NOT_FOUND and FORBIDDEN style codes. Aligning codes improves client handling and telemetry.
Example:
- errorCode: ErrorCode.INVALID_REQUEST_BODY, + errorCode: ErrorCode.NOT_FOUND, ... - errorCode: ErrorCode.INVALID_REQUEST_BODY, + errorCode: ErrorCode.FORBIDDEN, ... - errorCode: ErrorCode.INVALID_REQUEST_BODY, + errorCode: ErrorCode.INTERNAL_SERVER_ERROR,If preferred, use the existing helpers (e.g., return notFound("...")) for 404.
876-892: Duplicate detection could also consider repo name to catch cross-connection duplicatesYou’re checking by external_id which is great for GitHub-origin repos. However, the codebase notes rare cases where a repo might also exist via a generic git connection and share the same canonical name (e.g., “github.com/owner/repo”). Consider adding a secondary uniqueness check by name to reduce duplicates across connection types.
Example (after existingRepo check):
const canonicalName = `github.com/${owner}/${repo}`; const existingByName = await prisma.repo.findFirst({ where: { orgId: org.id, name: canonicalName }, }); if (existingByName) { return { statusCode: StatusCodes.BAD_REQUEST, errorCode: ErrorCode.CONNECTION_ALREADY_EXISTS, message: "A repository with this name already exists.", } satisfies ServiceError; }
794-921: Require at least MEMBER and disable anonymous on experimental actionThe experimental endpoint currently allows Guest-level and anonymous calls, unlike our standard
createConnection(which defaults to MEMBER and disallows anonymous, and also honorsCONFIG_PATH). To reduce risk, please updateexperimental_addGithubRepositoryByUrlas follows:• Change its
withOrgMembershipcall fromOrgRole.GUESTtoOrgRole.MEMBER
• Remove (or set tofalse) itsallowAnonymousAccess = trueflag
• MirrorcreateConnection’s guard against declarative config: block whenenv.CONFIG_PATHis setLocations:
- packages/web/src/actions.ts (lines 794–921) – update the
sew(...)wrapper and insert theCONFIG_PATH !== undefinedcheck just after the feature-flag guard.Suggested diff:
-export const experimental_addGithubRepositoryByUrl = async (repositoryUrl: string, domain: string): Promise<{ connectionId: number } | ServiceError> => sew(() => - withAuth((userId) => - withOrgMembership(userId, domain, async ({ org }) => { +export const experimental_addGithubRepositoryByUrl = async (repositoryUrl: string, domain: string): Promise<{ connectionId: number } | ServiceError> => sew(() => + withAuth((userId) => + withOrgMembership(userId, domain, async ({ org }) => { if (env.EXPERIMENT_SELF_SERVE_REPO_INDEXING_ENABLED !== 'true') { @@ } + // Block when using declarative config, like createConnection + if (env.CONFIG_PATH !== undefined) { + return { + statusCode: StatusCodes.BAD_REQUEST, + errorCode: ErrorCode.CONNECTION_CONFIG_PATH_SET, + message: "A configuration file is provided; cannot add connections via web.", + } satisfies ServiceError; + } @@ - }, OrgRole.GUEST), /* allowAnonymousAccess = */ true + }, OrgRole.MEMBER), /* allowAnonymousAccess = */ false ));If you need Guest-level or anonymous access for experimentation, please document the abuse controls (rate limits, logging, etc.) around this endpoint.
packages/web/src/app/[domain]/repos/components/addRepositoryDialog.tsx (3)
53-71: Add try/catch around server action to handle network/runtime failures gracefullyIn rare cases (network issue, transport failure), the call can throw rather than return a ServiceError. Wrap in try/catch to toast a user-friendly error.
- const onSubmit = async (data: z.infer<typeof formSchema>) => { - - const result = await experimental_addGithubRepositoryByUrl(data.repositoryUrl.trim(), domain); - if (isServiceError(result)) { - toast({ - title: "Error adding repository", - description: result.message, - variant: "destructive", - }); - } else { - toast({ - title: "Repository added successfully!", - description: "It will be indexed shortly.", - }); - form.reset(); - onOpenChange(false); - router.refresh(); - } - }; + const onSubmit = async (data: z.infer<typeof formSchema>) => { + try { + const result = await experimental_addGithubRepositoryByUrl(data.repositoryUrl.trim(), domain); + if (isServiceError(result)) { + toast({ + title: "Error adding repository", + description: result.message, + variant: "destructive", + }); + return; + } + toast({ + title: "Repository added successfully!", + description: "It will be indexed shortly.", + }); + form.reset(); + onOpenChange(false); + router.refresh(); + } catch (e: any) { + toast({ + title: "Unexpected error", + description: e?.message ?? "Please try again later.", + variant: "destructive", + }); + } + };
97-103: UX nit: autofocus the URL field when the dialog opensMinor improvement for speed: focus the input on open so users can paste immediately.
-<Input +<Input {...field} placeholder="https://github.com/user/project" disabled={isSubmitting} + autoFocus />
1-129: Consider centralizing GitHub URL parsing/validation to a shared utilYou currently maintain similar-but-different regexes in client and server. Extracting a shared function prevents drift and reduces maintenance.
Example (new util, language-only sketch outside this file):
// packages/web/src/lib/githubUrl.ts export type RepoRef = { owner: string; repo: string }; export function tryParseGithubRepo(input: string): RepoRef | null { const url = input.trim(); const patterns: RegExp[] = [ /^https?:\/\/(?:www\.)?github\.com\/([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?\/?$/, /^(?:www\.)?github\.com\/([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?\/?$/, /^([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+)\/?$/, /^git@github\.com:([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?$/, /^ssh:\/\/git@github\.com\/([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?\/?$/, ]; for (const p of patterns) { const m = url.match(p); if (m) return { owner: m[1], repo: m[2] }; } return null; }Then import and reuse in both action and dialog.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
packages/web/src/actions.ts(2 hunks)packages/web/src/app/[domain]/repos/addRepoButton.tsx(0 hunks)packages/web/src/app/[domain]/repos/columns.tsx(1 hunks)packages/web/src/app/[domain]/repos/components/addRepositoryDialog.tsx(1 hunks)packages/web/src/app/[domain]/repos/page.tsx(2 hunks)packages/web/src/app/[domain]/repos/repositoryTable.tsx(3 hunks)packages/web/src/components/ui/data-table.tsx(2 hunks)packages/web/src/env.mjs(1 hunks)
💤 Files with no reviewable changes (1)
- packages/web/src/app/[domain]/repos/addRepoButton.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*
📄 CodeRabbit Inference Engine (.cursor/rules/style.mdc)
Filenames should always be camelCase. Exception: if there are filenames in the same directory with a format other than camelCase, use that format to keep things consistent.
Files:
packages/web/src/env.mjspackages/web/src/app/[domain]/repos/page.tsxpackages/web/src/app/[domain]/repos/components/addRepositoryDialog.tsxpackages/web/src/app/[domain]/repos/columns.tsxpackages/web/src/actions.tspackages/web/src/app/[domain]/repos/repositoryTable.tsxpackages/web/src/components/ui/data-table.tsx
🧬 Code Graph Analysis (4)
packages/web/src/app/[domain]/repos/page.tsx (2)
packages/web/src/app/[domain]/repos/repositoryTable.tsx (1)
RepositoryTable(22-144)packages/web/src/env.mjs (2)
env(14-167)env(14-167)
packages/web/src/app/[domain]/repos/components/addRepositoryDialog.tsx (6)
packages/web/src/components/hooks/use-toast.ts (2)
useToast(194-194)toast(194-194)packages/web/src/actions.ts (1)
experimental_addGithubRepositoryByUrl(794-920)packages/web/src/components/ui/dialog.tsx (6)
Dialog(114-114)DialogContent(119-119)DialogHeader(120-120)DialogTitle(122-122)DialogDescription(123-123)DialogFooter(121-121)packages/web/src/components/ui/form.tsx (5)
Form(171-171)FormItem(172-172)FormLabel(173-173)FormControl(174-174)FormMessage(176-176)packages/web/src/components/ui/input.tsx (1)
Input(22-22)packages/web/src/components/ui/button.tsx (1)
Button(56-56)
packages/web/src/actions.ts (3)
packages/web/src/lib/serviceError.ts (1)
ServiceError(11-11)packages/web/src/env.mjs (2)
env(14-167)env(14-167)packages/schemas/src/v3/connection.type.ts (1)
GithubConnectionConfig(11-86)
packages/web/src/app/[domain]/repos/repositoryTable.tsx (2)
packages/web/src/components/ui/data-table.tsx (1)
DataTable(34-145)packages/web/src/app/[domain]/repos/components/addRepositoryDialog.tsx (1)
AddRepositoryDialog(39-128)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (8)
packages/web/src/env.mjs (1)
135-137: Server-only experiment flags approved: names consistent and no client-side usage detected
- ENV definitions in packages/web/src/env.mjs (lines 135–137) match all usages.
- References in packages/web/src/actions.ts (around lines 797, 844, 900–902) and page.tsx are correct.
- No
"use client"components reference EXPERIMENT_SELF_SERVE_REPO_INDEXING_GITHUB_TOKEN.packages/web/src/app/[domain]/repos/columns.tsx (1)
99-99: Header simplification LGTM.Switching to a static 'Repository' header is appropriate given the action button has moved to DataTable headerActions.
packages/web/src/components/ui/data-table.tsx (1)
30-32: Good abstraction: headerActions prop cleanly decouples table UI from actions.This keeps the table generic and re-usable while allowing pages to inject their own header controls.
Also applies to: 39-41
packages/web/src/app/[domain]/repos/page.tsx (1)
5-5: Server-gated visibility is wired correctly.Importing env on the server and serializing a boolean prop down to the client is the right approach. No exposure of server-only vars to the client.
Also applies to: 20-22
packages/web/src/app/[domain]/repos/repositoryTable.tsx (3)
58-69: Confirm intended priority for NEW vs IN_INDEX_QUEUE.In columns.tsx, NEW is labeled "Queued", but here NEW falls into the default branch (lowest priority), while IN_INDEX_QUEUE is highest. If the product intent is to treat NEW as queued, consider giving it the same priority as IN_INDEX_QUEUE.
If desired, adjust as follows:
- switch (status) { - case RepoIndexingStatus.IN_INDEX_QUEUE: + switch (status) { + case RepoIndexingStatus.NEW: + case RepoIndexingStatus.IN_INDEX_QUEUE: case RepoIndexingStatus.INDEXING: return 0 // Highest priority - currently indexing
18-24: Prop surface area addition looks good.Explicit props for feature-gated header actions keep RepositoryTable focused and testable.
120-141: Header action + dialog integration is cohesive and straightforward.Using DataTable.headerActions with a simple "Add repository" button that toggles AddRepositoryDialog is clean and aligns with the new abstraction.
packages/web/src/actions.ts (1)
25-25: Octokit dependency declared and available at runtimeConfirmed that
octokitis listed underdependenciesinpackages/web/package.json(version ^4.1.3). It will be bundled with the workspace build and available in the server runtime where this action executes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
packages/web/src/actions.ts (2)
794-803: Env flag comparison bug: compare a boolean, not a stringenv.EXPERIMENT_SELF_SERVE_REPO_INDEXING_ENABLED is a boolean (per env.mjs). Comparing to 'true' disables the feature unintentionally.
- if (env.EXPERIMENT_SELF_SERVE_REPO_INDEXING_ENABLED !== 'true') { + if (!env.EXPERIMENT_SELF_SERVE_REPO_INDEXING_ENABLED) { return { statusCode: StatusCodes.BAD_REQUEST, errorCode: ErrorCode.INVALID_REQUEST_BODY, message: "This feature is not enabled.", } satisfies ServiceError; }
847-876: Enforce “public-only” policy after successful fetchWith a token that can see private repos, octokit.repos.get will succeed for private repositories. Explicitly reject private repos to match the UX and policy.
const response = await octokit.rest.repos.get({ owner, repo, }); githubRepo = response.data; + // Enforce public-only policy + if ((githubRepo as any).private || (githubRepo as any).visibility === 'private') { + return { + statusCode: StatusCodes.BAD_REQUEST, + errorCode: ErrorCode.INVALID_REQUEST_BODY, + message: "Only public repositories can be added.", + } satisfies ServiceError; + } } catch (error) { if (isHttpError(error, 404)) { return { statusCode: StatusCodes.NOT_FOUND, errorCode: ErrorCode.INVALID_REQUEST_BODY, message: `Repository '${owner}/${repo}' not found or is private. Only public repositories can be added.`, } satisfies ServiceError; } if (isHttpError(error, 403)) { return { statusCode: StatusCodes.FORBIDDEN, errorCode: ErrorCode.INVALID_REQUEST_BODY, message: `Access to repository '${owner}/${repo}' is forbidden. Only public repositories can be added.`, } satisfies ServiceError; } + if (isHttpError(error, 401)) { + return { + statusCode: StatusCodes.UNAUTHORIZED, + errorCode: ErrorCode.INVALID_REQUEST_BODY, + message: "Invalid or missing GitHub credentials.", + } satisfies ServiceError; + }
🧹 Nitpick comments (3)
packages/web/src/lib/utils.ts (2)
463-464: Prefer platform-based Mac detection to avoid UA reduction pitfallsnavigator.userAgent is being reduced/frozen across browsers; matching "Mac OS X" can be brittle. Use navigator.platform (or fall back to UA) for better reliability.
Apply:
-export const IS_MAC = typeof navigator !== 'undefined' && /Mac OS X/.test(navigator.userAgent); +export const IS_MAC = + typeof navigator !== 'undefined' && + ( + (navigator.platform && navigator.platform.toLowerCase().includes('mac')) || + /mac os x/i.test(navigator.userAgent) + );
466-471: Tighten type check to ensure numeric status in isHttpErrorOctokit errors expose a numeric status. Guarding on typeof status === 'number' avoids false positives when random objects contain a string status.
-export const isHttpError = (error: unknown, status: number): boolean => { - return error !== null - && typeof error === 'object' - && 'status' in error - && error.status === status; -} +export const isHttpError = (error: unknown, status: number): boolean => { + return ( + error !== null && + typeof error === 'object' && + 'status' in error && + typeof (error as any).status === 'number' && + (error as any).status === status + ); +}packages/web/src/actions.ts (1)
805-831: Broaden URL parsing: support www and SSH formatsUsers commonly paste URLs like https://www.github.com/owner/repo and git@github.com:owner/repo(.git). Consider expanding the patterns for a smoother UX.
- const patterns = [ - // https://github.com/owner/repo or https://github.com/owner/repo.git - /^https?:\/\/github\.com\/([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?\/?$/, + const patterns = [ + // https://github.com/owner/repo or https://www.github.com/owner/repo(.git) + /^https?:\/\/(?:www\.)?github\.com\/([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?\/?$/, // github.com/owner/repo /^github\.com\/([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?\/?$/, + // SSH: git@github.com:owner/repo(.git) + /^git@github\.com:([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?$/, + // SSH URL form: ssh://git@github.com/owner/repo(.git) + /^ssh:\/\/git@github\.com\/([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+?)(?:\.git)?\/?$/, // owner/repo /^([a-zA-Z0-9_.-]+)\/([a-zA-Z0-9_.-]+)$/ ];
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
packages/web/src/actions.ts(3 hunks)packages/web/src/features/chat/components/chatThread/detailsCard.tsx(1 hunks)packages/web/src/lib/utils.ts(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- packages/web/src/features/chat/components/chatThread/detailsCard.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*
📄 CodeRabbit Inference Engine (.cursor/rules/style.mdc)
Filenames should always be camelCase. Exception: if there are filenames in the same directory with a format other than camelCase, use that format to keep things consistent.
Files:
packages/web/src/lib/utils.tspackages/web/src/actions.ts
🧬 Code Graph Analysis (1)
packages/web/src/actions.ts (5)
packages/web/src/lib/serviceError.ts (1)
ServiceError(11-11)packages/web/src/env.mjs (2)
env(14-167)env(14-167)packages/web/src/lib/utils.ts (1)
isHttpError(466-471)packages/web/src/app/error.tsx (1)
Error(14-54)packages/schemas/src/v3/connection.type.ts (1)
GithubConnectionConfig(11-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (2)
packages/web/src/actions.ts (2)
6-6: LGTM: imports now include isHttpErrorImporting isHttpError here aligns with the new Octokit error handling below.
25-25: LGTM: Octokit importCorrect import and placement for the new GitHub integration.
Adds a self-serve modal for adding public GitHub repositories by URL:
Enabled with
EXPERIMENT_SELF_SERVE_REPO_INDEXING_ENABLEDSummary by CodeRabbit
New Features
Improvements
Removals
Chores