Merged
48 changes: 28 additions & 20 deletions .github/workflows/chroma-issue-indexer.lock.yml

Some generated files are not rendered by default.

52 changes: 30 additions & 22 deletions .github/workflows/chroma-issue-indexer.md
@@ -26,37 +26,45 @@ This workflow indexes issues from the repository into a Chroma vector database f

 Index the 100 most recent issues from the repository into the Chroma vector database:

-1. **Fetch Issues**:
-   - Use the GitHub MCP server tools to list the most recent 100 issues
-   - Include both open and closed issues
-   - Get issue number, title, body, state, created date, and author
-
-2. **Create/Update Chroma Collection**:
-   - Create a collection named "issues" if it doesn't exist (use `chroma_create_collection`)
-   - Use an appropriate embedding function for semantic search
-
-3. **Index Issues**:
-   - For each issue, add it to the Chroma collection (use `chroma_add_documents`)
-   - Use ID format: `issue-{issue_number}`
-   - Document content should be: `{title}\n\n{body}` (title and body combined)
-   - Include metadata:
-     - `number`: Issue number
+1. **Create Chroma Collection First**:
+   - IMPORTANT: Check if the "issues" collection exists using `chroma_list_collections`
+   - If it doesn't exist, create it using `chroma_create_collection` with:
+     - Collection name: "issues"
+     - Use default embedding function (omit embedding_function_name parameter)
+
+2. **Fetch Issues Using GitHub MCP Tools** (NOT Python scripts):
+   - Use the `list_issues` tool from GitHub MCP server to fetch issues
+   - Fetch issues in batches of 5 at a time using the `perPage: 5` parameter
+   - Start with page 1, then page 2, page 3, etc. until you have 100 issues total
+   - Include both open and closed issues (omit state parameter to get both)
+   - Order by created date descending to get most recent first: `orderBy: "CREATED_AT"`, `direction: "DESC"`
+   - For each issue, extract: number, title, body, state, createdAt, author.login, url
+
+3. **Index Issues in Batches**:
+   - Process each batch of 5 issues immediately after fetching
+   - For each batch, use `chroma_add_documents` to add all 5 issues at once
+   - Use ID format: `issue-{issue_number}` (e.g., "issue-123")
+   - Document content: `{title}\n\n{body}` (combine title and body)
+   - If body is empty/null, use just the title as content
+   - Include metadata for each issue:
+     - `number`: Issue number (as string)
      - `title`: Issue title
      - `state`: Issue state (OPEN or CLOSED)
      - `author`: Issue author username
-     - `created_at`: Issue creation date
+     - `created_at`: Issue creation date (ISO 8601 format)
      - `url`: Issue URL

 4. **Report Progress**:
-   - Log how many issues were indexed
-   - Note any issues that couldn't be indexed (e.g., empty body)
-   - Report the total number of issues in the collection
+   - After processing all batches, use `chroma_get_collection_count` to get total issue count
+   - Report how many issues were successfully indexed
+   - Note any issues that couldn't be indexed (e.g., API errors)

 ## Important Notes

-- Process issues in batches if needed to avoid rate limits
-- Skip issues that have already been indexed (check if ID exists)
-- For issues with empty bodies, use just the title as content
+- **MUST use GitHub MCP tools** (`list_issues` tool), NOT Python scripts or `gh` CLI
+- **MUST create collection first** before attempting to add documents
+- Process exactly 5 issues per batch using `perPage: 5` and incrementing page number
+- Skip duplicate issues (Chroma will update if ID exists)
 - The collection persists in `/tmp/gh-aw/cache-memory-chroma/` across runs
 - This helps other workflows search for similar issues using semantic search

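For reviewers, the record shape that the updated prompt asks the agent to produce can be sketched as below. This is only an illustration of the data mapping, not part of the PR: the workflow itself runs through MCP tool calls, and the helper names (`to_chroma_record`, `pages_needed`, `to_batch_args`) are hypothetical. The argument names in `to_batch_args` (`collection_name`, `ids`, `documents`, `metadatas`) are an assumption about what `chroma_add_documents` accepts; the issue field names follow the list in step 2.

```python
from typing import Any

PER_PAGE = 5        # batch size required by the prompt (`perPage: 5`)
TARGET_ISSUES = 100  # total number of issues to index


def to_chroma_record(issue: dict[str, Any]) -> dict[str, Any]:
    """Map one fetched issue to the ID, document, and metadata the prompt describes."""
    title = issue["title"]
    body = issue.get("body")
    # If the body is empty or null, the document is just the title.
    document = f"{title}\n\n{body}" if body else title
    return {
        "id": f"issue-{issue['number']}",
        "document": document,
        "metadata": {
            "number": str(issue["number"]),  # stored as a string
            "title": title,
            "state": issue["state"],          # OPEN or CLOSED
            "author": issue["author"]["login"],
            "created_at": issue["createdAt"],  # ISO 8601 timestamp
            "url": issue["url"],
        },
    }


def pages_needed(target: int = TARGET_ISSUES, per_page: int = PER_PAGE) -> int:
    """Number of pages to request, using ceiling division."""
    return -(-target // per_page)


def to_batch_args(issues: list[dict[str, Any]]) -> dict[str, Any]:
    """Assemble one batch of arguments (argument names are assumed, see above)."""
    records = [to_chroma_record(issue) for issue in issues]
    return {
        "collection_name": "issues",
        "ids": [r["id"] for r in records],
        "documents": [r["document"] for r in records],
        "metadatas": [r["metadata"] for r in records],
    }
```

With 100 issues at 5 per page, `pages_needed()` gives 20 `list_issues` calls, each followed by one `chroma_add_documents` call.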
23 changes: 21 additions & 2 deletions specs/artifacts.md

Some generated files are not rendered by default.