fix: restore PR description parsing in automated triage script#8531
fix: restore PR description parsing in automated triage script#8531hoteye wants to merge 25 commits intogoogle-gemini:mainfrom
Conversation
There was a problem hiding this comment.
Summary of Changes
Hello @hoteye, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request resolves a critical bug in the automated PR triage script that was preventing accurate detection of linked issues from pull request descriptions. By reintroducing robust parsing logic, the script can now correctly identify issue references using various patterns, ensuring that PRs are appropriately labeled and integrated into the workflow, thereby improving automation reliability.
Highlights
- Issue Reference Parsing Restored: The automated PR triage script now correctly parses issue references from PR descriptions, fixing a regression that caused all PRs to be incorrectly labeled with
status/need-issue. - Multi-Pattern Detection: The script implements a comprehensive approach to detect issue references, including direct
#123patterns, keyword patterns likeCloses #123,Fixes #456, andResolves #789, and a fallback to GitHub'sclosingIssuesReferencesAPI. - Regression Fix: This change specifically addresses a regression introduced in PR #7698, which had inadvertently removed the detailed PR description parsing logic, leaving only a limited API fallback.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Code Review
This pull request restores the parsing of PR descriptions to find linked issues, which is a great improvement. However, the current implementation has a logical flaw in the order it checks for issue references. It prioritizes generic references over explicit closing keywords, which can lead to incorrect issue association. I've provided a suggestion to reorder the checks to be more robust, prioritizing the most specific patterns first. This will ensure the triage script behaves correctly in more cases.
|
Hi @srithreepo , could we consider increasing the timeout for this? Let me know your thoughts! |
|
Hi @srithreepo , We've confirmed through testing that the current PR triage script is completely broken. Root Cause VerifiedThe official PR triage script at gh pr view "${PR_NUMBER}" --json closingIssuesReferences -q '.closingIssuesReferences.nodes[0].number'But $ gh pr view 8531 --repo google-gemini/gemini-cli --json closingIssuesReferences
Unknown JSON field: "closingIssuesReferences"
Available fields: additions, assignees, author, body, ... (no closingIssuesReferences)System-wide ImpactThis means every single PR gets the
Even PRs with perfect Solution VerifiedPR #8531 fixes this by adding proper text parsing as Pattern 1:
Test Results# Testing PR #8531 (contains "Fixes #8522")
Processing PR #8531
Found linked issue #8522 <- WORKS!
needs_comment=false
# Testing PR #8529 (no issue reference)
Processing PR #8529
No linked issue found for PR #8529 <- CORRECTLY DETECTED!The current system-wide failure of issue detection confirms this fix is critical for the triage automation to work at all. Priority: This should be merged ASAP to restore basic PR triage functionality across the entire repository. |
ced0f2f to
425cbb2
Compare
Fixes google-gemini#8522 This commit restores comprehensive PR description parsing that was accidentally removed in PR google-gemini#7698. The triage script now properly detects issue references in PR descriptions using multiple patterns: 1. Direct references like "google-gemini#123" 2. Keyword patterns like "Closes google-gemini#123", "Fixes google-gemini#456", "Resolves google-gemini#789" 3. Fallback to GitHub's closingIssuesReferences API as last resort Without this fix, ALL PRs were incorrectly labeled with status/need-issue because the script could only detect GitHub's automatically detected closing references, missing many valid manual references in descriptions. The regression was introduced when PR description parsing logic was removed, leaving only the GitHub API fallback which has limited coverage.
Reorders pattern matching from most specific to least specific: 1. Keyword patterns (Closes/Fixes/Resolves google-gemini#123) - MOST SPECIFIC 2. GitHub API closingIssuesReferences - MEDIUM SPECIFICITY 3. Direct references (google-gemini#123) - LEAST SPECIFIC (fallback only) This prevents incorrect matching when PR descriptions contain both casual references and explicit closing statements. Example improvement: PR body: "Related to google-gemini#100. Closes google-gemini#200." Before: Would incorrectly match google-gemini#100 After: Correctly matches google-gemini#200 Addresses code review feedback from @gemini-code-assist.
Remove Pattern 3 (direct google-gemini#123 references) as suggested by @srithreepo to avoid false positives where issue numbers are mentioned but not actually being resolved. Now only uses: - Pattern 1: Explicit closes/fixes/resolves keywords (most reliable) - Pattern 2: GitHub's closingIssuesReferences API (fallback) This prevents adding status/need-issue label when PRs mention issues like 'Issue #xyz introduced this bug' while still catching legitimate issue links.
…-gemini#8531) The test 'should return an error if the file size exceeds 20MB' was timing out in CI due to the time required to create and process a 21MB test file. - Increase timeout from default 5000ms to 15000ms for this specific test - This allows sufficient time for Buffer.alloc() and file operations - Resolves CI timeout failure in GitHub Actions Addresses CI failure in PR google-gemini#8531 job 50540837510.
- Remove non-existent closingIssuesReferences field query - Replace with direct google-gemini#123 pattern matching as Pattern 2 - Improve script reliability and reduce API calls
5ab14ea to
07a1c1d
Compare
…itives Per feedback from @srithreepo, remove direct google-gemini#123 pattern matching to avoid incorrectly detecting issue references in cases like: 'Issue google-gemini#123 introduced this bug' which should still get needs-issue label. Now only detects explicit closing keywords: - Closes google-gemini#123 - Fixes google-gemini#456 - Resolves google-gemini#789
|
Thanks for the feedback! I've updated the script to only keep the explicit keyword pattern:
Root Cause ConfirmedThe original script failure was caused by trying to use a non-existent GitHub CLI field: $ gh pr view 8531 --repo google-gemini/gemini-cli --json closingIssuesReferences
Unknown JSON field: "closingIssuesReferences"This API call always failed, causing every PR to get the Current StrategyThe script now uses only one reliable text-based detection pattern: grep -iE '(closes?|fixes?|resolves?) #[0-9]+'This should solve the widespread Testing & IterationQuestion: Once this is merged, will we be able to see the effect immediately on existing PRs with the triage automation? If so, we can observe the results and adjust the strategy further if needed. For example, we could potentially add more keywords like "Addresses" or "Resolves" variations if we find legitimate cases being missed. The current approach prioritizes precision over recall - better to occasionally miss an edge case than to incorrectly remove |
- Resolve conflict in fileUtils.test.ts by removing timeout comment - Integrate latest changes from main branch - Keep hotfix changes while adopting upstream improvements
|
Hi @srithreepo , file processing timeout comment has been removed, merge conflict resolved and pushed. Please review. |
hoteye
left a comment
There was a problem hiding this comment.
I've updated the script to only keep the explicit keyword pattern:
Matches: Closes #123, Fixes #456, Resolves #789 (case-insensitive)
Removed: Direct #123 pattern to avoid cases like "Issue #123 introduced this bug"
Removed: Invalid closingIssuesReferences API call that was causing the original issue
|
Hi @srithreepo , @jacob314 , @jakemac53 The bug in the automated triage script is affecting all PRs—every new PR is incorrectly being labeled with status/need-issue. This issue has been ongoing for a while and is impacting label accuracy and the maintenance workflow. If it's difficult to review this fix soon, could any available maintainer please help review or process it? Thank you very much! Relevant discussion: #8531 |
- Support full GitHub issue URIs in addition to #<number> format - Maintains backward compatibility with keyword + #number priority - Falls back to URI detection when no keyword-based references found - Addresses feedback from jakemac53 in PR review comments Examples of supported formats: - Closes google-gemini#123 (existing) - https://github.com/owner/repo/issues/456 (new)
Replace echo with printf to safely handle PR body content and prevent potential command injection vulnerabilities similar to recent supply chain attacks. The printf approach treats input as literal string data without interpreting shell metacharacters or command substitutions. Addresses security concern raised in code review.
Fixes #8522
This commit restores comprehensive PR description parsing that was accidentally removed in PR #7698. The triage script now properly detects issue references in PR descriptions using multiple patterns:
Without this fix, ALL PRs were incorrectly labeled with status/need-issue because the script could only detect GitHub's automatically detected closing references, missing many valid manual references in descriptions.
The regression was introduced when PR description parsing logic was removed, leaving only the GitHub API fallback which has limited coverage.
TLDR
Dive Deeper
Reviewer Test Plan
Testing Matrix
Linked issues / bugs