Skip to content

fix: restore PR description parsing in automated triage script#8531

Closed
hoteye wants to merge 25 commits intogoogle-gemini:mainfrom
hoteye:hotfix/pr-triage-regression
Closed

fix: restore PR description parsing in automated triage script#8531
hoteye wants to merge 25 commits intogoogle-gemini:mainfrom
hoteye:hotfix/pr-triage-regression

Conversation

@hoteye
Copy link
Contributor

@hoteye hoteye commented Sep 16, 2025

Fixes #8522

This commit restores comprehensive PR description parsing that was accidentally removed in PR #7698. The triage script now properly detects issue references in PR descriptions using multiple patterns:

  1. Direct references like "fix .env check #123"
  2. Keyword patterns like "Closes fix .env check #123", "Fixes Update docs and tool description for read-many-files. #456", "Resolves keyboard shortcut to abort partial prompt #789"
  3. Fallback to GitHub's closingIssuesReferences API as last resort

Without this fix, ALL PRs were incorrectly labeled with status/need-issue because the script could only detect GitHub's automatically detected closing references, missing many valid manual references in descriptions.

The regression was introduced when PR description parsing logic was removed, leaving only the GitHub API fallback which has limited coverage.

TLDR

Dive Deeper

Reviewer Test Plan

Testing Matrix

🍏 🪟 🐧
npm run
npx
Docker
Podman - -
Seatbelt - -

Linked issues / bugs

@hoteye hoteye requested a review from a team as a code owner September 16, 2025 03:05
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @hoteye, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical bug in the automated PR triage script that was preventing accurate detection of linked issues from pull request descriptions. By reintroducing robust parsing logic, the script can now correctly identify issue references using various patterns, ensuring that PRs are appropriately labeled and integrated into the workflow, thereby improving automation reliability.

Highlights

  • Issue Reference Parsing Restored: The automated PR triage script now correctly parses issue references from PR descriptions, fixing a regression that caused all PRs to be incorrectly labeled with status/need-issue.
  • Multi-Pattern Detection: The script implements a comprehensive approach to detect issue references, including direct #123 patterns, keyword patterns like Closes #123, Fixes #456, and Resolves #789, and a fallback to GitHub's closingIssuesReferences API.
  • Regression Fix: This change specifically addresses a regression introduced in PR #7698, which had inadvertently removed the detailed PR description parsing logic, leaving only a limited API fallback.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request restores the parsing of PR descriptions to find linked issues, which is a great improvement. However, the current implementation has a logical flaw in the order it checks for issue references. It prioritizes generic references over explicit closing keywords, which can lead to incorrect issue association. I've provided a suggestion to reorder the checks to be more robust, prioritizing the most specific patterns first. This will ensure the triage script behaves correctly in more cases.

@hoteye
Copy link
Contributor Author

hoteye commented Sep 17, 2025

Hi @srithreepo , could we consider increasing the timeout for this? Let me know your thoughts!

@hoteye
Copy link
Contributor Author

hoteye commented Sep 17, 2025

Hi @srithreepo ,

We've confirmed through testing that the current PR triage script is completely broken.

Root Cause Verified

The official PR triage script at .github/scripts/pr-triage.sh tries to use a non-existent API field:

gh pr view "${PR_NUMBER}" --json closingIssuesReferences -q '.closingIssuesReferences.nodes[0].number'

But closingIssuesReferences doesn't exist in the GitHub CLI API:

$ gh pr view 8531 --repo google-gemini/gemini-cli --json closingIssuesReferences
Unknown JSON field: "closingIssuesReferences"
Available fields: additions, assignees, author, body, ... (no closingIssuesReferences)

System-wide Impact

This means every single PR gets the status/need-issue label because:

  1. The API call always fails (field doesn't exist)
  2. ISSUE_NUMBER is always empty
  3. Script assumes no issue link exists

Even PRs with perfect Closes #123 format are being mislabeled.

Solution Verified

PR #8531 fixes this by adding proper text parsing as Pattern 1:

  • Detects Fixes #123, Closes #456, Resolves #789 patterns
  • Removes dependency on the broken API field
  • Tested and confirmed working on real GitHub data

Test Results

# Testing PR #8531 (contains "Fixes #8522")
Processing PR #8531
Found linked issue #8522  <- WORKS!
needs_comment=false

# Testing PR #8529 (no issue reference)  
Processing PR #8529
No linked issue found for PR #8529  <- CORRECTLY DETECTED!

The current system-wide failure of issue detection confirms this fix is critical for the triage automation to work at all.

Priority: This should be merged ASAP to restore basic PR triage functionality across the entire repository.

@hoteye hoteye force-pushed the hotfix/pr-triage-regression branch from ced0f2f to 425cbb2 Compare September 17, 2025 02:02
@hoteye hoteye requested a review from srithreepo September 17, 2025 08:39
Fixes google-gemini#8522

This commit restores comprehensive PR description parsing that was
accidentally removed in PR google-gemini#7698. The triage script now properly
detects issue references in PR descriptions using multiple patterns:

1. Direct references like "google-gemini#123"
2. Keyword patterns like "Closes google-gemini#123", "Fixes google-gemini#456", "Resolves google-gemini#789"
3. Fallback to GitHub's closingIssuesReferences API as last resort

Without this fix, ALL PRs were incorrectly labeled with status/need-issue
because the script could only detect GitHub's automatically detected
closing references, missing many valid manual references in descriptions.

The regression was introduced when PR description parsing logic was
removed, leaving only the GitHub API fallback which has limited coverage.
Reorders pattern matching from most specific to least specific:

1. Keyword patterns (Closes/Fixes/Resolves google-gemini#123) - MOST SPECIFIC
2. GitHub API closingIssuesReferences - MEDIUM SPECIFICITY
3. Direct references (google-gemini#123) - LEAST SPECIFIC (fallback only)

This prevents incorrect matching when PR descriptions contain both
casual references and explicit closing statements.

Example improvement:
  PR body: "Related to google-gemini#100. Closes google-gemini#200."
  Before: Would incorrectly match google-gemini#100
  After:  Correctly matches google-gemini#200

Addresses code review feedback from @gemini-code-assist.
Remove Pattern 3 (direct google-gemini#123 references) as suggested by @srithreepo
to avoid false positives where issue numbers are mentioned but not
actually being resolved.

Now only uses:
- Pattern 1: Explicit closes/fixes/resolves keywords (most reliable)
- Pattern 2: GitHub's closingIssuesReferences API (fallback)

This prevents adding status/need-issue label when PRs mention
issues like 'Issue #xyz introduced this bug' while still catching
legitimate issue links.
…-gemini#8531)

The test 'should return an error if the file size exceeds 20MB' was
timing out in CI due to the time required to create and process a
21MB test file.

- Increase timeout from default 5000ms to 15000ms for this specific test
- This allows sufficient time for Buffer.alloc() and file operations
- Resolves CI timeout failure in GitHub Actions

Addresses CI failure in PR google-gemini#8531 job 50540837510.
- Remove non-existent closingIssuesReferences field query
- Replace with direct google-gemini#123 pattern matching as Pattern 2
- Improve script reliability and reduce API calls
@hoteye hoteye force-pushed the hotfix/pr-triage-regression branch from 5ab14ea to 07a1c1d Compare September 17, 2025 09:10
…itives

Per feedback from @srithreepo, remove direct google-gemini#123 pattern matching
to avoid incorrectly detecting issue references in cases like:
'Issue google-gemini#123 introduced this bug' which should still get needs-issue label.

Now only detects explicit closing keywords:
- Closes google-gemini#123
- Fixes google-gemini#456
- Resolves google-gemini#789
@hoteye
Copy link
Contributor Author

hoteye commented Sep 17, 2025

@srithreepo

Thanks for the feedback!

I've updated the script to only keep the explicit keyword pattern:

  • Matches: Closes #123, Fixes #456, Resolves #789 (case-insensitive)
  • Removed: Direct #123 pattern to avoid cases like "Issue fix .env check #123 introduced this bug"
  • Removed: Invalid closingIssuesReferences API call that was causing the original issue

Root Cause Confirmed

The original script failure was caused by trying to use a non-existent GitHub CLI field:

$ gh pr view 8531 --repo google-gemini/gemini-cli --json closingIssuesReferences
Unknown JSON field: "closingIssuesReferences"

This API call always failed, causing every PR to get the status/need-issue label.

Current Strategy

The script now uses only one reliable text-based detection pattern:

grep -iE '(closes?|fixes?|resolves?) #[0-9]+'

This should solve the widespread status/need-issue labeling issue while avoiding false positives you mentioned.

Testing & Iteration

Question: Once this is merged, will we be able to see the effect immediately on existing PRs with the triage automation?

If so, we can observe the results and adjust the strategy further if needed. For example, we could potentially add more keywords like "Addresses" or "Resolves" variations if we find legitimate cases being missed.

The current approach prioritizes precision over recall - better to occasionally miss an edge case than to incorrectly remove status/need-issue labels from PRs that actually need them.

hoteye and others added 2 commits September 18, 2025 04:39
- Resolve conflict in fileUtils.test.ts by removing timeout comment
- Integrate latest changes from main branch
- Keep hotfix changes while adopting upstream improvements
@hoteye
Copy link
Contributor Author

hoteye commented Sep 17, 2025

Hi @srithreepo , file processing timeout comment has been removed, merge conflict resolved and pushed. Please review.

Copy link
Contributor Author

@hoteye hoteye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the script to only keep the explicit keyword pattern:

Matches: Closes #123, Fixes #456, Resolves #789 (case-insensitive)
Removed: Direct #123 pattern to avoid cases like "Issue #123 introduced this bug"
Removed: Invalid closingIssuesReferences API call that was causing the original issue

@hoteye
Copy link
Contributor Author

hoteye commented Sep 19, 2025

Hi @srithreepo , @jacob314 , @jakemac53 The bug in the automated triage script is affecting all PRs—every new PR is incorrectly being labeled with status/need-issue. This issue has been ongoing for a while and is impacting label accuracy and the maintenance workflow. If it's difficult to review this fix soon, could any available maintainer please help review or process it? Thank you very much!

Relevant discussion: #8531

hoteye and others added 3 commits September 20, 2025 05:43
- Support full GitHub issue URIs in addition to #<number> format
- Maintains backward compatibility with keyword + #number priority
- Falls back to URI detection when no keyword-based references found
- Addresses feedback from jakemac53 in PR review comments

Examples of supported formats:
- Closes google-gemini#123 (existing)
- https://github.com/owner/repo/issues/456 (new)
@hoteye hoteye requested a review from jakemac53 September 19, 2025 22:18
hoteye and others added 3 commits September 23, 2025 08:30
Replace echo with printf to safely handle PR body content and prevent
potential command injection vulnerabilities similar to recent supply
chain attacks. The printf approach treats input as literal string data
without interpreting shell metacharacters or command substitutions.

Addresses security concern raised in code review.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug Report: Automated PR triage system incorrectly labeling all PRs with status/need-issue keyboard shortcut to abort partial prompt

4 participants