-
-
Notifications
You must be signed in to change notification settings - Fork 4
fix(INFRA-3081): implement duplicate detection for all commit types #254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add duplicate detection for both PR-based and direct commits - For PR commits: check if PR number already exists in changelog - For direct commits: check if description already exists in changelog - This fixes idempotency issues where direct commits like 'Update Attributions' were being re-added on every workflow run Fixes INFRA-3081 BREAKING CHANGE: None - this is backward compatible The behavior change is that direct commits (without PR numbers) are now included in changelogs by default, but duplicates are properly detected and excluded on subsequent runs. Future PR will add optional --require-pr-numbers flag for teams that want to filter out direct commits during generation.
e987c77 to
3d5da86
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds duplicate detection for direct commits (commits without PR numbers) in changelog generation to prevent changelog entries from being duplicated across multiple workflow runs.
- Adds
loggedDescriptionstracking to capture all change descriptions already in the changelog - Updates duplicate detection logic to check descriptions for direct commits (no PR number)
- Replaces
assert.okwith standardthrow new Errorfor consistency
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/update-changelog.ts | Adds getAllLoggedDescriptions function and passes descriptions to getNewChangeEntries |
| src/get-new-changes.ts | Updates duplicate detection to check both PR numbers and descriptions; removes assert import; replaces assert.ok with throw new Error |
| src/get-new-changes.test.ts | Adds comprehensive test coverage for PR and direct commit deduplication scenarios |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
bugbot run |
- Replace substring matching (some + includes) with exact array lookup - Fixes false positive where 'Update' would match 'Update Attributions' - Improves performance by using direct array.includes() instead of some() Addresses Copilot feedback in PR review
|
Thanks @github-copilot. The logic was reversed and could cause false positives. Fixed in latest commit:
The duplicate detection now uses exact string matching for direct commits, consistent with PR number matching for PR-based commits. |
|
bugbot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Bugbot reviewed your changes and found no bugs!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
bugbot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Trim both new and logged descriptions before comparison - Prevents duplicates when commits have trailing/leading whitespace - Matches behavior of changelog parser's extractPrLinks which trims descriptions Example fix: - Raw commit: 'Update Attributions ' (with trailing space) - Logged: 'Update Attributions' (trimmed by parser) - Now correctly detected as duplicate after normalization Addresses Cursor bot feedback on whitespace mismatch
|
bugbot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Bugbot reviewed your changes and found no bugs!
- Normalize descriptions once at source in getAllLoggedDescriptions() - Eliminates N×M trim operations in filter loop - For 100 commits × 500 logged = 50,000 operations reduced to 500 Before: trim() called on every comparison in inner loop After: trim() called once per logged description when building array Addresses Copilot performance feedback
|
bugbot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Bugbot reviewed your changes and found no bugs!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Prevents duplicates from squash merges where both original and merge commits appear - When a commit without PR number has a corresponding merge commit with same description, skip it - The merge commit (with PR number) is used instead Example scenario that's now fixed: 1. Feature branch commit: 'feat: Add dashboard' (no PR#) 2. Squash merge commit: 'feat: Add dashboard (#6)' (has PR#) Both have same description but different PR status - now only merge commit is used Fixes duplicate entry issue found in test repository: https://github.com/consensys-test/metamask-extension-test/blob/release/101.0.0-Changelog/CHANGELOG.md
8fe284f to
4e7a9cb
Compare
|
bugbot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Bugbot reviewed your changes and found no bugs!
- Log all commits extracted by getCommits() - Log filter decisions (why commits are skipped) - Log final commits after filtering - Shows PR numbers, logged arrays, and Set contents This will help diagnose why valid commits are missing from changelog in test repository (PR #10 and Emergency security patch) Temporary debug commit - will be removed after investigation
- Change 'merge' to 'Merge pull request' in the list of keywords to indicate excluded commits - Ensures more accurate filtering of commit messages in the changelog generation process
- Eliminated debug logging related to commit extraction and filtering to clean up the codebase - Simplified the filtering logic by directly returning boolean expressions instead of using intermediate variables - Maintained functionality while improving code readability and maintainability
- Added a comment to clarify the use of process.env for the GitHub token - Maintained error handling for missing GITHUB_TOKEN to prevent runtime issues
- Updated logic to use the description instead of the subject for categorizing merge commits - Ensures accurate representation of the content from pull requests in changelog generation - Prevents exclusion of relevant information from merge commits with generic subjects
|
bugbot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Bugbot reviewed your changes and found no bugs!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
gauthierpetetin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and thanks for adding these unit tests
mcmire
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it took me so long to get to this. I had some questions about the core logic here, as well as some suggestions for simplifying code and tests.
| 'flaky test', | ||
| 'INFRA-', | ||
| 'merge', | ||
| 'Merge pull request', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this change belong in this PR? It seems separate from the problem this PR is solving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is related to the PR's goal because the original keyword 'merge' was too broad and would exclude any commit containing "merge" in the description (e.g., "feat: merge user profiles into dashboard"). Changing to 'Merge pull request' is more specific and only excludes actual GitHub merge commit messages.
That said, I can remove it if you prefer to keep the scope narrower. Let me know your preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to make this change separately I think. There are other entries that look too broad as well. e.g. cp- would exclude any urgent changes that we cherry-pick (which certainly we want documented for the hotfix release).
The whole strategy is a bit questionable to be honest 🤔 It's fairly common for teams to merge functional changes under PRs with a title related to non-functional changes like e2e tests. Certainly not a good practice, but we don't do enough to detect or prevent it. But I guess that's out of scope for this PR, it's already done.
src/get-new-changes.test.ts
Outdated
| import * as runCommandModule from './run-command'; | ||
|
|
||
| // Mock the run-command module | ||
| jest.mock('./run-command'); | ||
|
|
||
| const mockRunCommand = runCommandModule.runCommand as jest.MockedFunction< | ||
| typeof runCommandModule.runCommand | ||
| >; | ||
| const mockRunCommandAndSplit = | ||
| runCommandModule.runCommandAndSplit as jest.MockedFunction< | ||
| typeof runCommandModule.runCommandAndSplit | ||
| >; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you import both of these functions explicitly at the top, I believe you can write this more simply:
| import * as runCommandModule from './run-command'; | |
| // Mock the run-command module | |
| jest.mock('./run-command'); | |
| const mockRunCommand = runCommandModule.runCommand as jest.MockedFunction< | |
| typeof runCommandModule.runCommand | |
| >; | |
| const mockRunCommandAndSplit = | |
| runCommandModule.runCommandAndSplit as jest.MockedFunction< | |
| typeof runCommandModule.runCommandAndSplit | |
| >; | |
| import { runCommand, runCommandAndSplit } from './run-command'; | |
| jest.mock('./run-command'); | |
| const mockRunCommand = jest.mocked(runCommand); | |
| const mockRunCommandAndSplit = jest.mocked(runCommandAndSplit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Switched to direct imports for consistency with the main source file and cleaner type assertions.
Note: I kept the MockedFunction type assertion because jest.mocked() was added in Jest 27.4.0, and our @types/jest (26.0.23) doesn't include the type definitions for it. Upgrading @types/jest to ≥27.4.0 could be a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, we're already on Jest v29 here, so the types must have been left outdated by mistake
src/get-new-changes.test.ts
Outdated
|
|
||
| describe('including commits with and without PR numbers', () => { | ||
| it('should include commits with PR numbers', async () => { | ||
| // Mock git rev-list to return commit hashes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion for another PR: At some point I wonder if we ought to extract getCommits to its own file so that we don't have to mock both commands here and we can simplify these tests slightly. getCommits is very complex anyway and probably deserves its own tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed! We can create some follow-up PR from this one.
| */ | ||
| function initOctoKit() { | ||
| if (!process.env.GITHUB_TOKEN) { | ||
| // eslint-disable-next-line node/no-process-env |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion for another PR: It might be good if we extracted this function to its own file so we could mock it better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! Follow-up item.
| * @returns All change descriptions included in the given changelog, trimmed. | ||
| */ | ||
| function getAllLoggedDescriptions(changelog: Changelog) { | ||
| return getAllChanges(changelog).map((change) => change.description.trim()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See note on pre-trimming commits above so that we can simplify this here:
| return getAllChanges(changelog).map((change) => change.description.trim()); | |
| return getAllChanges(changelog).map((change) => change.description); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kept .trim() as defensive measure - getAllLoggedDescriptions() reads from the parsed changelog file (which may have manual edits or entries from older tool versions), not from getCommits(). The trim ensures consistent comparison regardless of source data whitespace.
Happy to remove if you prefer consistency with the "trim at source" principle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe Elliot was suggesting that we trim them earlier in the process, not stop trimming them in auto-changelog altogether. At least that was my read, maybe I'm mistaken.
Either way, this seems like something we can consider in a later PR.
- Introduced a new `Commit` type to encapsulate commit details, including PR number, subject, description, and merge commit status. - Implemented a `deduplicateCommits` function to streamline the filtering of duplicate commits based on PR numbers and descriptions. - Updated the `getCommits` function to handle merge commits more effectively and ensure accurate categorization in changelog generation. - Improved overall code readability and maintainability by consolidating duplicate detection logic.
@mcmire Thanks for the detailed review! Your suggestions have significantly improved the code quality. All feedback has been addressed. Let me know if anything needs further adjustment. |
Gudahtt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! A couple of minor suggestions (moving the constant to a separate PR, and the one I just left), but no blockers
|
|
||
| describe('PR-tagged commits', () => { | ||
| it('should include commits with PR numbers', async () => { | ||
| mockRunCommandAndSplit.mockResolvedValueOnce(['commit1', 'commit2']); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: It's hard to tell what these mocks are intended to do, because both runCommandAndSplit and runCommand are used for different commands in the module under test.
We could improve this by using a mock implementation that validates expected params maybe?
Summary
Implements duplicate detection for changelog entries to prevent re-adding the same commits across multiple workflow runs.
Problem
The auto-changelog tool was creating duplicate entries when CI ran multiple times on a release branch. The existing deduplication only tracked PR numbers, so direct commits (without PR numbers) like "Update Attributions" or "chore: bump version" would be added repeatedly.
Solution
1. Dual-Tracking Duplicate Detection
2. Handle Squash Merge Duplicates
Squash merges create two commits with the same content:
feat: Add feature(no PR number)feat: Add feature (#123)(has PR number)The feature branch commit is now skipped when its corresponding merge commit exists.
3. Fix Merge Commit Categorization
Regular merge commits (format:
Merge pull request #123 from branch) were being excluded because categorization used the merge subject instead of the actual PR content. Now uses the extracted description from the commit body for proper categorization.Changes
loggedDescriptionsparameter to track all changelog entry descriptions'merge'→'Merge pull request'to avoid false exclusionsTesting
Verified with test repository
consensys-test/metamask-extension-testusing multiple release branches with:Results: ✅ No duplicates, all commits correctly categorized, merge commits properly handled
Related
Fixes INFRA-3081
Note
Add robust duplicate detection using PR numbers and exact descriptions (incl. squash/merge cases), update exclusions, and add comprehensive tests.
src/get-new-changes.ts):loggedDescriptionssupport and dedup logic for PR-based and direct commits, skipping direct commits when a matching PR merge exists.git showsubject is empty; refine GITHUB_TOKEN retrieval.src/update-changelog.ts):loggedDescriptionsfrom existing changelog and pass togetNewChangeEntries.src/constants.ts):mergetoMerge pull request.src/get-new-changes.test.tswith cases for PR vs direct commits, duplicate detection across runs, squash/merge handling, link formatting, and edge cases.Written by Cursor Bugbot for commit d4fea63. Configure here.