Skip to content

Conversation

@Qbandev
Copy link
Contributor

@Qbandev Qbandev commented Nov 7, 2025

Summary

Implements duplicate detection for changelog entries to prevent re-adding the same commits across multiple workflow runs.

Problem

The auto-changelog tool was creating duplicate entries when CI ran multiple times on a release branch. The existing deduplication only tracked PR numbers, so direct commits (without PR numbers) like "Update Attributions" or "chore: bump version" would be added repeatedly.

Solution

1. Dual-Tracking Duplicate Detection

  • PR-based commits: Check if PR number already exists in changelog
  • Direct commits: Check if exact description text already exists in changelog

2. Handle Squash Merge Duplicates

Squash merges create two commits with the same content:

  • Feature branch commit: feat: Add feature (no PR number)
  • Merge commit: feat: Add feature (#123) (has PR number)

The feature branch commit is now skipped when its corresponding merge commit exists.

3. Fix Merge Commit Categorization

Regular merge commits (format: Merge pull request #123 from branch) were being excluded because categorization used the merge subject instead of the actual PR content. Now uses the extracted description from the commit body for proper categorization.

Changes

  • Added loggedDescriptions parameter to track all changelog entry descriptions
  • Updated duplicate detection to check both PR numbers and descriptions
  • Pre-build Set of PR descriptions for O(1) lookup performance
  • Refined exclusion keyword 'merge''Merge pull request' to avoid false exclusions
  • Use description for categorizing merge commits instead of subject

Testing

Verified with test repository consensys-test/metamask-extension-test using multiple release branches with:

  • Regular merge PRs
  • Squash merge PRs
  • Direct commits (feat, fix, docs, chore)
  • Multiple workflow runs to test idempotency

Results: ✅ No duplicates, all commits correctly categorized, merge commits properly handled

Related

Fixes INFRA-3081


Note

Add robust duplicate detection using PR numbers and exact descriptions (incl. squash/merge cases), update exclusions, and add comprehensive tests.

  • Changelog generation (src/get-new-changes.ts):
    • Add loggedDescriptions support and dedup logic for PR-based and direct commits, skipping direct commits when a matching PR merge exists.
    • Use description (not subject) for merge-commit categorization; append PR link with optional short format.
    • Replace assertion with explicit error when git show subject is empty; refine GITHUB_TOKEN retrieval.
  • Changelog update (src/update-changelog.ts):
    • Extract trimmed loggedDescriptions from existing changelog and pass to getNewChangeEntries.
  • Exclusions (src/constants.ts):
    • Adjust exclusion keyword from merge to Merge pull request.
  • Tests:
    • Add src/get-new-changes.test.ts with cases for PR vs direct commits, duplicate detection across runs, squash/merge handling, link formatting, and edge cases.

Written by Cursor Bugbot for commit d4fea63. Configure here.

@Qbandev Qbandev requested a review from a team as a code owner November 7, 2025 11:11
- Add duplicate detection for both PR-based and direct commits
- For PR commits: check if PR number already exists in changelog
- For direct commits: check if description already exists in changelog
- This fixes idempotency issues where direct commits like 'Update Attributions'
  were being re-added on every workflow run

Fixes INFRA-3081

BREAKING CHANGE: None - this is backward compatible
The behavior change is that direct commits (without PR numbers) are now
included in changelogs by default, but duplicates are properly detected
and excluded on subsequent runs.

Future PR will add optional --require-pr-numbers flag for teams that want
to filter out direct commits during generation.
@Qbandev Qbandev force-pushed the fix/INFRA-3081-duplicate-detection branch from e987c77 to 3d5da86 Compare November 7, 2025 11:19
@Qbandev Qbandev requested a review from Copilot November 7, 2025 14:20
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds duplicate detection for direct commits (commits without PR numbers) in changelog generation to prevent changelog entries from being duplicated across multiple workflow runs.

  • Adds loggedDescriptions tracking to capture all change descriptions already in the changelog
  • Updates duplicate detection logic to check descriptions for direct commits (no PR number)
  • Replaces assert.ok with standard throw new Error for consistency

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/update-changelog.ts Adds getAllLoggedDescriptions function and passes descriptions to getNewChangeEntries
src/get-new-changes.ts Updates duplicate detection to check both PR numbers and descriptions; removes assert import; replaces assert.ok with throw new Error
src/get-new-changes.test.ts Adds comprehensive test coverage for PR and direct commit deduplication scenarios

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Qbandev
Copy link
Contributor Author

Qbandev commented Nov 7, 2025

bugbot run

- Replace substring matching (some + includes) with exact array lookup
- Fixes false positive where 'Update' would match 'Update Attributions'
- Improves performance by using direct array.includes() instead of some()

Addresses Copilot feedback in PR review
@Qbandev
Copy link
Contributor Author

Qbandev commented Nov 7, 2025

Thanks @github-copilot. The logic was reversed and could cause false positives.

Fixed in latest commit:

  • Changed from loggedDescriptions.some((loggedDesc) => loggedDesc.includes(description))
  • To: loggedDescriptions.includes(description) (exact match)

The duplicate detection now uses exact string matching for direct commits, consistent with PR number matching for PR-based commits.

@Qbandev Qbandev requested a review from Copilot November 7, 2025 14:35
@Qbandev
Copy link
Contributor Author

Qbandev commented Nov 7, 2025

bugbot run

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no bugs!


Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Qbandev Qbandev requested a review from Copilot November 7, 2025 14:40
@Qbandev
Copy link
Contributor Author

Qbandev commented Nov 7, 2025

bugbot run

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Trim both new and logged descriptions before comparison
- Prevents duplicates when commits have trailing/leading whitespace
- Matches behavior of changelog parser's extractPrLinks which trims descriptions

Example fix:
- Raw commit: 'Update Attributions ' (with trailing space)
- Logged: 'Update Attributions' (trimmed by parser)
- Now correctly detected as duplicate after normalization

Addresses Cursor bot feedback on whitespace mismatch
@Qbandev Qbandev requested a review from Copilot November 7, 2025 14:51
@Qbandev
Copy link
Contributor Author

Qbandev commented Nov 7, 2025

bugbot run

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no bugs!


- Normalize descriptions once at source in getAllLoggedDescriptions()
- Eliminates N×M trim operations in filter loop
- For 100 commits × 500 logged = 50,000 operations reduced to 500

Before: trim() called on every comparison in inner loop
After: trim() called once per logged description when building array

Addresses Copilot performance feedback
@Qbandev Qbandev requested a review from Copilot November 7, 2025 14:58
@Qbandev
Copy link
Contributor Author

Qbandev commented Nov 7, 2025

bugbot run

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no bugs!


Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jake-perkins
jake-perkins previously approved these changes Nov 7, 2025
- Prevents duplicates from squash merges where both original and merge commits appear
- When a commit without PR number has a corresponding merge commit with same description, skip it
- The merge commit (with PR number) is used instead

Example scenario that's now fixed:
1. Feature branch commit: 'feat: Add dashboard' (no PR#)
2. Squash merge commit: 'feat: Add dashboard (#6)' (has PR#)
Both have same description but different PR status - now only merge commit is used

Fixes duplicate entry issue found in test repository:
https://github.com/consensys-test/metamask-extension-test/blob/release/101.0.0-Changelog/CHANGELOG.md
@Qbandev Qbandev force-pushed the fix/INFRA-3081-duplicate-detection branch from 8fe284f to 4e7a9cb Compare November 7, 2025 16:00
@Qbandev Qbandev requested a review from Copilot November 7, 2025 16:02
@Qbandev
Copy link
Contributor Author

Qbandev commented Nov 7, 2025

bugbot run

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no bugs!


- Log all commits extracted by getCommits()
- Log filter decisions (why commits are skipped)
- Log final commits after filtering
- Shows PR numbers, logged arrays, and Set contents

This will help diagnose why valid commits are missing from changelog
in test repository (PR #10 and Emergency security patch)

Temporary debug commit - will be removed after investigation
- Change 'merge' to 'Merge pull request' in the list of keywords to indicate excluded commits
- Ensures more accurate filtering of commit messages in the changelog generation process
- Eliminated debug logging related to commit extraction and filtering to clean up the codebase
- Simplified the filtering logic by directly returning boolean expressions instead of using intermediate variables
- Maintained functionality while improving code readability and maintainability
- Added a comment to clarify the use of process.env for the GitHub token
- Maintained error handling for missing GITHUB_TOKEN to prevent runtime issues
- Updated logic to use the description instead of the subject for categorizing merge commits
- Ensures accurate representation of the content from pull requests in changelog generation
- Prevents exclusion of relevant information from merge commits with generic subjects
@Qbandev Qbandev requested a review from Copilot November 19, 2025 09:45
@Qbandev
Copy link
Contributor Author

Qbandev commented Nov 19, 2025

bugbot run

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no bugs!


Copilot finished reviewing on behalf of Qbandev November 19, 2025 09:49
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@gauthierpetetin gauthierpetetin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and thanks for adding these unit tests

@Qbandev Qbandev requested a review from jake-perkins November 19, 2025 14:39
@Qbandev Qbandev enabled auto-merge (squash) November 20, 2025 14:27
Copy link
Contributor

@mcmire mcmire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took me so long to get to this. I had some questions about the core logic here, as well as some suggestions for simplifying code and tests.

'flaky test',
'INFRA-',
'merge',
'Merge pull request',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change belong in this PR? It seems separate from the problem this PR is solving.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is related to the PR's goal because the original keyword 'merge' was too broad and would exclude any commit containing "merge" in the description (e.g., "feat: merge user profiles into dashboard"). Changing to 'Merge pull request' is more specific and only excludes actual GitHub merge commit messages.

That said, I can remove it if you prefer to keep the scope narrower. Let me know your preference.

Copy link
Member

@Gudahtt Gudahtt Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to make this change separately I think. There are other entries that look too broad as well. e.g. cp- would exclude any urgent changes that we cherry-pick (which certainly we want documented for the hotfix release).

The whole strategy is a bit questionable to be honest 🤔 It's fairly common for teams to merge functional changes under PRs with a title related to non-functional changes like e2e tests. Certainly not a good practice, but we don't do enough to detect or prevent it. But I guess that's out of scope for this PR, it's already done.

Comment on lines 2 to 13
import * as runCommandModule from './run-command';

// Mock the run-command module
jest.mock('./run-command');

const mockRunCommand = runCommandModule.runCommand as jest.MockedFunction<
typeof runCommandModule.runCommand
>;
const mockRunCommandAndSplit =
runCommandModule.runCommandAndSplit as jest.MockedFunction<
typeof runCommandModule.runCommandAndSplit
>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you import both of these functions explicitly at the top, I believe you can write this more simply:

Suggested change
import * as runCommandModule from './run-command';
// Mock the run-command module
jest.mock('./run-command');
const mockRunCommand = runCommandModule.runCommand as jest.MockedFunction<
typeof runCommandModule.runCommand
>;
const mockRunCommandAndSplit =
runCommandModule.runCommandAndSplit as jest.MockedFunction<
typeof runCommandModule.runCommandAndSplit
>;
import { runCommand, runCommandAndSplit } from './run-command';
jest.mock('./run-command');
const mockRunCommand = jest.mocked(runCommand);
const mockRunCommandAndSplit = jest.mocked(runCommandAndSplit);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Switched to direct imports for consistency with the main source file and cleaner type assertions.

Note: I kept the MockedFunction type assertion because jest.mocked() was added in Jest 27.4.0, and our @types/jest (26.0.23) doesn't include the type definitions for it. Upgrading @types/jest to ≥27.4.0 could be a follow-up PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, we're already on Jest v29 here, so the types must have been left outdated by mistake


describe('including commits with and without PR numbers', () => {
it('should include commits with PR numbers', async () => {
// Mock git rev-list to return commit hashes
Copy link
Contributor

@mcmire mcmire Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for another PR: At some point I wonder if we ought to extract getCommits to its own file so that we don't have to mock both commands here and we can simplify these tests slightly. getCommits is very complex anyway and probably deserves its own tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed! We can create some follow-up PR from this one.

*/
function initOctoKit() {
if (!process.env.GITHUB_TOKEN) {
// eslint-disable-next-line node/no-process-env
Copy link
Contributor

@mcmire mcmire Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for another PR: It might be good if we extracted this function to its own file so we could mock it better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! Follow-up item.

* @returns All change descriptions included in the given changelog, trimmed.
*/
function getAllLoggedDescriptions(changelog: Changelog) {
return getAllChanges(changelog).map((change) => change.description.trim());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See note on pre-trimming commits above so that we can simplify this here:

Suggested change
return getAllChanges(changelog).map((change) => change.description.trim());
return getAllChanges(changelog).map((change) => change.description);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept .trim() as defensive measure - getAllLoggedDescriptions() reads from the parsed changelog file (which may have manual edits or entries from older tool versions), not from getCommits(). The trim ensures consistent comparison regardless of source data whitespace.

Happy to remove if you prefer consistency with the "trim at source" principle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Elliot was suggesting that we trim them earlier in the process, not stop trimming them in auto-changelog altogether. At least that was my read, maybe I'm mistaken.

Either way, this seems like something we can consider in a later PR.

- Introduced a new `Commit` type to encapsulate commit details, including PR number, subject, description, and merge commit status.
- Implemented a `deduplicateCommits` function to streamline the filtering of duplicate commits based on PR numbers and descriptions.
- Updated the `getCommits` function to handle merge commits more effectively and ensure accurate categorization in changelog generation.
- Improved overall code readability and maintainability by consolidating duplicate detection logic.
@Qbandev
Copy link
Contributor Author

Qbandev commented Nov 25, 2025

Sorry it took me so long to get to this. I had some questions about the core logic here, as well as some suggestions for simplifying code and tests.

@mcmire Thanks for the detailed review! Your suggestions have significantly improved the code quality. All feedback has been addressed. Let me know if anything needs further adjustment.

Copy link
Member

@Gudahtt Gudahtt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! A couple of minor suggestions (moving the constant to a separate PR, and the one I just left), but no blockers


describe('PR-tagged commits', () => {
it('should include commits with PR numbers', async () => {
mockRunCommandAndSplit.mockResolvedValueOnce(['commit1', 'commit2']);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It's hard to tell what these mocks are intended to do, because both runCommandAndSplit and runCommand are used for different commands in the module under test.

We could improve this by using a mock implementation that validates expected params maybe?

@Qbandev Qbandev merged commit c267d3a into main Nov 25, 2025
25 checks passed
@Qbandev Qbandev deleted the fix/INFRA-3081-duplicate-detection branch November 25, 2025 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants