Decode HTML entities before @mention detection to prevent bypass#14846

Merged: pelikhan merged 3 commits into main from copilot/fix-html-entity-decoding on Feb 10, 2026

Copilot AI commented Feb 10, 2026

HTML-encoded @ symbols (&commat;, &#64;, &#x40;) were not decoded before @mention regex matching, allowing a potential bypass if GitHub's renderer decodes entities after sanitization.

Changes

  • Added decodeHtmlEntities() function to handle named, decimal, hex, and double-encoded entities with Unicode validation (0-0x10FFFF)
  • Positioned entity decoding as step 2 in hardenUnicodeText() pipeline, after NFC normalization and before full-width ASCII conversion
  • Exported function for testing and potential reuse
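The ordering in the second bullet matters because a numeric entity can encode a full-width character. A minimal sketch of that ordering follows; `hardenSketch`, `decodeNumericEntities`, and `fullWidthToAscii` are hypothetical names used for illustration only and are not the PR's actual `hardenUnicodeText()` implementation.

```javascript
// Hypothetical sketch of the step ordering; not the code from this PR.

// Decode decimal and hex numeric entities within the Unicode range.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

// Map full-width forms U+FF01..U+FF5E to ASCII U+0021..U+007E.
function fullWidthToAscii(text) {
  return text.replace(/[\uFF01-\uFF5E]/g, ch =>
    String.fromCharCode(ch.charCodeAt(0) - 0xfee0)
  );
}

function hardenSketch(text) {
  let result = text.normalize("NFC");      // step 1: NFC normalization
  result = decodeNumericEntities(result);  // step 2: entity decoding
  result = fullWidthToAscii(result);       // step 3: full-width conversion
  return result;
}

// An entity-encoded full-width "＠" (U+FF20) ends up as ASCII "@".
console.log(hardenSketch("&#xFF20;user"));

// With the two steps swapped, the full-width character would survive.
console.log(decodeNumericEntities(fullWidthToAscii("&#xFF20;user")));
```

Decoding first means any full-width character hidden behind an entity is already present in the text when the full-width conversion runs.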

Implementation

function decodeHtmlEntities(text) {
  let result = text;
  
  // Named entities: &commat;, &amp;commat; → @
  result = result.replace(/&(?:amp;)?commat;/gi, "@");
  
  // Decimal entities: &#64;, &amp;#64; → @
  result = result.replace(/&(?:amp;)?#(\d+);/g, (match, code) => {
    const codePoint = parseInt(code, 10);
    return (codePoint >= 0 && codePoint <= 0x10ffff) 
      ? String.fromCodePoint(codePoint) 
      : match;
  });
  
  // Hex entities: &#x40;, &amp;#x40; → @
  result = result.replace(/&(?:amp;)?#[xX]([0-9a-fA-F]+);/g, (match, code) => {
    const codePoint = parseInt(code, 16);
    return (codePoint >= 0 && codePoint <= 0x10ffff)
      ? String.fromCodePoint(codePoint)
      : match;
  });
  
  return result;
}
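
To make the behavior of the function concrete, here is a small table-driven check. The function body is copied from the listing above so the example runs on its own; the input cases are illustrative choices and are not taken from the PR's test file.

```javascript
// Copied from the implementation above so this example is self-contained.
function decodeHtmlEntities(text) {
  let result = text;
  result = result.replace(/&(?:amp;)?commat;/gi, "@");
  result = result.replace(/&(?:amp;)?#(\d+);/g, (match, code) => {
    const codePoint = parseInt(code, 10);
    return codePoint >= 0 && codePoint <= 0x10ffff
      ? String.fromCodePoint(codePoint)
      : match;
  });
  result = result.replace(/&(?:amp;)?#[xX]([0-9a-fA-F]+);/g, (match, code) => {
    const codePoint = parseInt(code, 16);
    return codePoint >= 0 && codePoint <= 0x10ffff
      ? String.fromCodePoint(codePoint)
      : match;
  });
  return result;
}

// Illustrative inputs (assumed for this sketch) with the output each should produce.
const cases = [
  ["&commat;user", "@user"],         // named
  ["&amp;commat;user", "@user"],     // double-encoded named
  ["&#64;user", "@user"],            // decimal
  ["&amp;#64;user", "@user"],        // double-encoded decimal
  ["&#x40;user", "@user"],           // hex
  ["&amp;#X40;user", "@user"],       // double-encoded hex, uppercase X
  ["&#60;b&#62;", "<b>"],            // numeric entities other than @ are decoded too
  ["&#1114112;", "&#1114112;"],      // 0x110000 is out of range, left unchanged
  ["&commat user", "&commat user"],  // no terminating semicolon, left unchanged
];

for (const [input, expected] of cases) {
  const actual = decodeHtmlEntities(input);
  const status = actual === expected ? "ok" : "MISMATCH";
  console.log(`${JSON.stringify(input)} -> ${JSON.stringify(actual)} ${status}`);
}
```

Note that the numeric branches decode any in-range code point, not only `@`, so later sanitization steps see characters such as `<` in decoded form.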

Entity decoding now occurs before @mention regex evaluation, ensuring encoded @ symbols are properly neutralized:

// Before: &commat;user → "&commat;user" (bypass)
// After:  &commat;user → "`@user`" (neutralized)
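
The before/after comments suggest that neutralization wraps the mention in backticks. The sketch below illustrates that interaction under stated assumptions: `decodeAtSketch` and `neutralizeMentionsSketch` are hypothetical helpers, and the mention regex is a simplified guess, not the pattern used in sanitize_content_core.cjs.

```javascript
// Hypothetical helpers for illustration; the real sanitizer's regex may differ.

// Narrow decoder that only handles encodings of "@" (single or double encoded).
function decodeAtSketch(text) {
  return text.replace(/&(?:amp;)?(?:commat|#0*64|#[xX]0*40);/gi, "@");
}

// Wrap @mentions in backticks unless preceded by a word character or a backtick.
function neutralizeMentionsSketch(text) {
  return text.replace(/(^|[^\w`])@([A-Za-z0-9][A-Za-z0-9-]*)/g, "$1`@$2`");
}

function sanitizeSketch(text) {
  return neutralizeMentionsSketch(decodeAtSketch(text));
}

// Without decoding, the mention regex never sees an "@" and the text passes through.
console.log(neutralizeMentionsSketch("&commat;user"));

// With decoding first, the mention is found and wrapped.
console.log(sanitizeSketch("&commat;user"));
console.log(sanitizeSketch("hi &#64;bob!"));
```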

Files modified:

  • actions/setup/js/sanitize_content_core.cjs: Added decoding function and integrated into Unicode hardening
  • actions/setup/js/sanitize_content.test.cjs: Added 27 test cases covering entity formats and edge cases


- Added decodeHtmlEntities() function to decode HTML entities early in sanitization
- Handles named entities (&commat;), decimal entities (&#64;), and hex entities (&#x40;)
- Supports double-encoded variants (&amp;commat;, &amp;#64;, &amp;#X40;)
- Decoding happens in hardenUnicodeText() as step 2 (before full-width conversion)
- Added comprehensive test coverage (27 new tests) for entity decoding scenarios
- All tests passing (210 tests total)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan pelikhan marked this pull request as ready for review February 10, 2026 22:55
Copilot AI review requested due to automatic review settings February 10, 2026 22:55
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix HTML entity decoding in @mention detection" to "Decode HTML entities before @mention detection to prevent bypass" Feb 10, 2026
Copilot AI requested a review from pelikhan February 10, 2026 22:59
Copilot AI left a comment

Pull request overview

This PR strengthens the JavaScript sanitization pipeline to prevent bypassing @mention neutralization via HTML entity-encoded @ characters by decoding relevant HTML entities early in hardenUnicodeText().

Changes:

  • Added decodeHtmlEntities() and integrated it into hardenUnicodeText() before mention detection runs.
  • Implemented decoding for &commat; and numeric entities (decimal + hex), including &amp; double-encoded variants.
  • Added comprehensive tests covering encoded-mention scenarios and general numeric entity decoding.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Files and descriptions:

  • actions/setup/js/sanitize_content_core.cjs: Adds HTML entity decoding helper and runs it early in Unicode hardening to prevent mention-bypass via encoded @.
  • actions/setup/js/sanitize_content.test.cjs: Adds tests validating that entity-encoded mentions are decoded and then neutralized, plus numeric-entity decoding cases.

@pelikhan pelikhan merged commit 4e9c06e into main Feb 10, 2026
152 checks passed
@pelikhan pelikhan deleted the copilot/fix-html-entity-decoding branch February 10, 2026 23:10