Incorrect use of String.fromCharCode() instead of String.fromCodePoint() in createNormalizedUrl function #2905

Closed
romainmnr opened this issue Mar 18, 2025 · 3 comments

romainmnr (Contributor) commented Mar 18, 2025

Search terms

#router #createNormalizedUrl #utf8 #emoji

I’ve identified an issue in TypeDoc’s source code in the createNormalizedUrl function, which normalizes the output file name (URL). The problem occurs when the normalization process handles Unicode characters outside the Basic Multilingual Plane (BMP), such as emoji.

Expected Behavior

  • The createNormalizedUrl function should correctly handle all Unicode characters, including those that require more than one 16-bit code unit (such as emoji and other non-BMP characters).
  • The URL should be normalized without breaking or corrupting any characters.
  • Every character that survives normalization should be encoded correctly in the final output file name.

Actual Behavior

In the createNormalizedUrl function, TypeDoc uses String.fromCharCode() to rebuild the URL string after excluding unsupported characters. However, String.fromCharCode() treats each argument as a single 16-bit UTF-16 code unit, so it cannot produce characters outside the Basic Multilingual Plane (BMP), such as emoji or certain rare Unicode characters.
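The difference is visible with a single code point. This minimal TypeScript illustration uses only the built-in String methods and U+1F527 (🔧, the first emoji in the reproduction title); it does not reproduce TypeDoc's own code.

```typescript
// U+1F527 WRENCH lies outside the BMP (it is greater than 0xFFFF).
const wrench = 0x1f527;

// fromCharCode() converts each argument to a 16-bit code unit, so the value
// is truncated to 0x1F527 & 0xFFFF = 0xF527, an unrelated private-use character.
const viaCharCode = String.fromCharCode(wrench);

// fromCodePoint() accepts any code point up to 0x10FFFF and emits the
// surrogate pair 0xD83D 0xDD27 when two code units are needed.
const viaCodePoint = String.fromCodePoint(wrench);

console.log(viaCharCode === "\uF527");          // true
console.log(viaCodePoint === "\u{1F527}");      // true
console.log(viaCodePoint === "\uD83D\uDD27");   // true
console.log(viaCharCode.length, viaCodePoint.length); // 1 2
```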

Why This Is a Problem:

  • Incorrect Rebuilding of Characters: String.fromCharCode() truncates each value to 16 bits, so code points greater than 0xFFFF (e.g., most emoji) turn into broken or unrelated characters.
  • Incorrect URL Normalization: As a result, output file names are normalized incorrectly, which can produce invalid or inconsistent file names for URLs that contain non-BMP characters.
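To show how this corrupts a file name, here is a hypothetical rebuild step: it walks the input by code point, drops characters found in an assumed `UNSUPPORTED` set (an illustrative list of characters that are invalid in file names, not TypeDoc's actual rule), and rebuilds the string with String.fromCharCode(). The function `rebuildWithCharCode` is invented for this sketch and is not TypeDoc's createNormalizedUrl.

```typescript
// Hypothetical set of characters treated as unsupported in file names.
// This is an assumption for illustration, NOT TypeDoc's real list.
const UNSUPPORTED = new Set([..."<>:\"/\\|?*"].map((c) => c.codePointAt(0)!));

// Hypothetical rebuild step, NOT TypeDoc's createNormalizedUrl.
function rebuildWithCharCode(input: string): string {
    const kept: number[] = [];
    // for...of walks the string by code point, so each emoji arrives whole.
    for (const ch of input) {
        const cp = ch.codePointAt(0)!;
        if (!UNSUPPORTED.has(cp)) kept.push(cp);
    }
    // Bug: fromCharCode() truncates every code point above 0xFFFF to 16 bits.
    return String.fromCharCode(...kept);
}

const title = "\u{1F527} Foo bar"; // "🔧 Foo bar"
const broken = rebuildWithCharCode(title);
console.log(broken === title);            // false
console.log(broken === "\uF527 Foo bar"); // true
```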

Suggested Fix:

Replace String.fromCharCode() with String.fromCodePoint(). String.fromCodePoint() accepts any Unicode code point (up to 0x10FFFF) and produces a surrogate pair for characters that require two 16-bit code units.
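A minimal sketch of the suggested fix applied to a hypothetical rebuild step. `rebuildWithCodePoint` and the `UNSUPPORTED` set are illustrative assumptions, not TypeDoc's implementation; the point is that rebuilding with String.fromCodePoint() keeps emoji and ZWJ sequences, such as the title from the reproduction steps, intact.

```typescript
// Hypothetical set of characters treated as unsupported in file names.
// This is an assumption for illustration, NOT TypeDoc's real list.
const UNSUPPORTED = new Set([..."<>:\"/\\|?*"].map((c) => c.codePointAt(0)!));

// Hypothetical rebuild step with the suggested fix applied.
function rebuildWithCodePoint(input: string): string {
    const kept: number[] = [];
    // for...of yields whole code points, including non-BMP characters.
    for (const ch of input) {
        const cp = ch.codePointAt(0)!;
        if (!UNSUPPORTED.has(cp)) kept.push(cp);
    }
    // fromCodePoint() re-encodes code points above 0xFFFF as surrogate pairs.
    return String.fromCodePoint(...kept);
}

// Title from the reproduction steps: wrench, space, the technologist ZWJ
// sequence (U+1F9D1 U+200D U+1F4BB), then " Foo bar".
const title = "\u{1F527} \u{1F9D1}\u200D\u{1F4BB} Foo bar";
console.log(rebuildWithCodePoint(title) === title);                      // true
console.log(rebuildWithCodePoint("a<b>c?\u{1F527}") === "abc\u{1F527}"); // true
```

Keeping each `ch` string and joining the results would avoid the number-to-string round trip entirely; the sketch keeps numeric code points only to mirror the one-call change proposed in this issue.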

Steps to reproduce the bug

  • Add front matter to an external document, using a title that contains characters requiring more than one 16-bit code unit (e.g., emoji):
---
title: 🔧 🧑‍💻 Foo bar
---
  • Generate the documentation and observe the incorrect normalization of the output file name (URL).

Environment

  • TypeDoc version: ^0.27.9
  • TypeScript version: 5.8.2
  • Node.js version: 20.18.3
  • OS: macOS 15.3.1
romainmnr (Contributor, Author)

PR: #2906

Gerrit0 closed this as completed Mar 19, 2025
romainmnr (Contributor, Author)

Thanks @Gerrit0! Is there a schedule for the official release?

Gerrit0 (Collaborator) commented Mar 19, 2025

I want to get #2908 in first; the release will be this weekend at the latest.
