perf(ui): optimize stripUnsafeCharacters with regex by gsquared94 · Pull Request #18413 · google-gemini/gemini-cli

gsquared94 · 2026-02-06T01:34:01Z

Performance Optimization: `stripUnsafeCharacters`

Summary

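This PR replaces the array-based implementation of stripUnsafeCharacters with a regex-based approach. Across the seven benchmarked workloads it achieves an average 12x speedup, ranging from 1.1x on control-character-heavy input to 22x on a 20,000-character string, while maintaining identical behavior.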

The Change

- return toCodePoints(strippedVT)
-   .filter((char) => {
-     const code = char.codePointAt(0);
-     if (code === undefined) return false;
-     if (code === 0x0a || code === 0x0d || code === 0x09) return true;
-     if (code >= 0x00 && code <= 0x1f) return false;
-     if (code >= 0x80 && code <= 0x9f) return false;
-     return true;
-   })
-   .join('');
+ return strippedVT.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]/g, '');
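To make the new character class concrete, here is a minimal standalone sketch of the filtering step. `stripControlChars` is a hypothetical name. The sketch assumes ANSI and VT sequences have already been removed, because the real function calls `stripAnsi()` and `stripVTControlCharacters()` first.

```typescript
// Sketch of the new filtering step on its own. The hypothetical stripControlChars()
// omits the stripAnsi()/stripVTControlCharacters() calls that run first in the real code.
function stripControlChars(text: string): string {
  // Removes C0 controls except TAB (0x09), LF (0x0A), CR (0x0D), plus all C1 controls (0x80-0x9F).
  return text.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]/g, "");
}

const sample = "ok\tcol\r\nbell\x07 nul\x00 esc\x1b csi\x9b del\x7f 🎉";
const cleaned = stripControlChars(sample);

// BEL, NUL, a lone ESC and the C1 CSI control (0x9B) are dropped;
// TAB, CR, LF, DEL (0x7F) and the emoji are kept.
console.log(JSON.stringify(cleaned));
```

A lone ESC (0x1B) is still removed here because it falls in the 0x0E-0x1F range. In the full function, complete escape sequences are removed earlier by `stripAnsi()`.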

Benchmark Results

| Test Case | String Length | Old (ms/call) | New (ms/call) | Speedup |
|---|---|---|---|---|
| Short (user input) | 30 | 0.0040 | 0.0005 | 8.9x |
| Medium (terminal output) | 220 | 0.0125 | 0.0014 | 8.7x |
| Long (file/logs) | 4,100 | 0.2492 | 0.0304 | 8.2x |
| Very Long (stress test) | 20,010 | 1.6070 | 0.0730 | 22x |
| Unicode/Emoji heavy | 1,100 | 0.0750 | 0.0039 | 19.4x |
| Control-char heavy | 1,600 | 0.0660 | 0.0609 | 1.1x |
| Clean string (no changes) | 1,360 | 0.0912 | 0.0054 | 16.8x |

Average speedup: 12.14x

Why This Matters

1. High Call Frequency

stripUnsafeCharacters is called on:

- Every user keystroke in the text input buffer
- Terminal output processing
- Session recording and replay
- Paste operations

Even microsecond savings add up. In the short-input case each call drops from about 4 µs to about 0.5 µs, and an interactive session makes many such calls.

2. Memory Pressure Reduction

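Old implementation (per call):

- Array.from(str) → allocates an N-element array, each element a one-code-point string
- .filter() → allocates a new array (up to N elements)
- .join('') → creates the final string
- Total: 3 strings (the stripAnsi and stripVTControlCharacters outputs plus the result) + 2 arrays

New implementation (per call):

- .replace() → creates the final string in a single V8-optimized pass
- Total: 3 strings (the same two intermediate outputs plus the result) + 0 arrays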

Eliminating array allocations reduces garbage collection pressure, improving UI responsiveness.

3. Scales With Input Size

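The largest speedup was measured on the longest input. The speedup does not rise steadily with length, however: the 220- and 4,100-character cases came in at 8.7x and 8.2x.

- 30 chars: 8.9x
- 20,000 chars: 22x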

This is critical for large terminal output, log files, and paste operations.

4. Unicode Performance

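Unicode-heavy text shows a 19.4x improvement. This is likely because Array.from() must decode surrogate pairs (emoji) and produce a separate string for every code point, including non-ASCII ones (emoji, CJK, Greek). The regex, by contrast, scans UTF-16 code units directly without building intermediate arrays.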
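Here is a small sketch of the code-point versus code-unit distinction behind that result. It is an illustration, not a measurement. It also shows why the regex can safely run without the `u` flag: the stripped ranges never overlap the surrogate range 0xD800-0xDFFF, so scanning unit by unit cannot split an emoji.

```typescript
// Contrasts how the old path (Array.from) and the new path (a regex without the u flag)
// see the same string.
const text = "🎉 世界";

// UTF-16 code units: the emoji is a surrogate pair (2 units); space and each CJK character are 1.
const codeUnits = text.length;

// Code points: Array.from merges the surrogate pair into one element and returns
// one string per code point.
const codePoints = Array.from(text);

// The emoji's two halves lie in 0xD800-0xDFFF, outside every stripped range.
const surrogates = [text.charCodeAt(0), text.charCodeAt(1)].map((u) => u.toString(16));

// So scanning unit by unit never removes half of a surrogate pair.
const unchanged = text.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]/g, "") === text;

console.log({ codeUnits, codePointCount: codePoints.length, surrogates, unchanged });
```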

Correctness Verification

The new implementation produces identical output for all test cases:

✓ Preserves TAB (0x09), LF (0x0A), CR (0x0D)
✓ Preserves DEL (0x7F)
✓ Preserves all printable ASCII and Unicode
✓ Strips C0 control chars (0x00-0x1F except TAB/LF/CR)
✓ Strips C1 control chars (0x80-0x9F)
✓ Handles emoji, ZWJ sequences, surrogate pairs correctly

68 unit tests added covering all character classes and edge cases.
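The equivalence claim can also be checked exhaustively for the filtering step, which is the only part that changed. The sketch below is not one of the PR's 68 tests. `oldFilter` and `newFilter` are hypothetical names wrapping the two filtering expressions from the diff. The sketch compares them on every UTF-16 code unit (lone surrogates included), on all code units concatenated (where Array.from pairs adjacent surrogates), and on a ZWJ emoji sequence and C1 controls. The shared stripAnsi and stripVTControlCharacters steps are deliberately left out.

```typescript
// Old filtering step: keep TAB/LF/CR, drop other C0 controls and all C1 controls.
function oldFilter(s: string): string {
  return Array.from(s)
    .filter((char) => {
      const code = char.codePointAt(0);
      if (code === undefined) return false;
      if (code === 0x0a || code === 0x0d || code === 0x09) return true;
      if (code >= 0x00 && code <= 0x1f) return false;
      if (code >= 0x80 && code <= 0x9f) return false;
      return true;
    })
    .join("");
}

// New filtering step: one regex replace over UTF-16 code units.
function newFilter(s: string): string {
  return s.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]/g, "");
}

// Compare every single code unit, collecting any that disagree.
const mismatches: number[] = [];
let everyUnit = "";
for (let unit = 0; unit <= 0xffff; unit++) {
  const ch = String.fromCharCode(unit); // lone surrogates are included on purpose
  everyUnit += ch;
  if (oldFilter(ch) !== newFilter(ch)) mismatches.push(unit);
}

// Multi-character inputs where code-point iteration and code-unit scanning could differ.
const samples = [
  everyUnit,
  "👨\u200D👩\u200D👧 family", // ZWJ sequence
  "a\u0085b\u009bc", // NEL and CSI from the C1 range
  "",
];
const sampleMismatches = samples.filter((s) => oldFilter(s) !== newFilter(s)).length;

console.log(`single-unit mismatches: ${mismatches.length}, sample mismatches: ${sampleMismatches}`);
```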

Benchmark Script


import stripAnsi from "strip-ansi";
import { stripVTControlCharacters } from "node:util";

// Old implementation
function toCodePoints(str: string): string[] {
  return Array.from(str);
}

function stripUnsafeCharactersOld(str: string): string {
  const strippedAnsi = stripAnsi(str);
  const strippedVT = stripVTControlCharacters(strippedAnsi);
  return toCodePoints(strippedVT)
    .filter((char) => {
      const code = char.codePointAt(0);
      if (code === undefined) return false;
      if (code === 0x0a || code === 0x0d || code === 0x09) return true;
      if (code >= 0x00 && code <= 0x1f) return false;
      if (code >= 0x80 && code <= 0x9f) return false;
      return true;
    })
    .join("");
}

// New implementation
function stripUnsafeCharactersNew(str: string): string {
  const strippedAnsi = stripAnsi(str);
  const strippedVT = stripVTControlCharacters(strippedAnsi);
  return strippedVT.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]/g, "");
}

// Test data
const testData = {
  short: "Hello, World!\tThis is a test.\n",
  medium:
    "\x1b[32mSuccess:\x1b[0m " +
    "x".repeat(100) +
    "\x07" +
    "y".repeat(100) +
    "\n",
  long: "Normal text with some \x00 control \x07 chars. ".repeat(100),
  veryLong: ("a".repeat(1000) + "\x00" + "b".repeat(1000)).repeat(10),
  unicode: "🎉 Hello 世界! κόσμε 🚀 ".repeat(50),
  controlHeavy: "a\x00b\x01c\x02d\x03e\x04f\x05g\x06h\x07".repeat(100),
  clean: "This is a completely clean string with no control characters.".repeat(
    20,
  ),
};

// Benchmark function
function benchmark(fn: () => void, iterations: number): number {
  for (let i = 0; i < 100; i++) fn(); // Warmup
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return (performance.now() - start) / iterations;
}

// Run benchmarks
for (const [name, input] of Object.entries(testData)) {
  const oldTime = benchmark(() => stripUnsafeCharactersOld(input), 10000);
  const newTime = benchmark(() => stripUnsafeCharactersNew(input), 10000);
  console.log(`${name}: ${(oldTime / newTime).toFixed(2)}x speedup`);
}

Risk Assessment

Low risk:

- Single regex pattern compiled once (V8 caches compiled regexes)
- Behavioral equivalence verified with 68 tests
- No API changes; drop-in replacement
- Regex pattern is simple, well-tested character-class matching
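On the point about the regex being compiled once: a regex literal inside a function body evaluates to a new RegExp object on each call, although engines such as V8 can reuse the compiled pattern. If the pattern were ever hoisted to a module-level constant, the `g` flag would become shared state. In the sketch below, `UNSAFE_CONTROL_CHARS` and `stripControlChars` are hypothetical names. It shows that `replace()` resets `lastIndex` before a global match, so a hoisted constant is safe there, while `test()` on the same object is stateful.

```typescript
// Hypothetical module-level constant; the PR itself keeps the literal inline.
const UNSAFE_CONTROL_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]/g;

function stripControlChars(text: string): string {
  // For a global regex, replace() sets lastIndex to 0 before matching,
  // so repeated calls through the shared object give the same result.
  return text.replace(UNSAFE_CONTROL_CHARS, "");
}

const first = stripControlChars("a\x07b");
const second = stripControlChars("a\x07b");

// test() on the same global regex is stateful: it advances lastIndex after a match.
const probe = "\x07";
const t1 = UNSAFE_CONTROL_CHARS.test(probe); // true: match at index 0, lastIndex becomes 1
const t2 = UNSAFE_CONTROL_CHARS.test(probe); // false: search starts at index 1, lastIndex resets to 0

console.log({ first, second, t1, t2 });
```

Keeping the literal inline, as the PR does, sidesteps this shared-state question entirely.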

Replace the array-based toCodePoints().filter().join() pattern with a single regex replace for significantly better performance.

Before: O(n) with multiple array allocations
- toCodePoints() creates an array from the string
- filter() creates a new filtered array
- join() reconstructs the string

After: O(n) with minimal allocations
- A single regex pass that builds one new string (JavaScript strings are immutable, so nothing is modified in place)

The regex matches:
- C0 control chars (0x00-0x1F) except TAB (0x09), LF (0x0A), CR (0x0D)
- C1 control chars (0x80-0x9F)

Add comprehensive unit tests covering:
- All preserved characters (TAB, LF, CR, DEL, printable ASCII, Unicode)
- All stripped C0 control characters (NULL, BELL, BS, etc.)
- The C1 control character range
- ANSI escape sequence stripping
- Edge cases (empty string, long strings, emoji, surrogate pairs)

gemini-cli · 2026-02-06T01:34:12Z

Hi there! Thank you for your contribution to Gemini CLI.

To improve our contribution process and better track changes, we now require all pull requests to be associated with an existing issue, as announced in our recent discussion and as detailed in our CONTRIBUTING.md.

This pull request is being closed because it is not currently linked to an issue. Once you have updated the description of this PR to link an issue (e.g., by adding Fixes #123 or Related to #123), it will be automatically reopened.

How to link an issue:
Add a keyword followed by the issue number (e.g., Fixes #123) in the description of your pull request. For more details on supported keywords and how linking works, please refer to the GitHub Documentation on linking pull requests to issues.

Thank you for your understanding and for being a part of our community!

gemini-cli · 2026-02-06T01:34:14Z

Hi @gsquared94, thank you so much for your contribution to Gemini CLI! We really appreciate the time and effort you've put into this.

We're making some updates to our contribution process to improve how we track and review changes. Please take a moment to review our recent discussion post: Improving Our Contribution Process & Introducing New Guidelines.

Key Update: Starting January 26, 2026, the Gemini CLI project will require all pull requests to be associated with an existing issue. Any pull requests not linked to an issue by that date will be automatically closed.

Thank you for your understanding and for being a part of our community!

gemini-code-assist · 2026-02-06T01:34:17Z

Summary of Changes

Hello @gsquared94, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a crucial performance enhancement to the stripUnsafeCharacters utility function, which is frequently invoked in UI operations. By transitioning from an iterative, array-based filtering method to a highly optimized regular expression, the change dramatically improves execution speed and reduces memory footprint. This optimization directly contributes to a more responsive and efficient user interface, particularly during high-frequency interactions and when processing large volumes of text.

Highlights

Performance Optimization: Replaced an array-based character stripping logic with a regex-based approach in stripUnsafeCharacters, achieving an average 12x speedup across various workloads.
Memory Efficiency: Reduced memory allocations from multiple string and array allocations to a single string allocation per call, significantly lowering garbage collection pressure and improving UI responsiveness.
Scalability and Unicode Support: The performance improvement scales with input size (up to 22x for very long strings) and shows a 19x improvement for Unicode-heavy text, addressing overheads in the old implementation.
Enhanced Test Coverage: Added 68 new unit tests to rigorously verify the correctness and behavioral equivalence of the new stripUnsafeCharacters implementation across all character classes and edge cases.

Changelog

packages/cli/src/ui/utils/textUtils.test.ts
- Expanded unit tests for stripUnsafeCharacters to cover a comprehensive range of scenarios, including preserved characters (TAB, LF, CR, DEL, printable ASCII, Unicode, emojis, ZWJ sequences), stripped C0 control characters (0x00-0x1F except TAB/LF/CR), stripped C1 control characters (0x80-0x9F), ANSI escape sequence stripping, multiple control characters, and various edge cases.
- The test suite now includes specific tests for NULL, SOH, STX, ETX, EOT, ENQ, ACK, BELL, BACKSPACE, VT, FF, SO, SI, DLE, DC1, DC2, DC3, DC4, NAK, SYN, ETB, CAN, EM, SUB, FS, GS, RS, US control characters.
packages/cli/src/ui/utils/textUtils.ts
- Updated the JSDoc comment for stripUnsafeCharacters to accurately reflect that TAB (0x09), LF (0x0A), and CR (0x0D) are preserved among C0 control characters.
- Refactored the stripUnsafeCharacters function to replace the toCodePoints().filter().join('') logic with a single String.prototype.replace() call using a regular expression /[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]/g for efficient stripping of unsafe control characters.

Activity

The author, gsquared94, initiated this pull request with a clear focus on performance optimization for the stripUnsafeCharacters function.
Detailed benchmark results were provided, showcasing significant speedups (average 12x) across various test cases, including short, medium, long, very long, Unicode-heavy, control-char heavy, and clean strings.
The PR includes a thorough explanation of 'Why This Matters', covering high call frequency, memory pressure reduction, scaling with input size, and Unicode performance.
Correctness verification was emphasized, with the new implementation producing identical output for all test cases and being backed by 68 new unit tests.
A risk assessment was performed, concluding that the change is low risk due to the use of a single, V8-optimized regex pattern, verified behavioral equivalence, and no API changes.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request significantly optimizes the stripUnsafeCharacters function by replacing an array-based filtering approach with a more efficient regex-based method. The detailed benchmark results clearly demonstrate a substantial average speedup (12x) and reduced memory pressure, which is crucial for a function called with high frequency in interactive UI components. The extensive new unit tests provide excellent coverage, verifying the behavioral equivalence and correctness across various character types, including Unicode and edge cases. This is a well-executed performance improvement that enhances both responsiveness and resource usage.

curl95404

420

jacob314

github-actions · 2026-02-06T01:40:49Z

Size Change: -258 B (0%)

Total Size: 23.7 MB


| Filename | Size | Change |
|---|---|---|
| `./bundle/gemini.js` | 23.7 MB | -258 B (0%) |
| `./bundle/sandbox-macos-permissive-closed.sb` | 1.03 kB | 0 B |
| `./bundle/sandbox-macos-permissive-open.sb` | 890 B | 0 B |
| `./bundle/sandbox-macos-permissive-proxied.sb` | 1.31 kB | 0 B |
| `./bundle/sandbox-macos-restrictive-closed.sb` | 3.29 kB | 0 B |
| `./bundle/sandbox-macos-restrictive-open.sb` | 3.36 kB | 0 B |
| `./bundle/sandbox-macos-restrictive-proxied.sb` | 3.56 kB | 0 B |

compressed-size-action

gsquared94 requested a review from a team as a code owner February 6, 2026 01:34

gemini-cli bot closed this Feb 6, 2026

gemini-code-assist bot reviewed Feb 6, 2026


gsquared94 reopened this Feb 6, 2026

curl95404 reviewed Feb 6, 2026


jacob314 approved these changes Feb 6, 2026


gsquared94 enabled auto-merge February 6, 2026 01:41

gsquared94 added this pull request to the merge queue Feb 6, 2026

Merged via the queue into main with commit 289769f Feb 6, 2026
33 of 50 checks passed

gsquared94 deleted the perf/optimize-strip-unsafe-characters branch February 6, 2026 01:55

aswinashok44 pushed a commit to aswinashok44/gemini-cli that referenced this pull request Feb 9, 2026

perf(ui): optimize stripUnsafeCharacters with regex (google-gemini#18413)

96f3009

This was referenced Feb 18, 2026:

- Changelog for v0.29.0 #19361 (Merged)
- Changelog for v0.30.0-preview.5 #20107 (Merged)

gsquared94 merged 1 commit into main from perf/optimize-strip-unsafe-characters

3 participants
