perf(ui): optimize stripUnsafeCharacters with regex #18413
Replace the array-based toCodePoints().filter().join() pattern with a single regex replace for significantly better performance.
Before: O(n) with multiple array allocations
- toCodePoints() creates array from string
- filter() creates new filtered array
- join() reconstructs string
After: O(n) with minimal allocations
- Single regex pass that builds one new string
The regex matches:
- C0 control chars (0x00-0x1F) except TAB(0x09), LF(0x0A), CR(0x0D)
- C1 control chars (0x80-0x9F)
Add comprehensive unit tests covering:
- All preserved characters (TAB, LF, CR, DEL, printable ASCII, Unicode)
- All stripped C0 control characters (NULL, BELL, BS, etc.)
- C1 control character range
- ANSI escape sequence stripping
- Edge cases (empty string, long strings, emoji, surrogate pairs)
Hi there! Thank you for your contribution to Gemini CLI. To improve our contribution process and better track changes, we now require all pull requests to be associated with an existing issue, as announced in our recent discussion and as detailed in our CONTRIBUTING.md. This pull request is being closed because it is not currently linked to an issue. Once you have updated the description of this PR to link an issue (e.g., by adding How to link an issue: Thank you for your understanding and for being a part of our community!
Hi @gsquared94, thank you so much for your contribution to Gemini CLI! We really appreciate the time and effort you've put into this. We're making some updates to our contribution process to improve how we track and review changes. Please take a moment to review our recent discussion post: Improving Our Contribution Process & Introducing New Guidelines. Key Update: Starting January 26, 2026, the Gemini CLI project will require all pull requests to be associated with an existing issue. Any pull requests not linked to an issue by that date will be automatically closed. Thank you for your understanding and for being a part of our community!
Code Review
This pull request significantly optimizes the stripUnsafeCharacters function by replacing an array-based filtering approach with a more efficient regex-based method. The detailed benchmark results clearly demonstrate a substantial average speedup (12x) and reduced memory pressure, which is crucial for a function called with high frequency in interactive UI components. The extensive new unit tests provide excellent coverage, verifying the behavioral equivalence and correctness across various character types, including Unicode and edge cases. This is a well-executed performance improvement that enhances both responsiveness and resource usage.
Size Change: -258 B (0%) Total Size: 23.7 MB
Performance Optimization: stripUnsafeCharacters
Summary
This PR replaces the array-based implementation of stripUnsafeCharacters with a regex-based approach, achieving an average 12x speedup across typical workloads while maintaining identical behavior.
The Change
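The diff itself did not survive on this page. As a rough sketch of the change described in the commit message, assuming the filter logic mirrors the character ranges listed there: the names stripUnsafeCharactersOld, stripUnsafeCharactersNew and UNSAFE_CHARS are hypothetical, Array.from stands in for the project's toCodePoints() helper, and the ANSI escape stripping that the real function also performs is left out.

```typescript
// Hypothetical reconstruction for illustration only; not the PR's actual diff.

// Before: split into code points, filter them, and join the survivors.
function stripUnsafeCharactersOld(str: string): string {
  return Array.from(str) // stand-in for the project's toCodePoints() helper
    .filter((char) => {
      const code = char.codePointAt(0);
      if (code === undefined) return false;
      // Preserve TAB, LF, and CR.
      if (code === 0x09 || code === 0x0a || code === 0x0d) return true;
      // Drop the remaining C0 control characters (0x00-0x1F).
      if (code <= 0x1f) return false;
      // Drop C1 control characters (0x80-0x9F).
      if (code >= 0x80 && code <= 0x9f) return false;
      return true;
    })
    .join('');
}

// After: one global regex replace over the same ranges.
// The class skips 0x09, 0x0A, and 0x0D so TAB, LF, and CR survive.
const UNSAFE_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]/g;

function stripUnsafeCharactersNew(str: string): string {
  return str.replace(UNSAFE_CHARS, '');
}
```

Both variants leave DEL (0x7F) untouched, which matches the list of preserved characters in the commit message.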
Benchmark Results
Average speedup: 12.14x
Why This Matters
1. High Call Frequency
stripUnsafeCharacters is called on:
Even microsecond improvements compound significantly during interactive sessions.
2. Memory Pressure Reduction
Old implementation (per call):
- Array.from(str) → Allocates N array elements
- .filter() → Allocates new array (up to N elements)
- .join('') → Creates final string
New implementation (per call):
- .replace() → Creates new string (single V8-optimized pass)
Eliminating array allocations reduces garbage collection pressure, improving UI responsiveness.
3. Scales With Input Size
The speedup increases with string length:
This is critical for large terminal output, log files, and paste operations.
4. Unicode Performance
19x improvement for Unicode-heavy text because Array.from() has significant overhead for multi-byte characters (emoji, CJK, etc.).
Correctness Verification
The new implementation produces identical output for all test cases:
68 unit tests added covering all character classes and edge cases.
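The comparison table and the 68 tests are not reproduced on this page. One way to spot-check the equivalence claim is to run both approaches over every code point from 0x00 to 0xFF plus a few Unicode samples. The sketch below assumes the ranges from the commit message; viaFilter, viaRegex and findMismatches are hypothetical names, and ANSI stripping is again omitted.

```typescript
// Illustrative equivalence check; function names are hypothetical.

function viaFilter(str: string): string {
  return Array.from(str)
    .filter((char) => {
      const code = char.codePointAt(0);
      if (code === undefined) return false;
      if (code === 0x09 || code === 0x0a || code === 0x0d) return true; // TAB, LF, CR
      if (code <= 0x1f) return false; // other C0 controls
      if (code >= 0x80 && code <= 0x9f) return false; // C1 controls
      return true;
    })
    .join('');
}

function viaRegex(str: string): string {
  return str.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]/g, '');
}

// Returns the inputs on which the two approaches disagree.
function findMismatches(inputs: string[]): string[] {
  return inputs.filter((s) => viaFilter(s) !== viaRegex(s));
}

const samples: string[] = [];
// Every code point in the range that contains all stripped characters.
for (let cp = 0x00; cp <= 0xff; cp++) {
  samples.push('a' + String.fromCharCode(cp) + 'b');
}
// Edge cases named in the commit message: empty, emoji, CJK, long input.
samples.push('', 'hello 👋 world', '日本語\x00テキスト', 'x'.repeat(10000) + '\x07');

const mismatches = findMismatches(samples);
console.log(`inputs checked: ${samples.length}, mismatches: ${mismatches.length}`);
```

The regex has no u flag, so it matches UTF-16 code units. Surrogate code units (0xD800-0xDFFF) fall outside both stripped ranges, so emoji and other astral characters pass through intact.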
Benchmark Script
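The author's benchmark script is collapsed on this page and is not reproduced here. A minimal harness along the same lines might look like the sketch below. It is an assumption-laden stand-in, not the original script: timeIt, arrayVariant and regexVariant are hypothetical names, Date.now() is used only because it needs no imports, and the input and iteration count are arbitrary.

```typescript
// Illustrative micro-benchmark; not the script used to produce the PR's numbers.

function arrayVariant(str: string): string {
  return Array.from(str)
    .filter((char) => {
      const code = char.codePointAt(0);
      if (code === undefined) return false;
      if (code === 0x09 || code === 0x0a || code === 0x0d) return true;
      if (code <= 0x1f) return false;
      if (code >= 0x80 && code <= 0x9f) return false;
      return true;
    })
    .join('');
}

function regexVariant(str: string): string {
  return str.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]/g, '');
}

// Runs fn repeatedly and returns elapsed wall-clock milliseconds.
function timeIt(fn: (s: string) => string, input: string, iterations: number): number {
  let sink = 0; // accumulate a result so the loop cannot be optimized away
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    sink += fn(input).length;
  }
  const elapsed = Date.now() - start;
  if (sink < 0) throw new Error('unreachable');
  return elapsed;
}

// Arbitrary sample: mixed ASCII, emoji, and a few control characters.
const benchInput = 'line of text 👋 with \x07 bell and \x1b escape\n'.repeat(200);
const benchIterations = 200;

// Warm up both paths before measuring.
timeIt(arrayVariant, benchInput, 20);
timeIt(regexVariant, benchInput, 20);

const arrayMs = timeIt(arrayVariant, benchInput, benchIterations);
const regexMs = timeIt(regexVariant, benchInput, benchIterations);
console.log(`array: ${arrayMs} ms, regex: ${regexMs} ms`);
```

Timings from a harness like this vary with the runtime, hardware, and input mix, so it is a way to observe the trend rather than to reproduce the figures reported above.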
Risk Assessment
Low risk: