Sanitize Tags Unicode Block on Message creation by amed-xyz · Pull Request #3920 · block/goose

amed-xyz · 2025-08-07T16:04:18Z

What

Mitigate Unicode-based prompt injection attacks, in which attackers embed invisible Unicode characters to smuggle hidden commands past user inspection and into LLM processing.

The specific threat this addresses is the Unicode Tags block (U+E0000 to U+E007F):

Complete ASCII mirror: Invisible versions of all ASCII characters (a-z, A-Z, 0-9, symbols)
Completely invisible: Designed to be unrendered by tag-unaware implementations
LLM readable: Training data included these characters, and tokenizers process them
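
To make the "ASCII mirror" point concrete, here is a minimal sketch of how a payload can be hidden and recovered. It assumes the standard layout of the Tags block, where each tag character sits at U+E0000 plus the ASCII code point. The helper names `to_tag_chars` and `decode_tag_chars` are hypothetical and are not part of this PR.

```rust
/// First and last code points of the Unicode Tags block.
const TAGS_START: u32 = 0xE0000;
const TAGS_END: u32 = 0xE007F;

/// Hypothetical helper: map each ASCII character to its invisible
/// counterpart in the Tags block (U+E0000 + ASCII code point).
/// Non-ASCII input characters are dropped.
fn to_tag_chars(ascii: &str) -> String {
    ascii
        .chars()
        .filter(|c| c.is_ascii())
        .filter_map(|c| char::from_u32(TAGS_START + c as u32))
        .collect()
}

/// Hypothetical helper: recover the hidden ASCII text by keeping only
/// Tags-block characters and shifting them back down to ASCII.
fn decode_tag_chars(text: &str) -> String {
    text.chars()
        .filter(|c| (TAGS_START..=TAGS_END).contains(&(*c as u32)))
        .filter_map(|c| char::from_u32(c as u32 - TAGS_START))
        .collect()
}
```

A string such as `format!("hello{}", to_tag_chars("rm -rf"))` would typically render as just `hello` in a tag-unaware UI, while `decode_tag_chars` on the same string recovers `rm -rf`.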

How

Sanitize text input at the single point where all text messages enter the system (CLI, Desktop, etc.), i.e. the Message::with_text() function.
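
The approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: the `Message` struct below is a simplified stand-in for the real goose type, and `sanitize_unicode_tags` is a hypothetical helper name. Only the idea of filtering inside `with_text()` comes from the description above.

```rust
/// Hypothetical helper: remove every code point in the Unicode Tags
/// block (U+E0000..=U+E007F) and leave all other characters untouched.
fn sanitize_unicode_tags(text: &str) -> String {
    text.chars()
        .filter(|c| !matches!(*c, '\u{E0000}'..='\u{E007F}'))
        .collect()
}

/// Simplified stand-in for a message type; NOT the actual goose `Message`.
#[derive(Debug, Default)]
struct Message {
    text: Vec<String>,
}

impl Message {
    /// Sanitizing here means every caller that adds text through this
    /// builder gets the filtering without having to opt in.
    fn with_text(mut self, text: impl Into<String>) -> Self {
        let raw: String = text.into();
        self.text.push(sanitize_unicode_tags(&raw));
        self
    }
}
```

Filtering at one entry point keeps the callers unchanged, at the cost of silently dropping the characters; whether and how to surface that to the user is the open question raised in the review below.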

DOsinga

nice. you know what, though, I would just drop the sanitize_needed flag for now. we can talk to @spencrmartin about what we should do in the UI when this happens and then calculate backwards from there in a follow up PR. what do you think?

michaelneale

nice one, good catch

* 'main' of github.com:block/goose:
  remove fallback routing to hub/home for unknown routes (#3954)
  Use cross in linux bundle workflow (#3950)
  fix: disable signing for release branches until we figure out keys for this flow (#3951)
  Sanitize Tags Unicode Block (#3920)
  Add a message about DCO to CONTRIBUTING.md (#3741)
  Move hardcoded LLM prompts to template files (#3934)
  docs: migrate streamable config to consolidated component (#3936)
  feat: streamline list args on cli (#3937)
  mcp/developer: Refactor to use tokio SplitStream (#3894)
  feat: first time automated ollama install experience and openrouter (#3881)
  chore: rmcp 0.5.0 (#3935)
  add gpt-5 to openai provider format (#3924)
  added gpt5 context limit (#3927)
  show status of osx codesigning and increase timeout (#3926)
  Bump auto-compact threshold to 80% (#3925)
  FIX: gemini tool call hanging (#3898)
  feat(deps): upgrade rmcp to 0.4.1 (#3918)
  Fix dark mode rendering of config form and centered providers grid for wider screens. (#3837)
  fix: extension list not refreshing after installing from deeplink (#3878)

* 'main' of github.com:block/goose:
  remove fallback routing to hub/home for unknown routes (#3954)
  Use cross in linux bundle workflow (#3950)
  fix: disable signing for release branches until we figure out keys for this flow (#3951)
  Sanitize Tags Unicode Block (#3920)
  Add a message about DCO to CONTRIBUTING.md (#3741)
  Move hardcoded LLM prompts to template files (#3934)
  docs: migrate streamable config to consolidated component (#3936)
  feat: streamline list args on cli (#3937)
  mcp/developer: Refactor to use tokio SplitStream (#3894)
  feat: first time automated ollama install experience and openrouter (#3881)
  chore: rmcp 0.5.0 (#3935)
  add gpt-5 to openai provider format (#3924)
  added gpt5 context limit (#3927)
  show status of osx codesigning and increase timeout (#3926)
  Bump auto-compact threshold to 80% (#3925)

Signed-off-by: Jack Wright <jack.wright@nike.com>

amed-xyz self-assigned this Aug 7, 2025

DOsinga self-requested a review August 7, 2025 16:36

DOsinga approved these changes Aug 7, 2025

amed-xyz marked this pull request as ready for review August 7, 2025 17:20

amed-xyz added 3 commits August 7, 2025 10:25

sanitize unicode tags on message creation

d7ca32e

sanitize unicode tags on message creation

99afbb9

remove sanitize_needed flag

5629291

amed-xyz force-pushed the amed/unicode-tags-sanitization branch from 8b1f65b to 5629291 Compare August 7, 2025 17:25

formatting

14dea97

michaelneale approved these changes Aug 8, 2025

amed-xyz merged commit 48c9af0 into main Aug 8, 2025
11 checks passed

amed-xyz deleted the amed/unicode-tags-sanitization branch August 8, 2025 16:06

This was referenced Aug 8, 2025

Sanitize Tags Unicode Block on Message deserialization #3966

Merged

Sanitize Tags Unicode Block at prompt level #4047

Merged

amed-xyz changed the title from "Sanitize Tags Unicode Block" to "Sanitize Tags Unicode Block on Message creation" Aug 12, 2025

alexhancock mentioned this pull request Aug 13, 2025

chore(release): release version 1.4.0 #4069

Merged

amed-xyz mentioned this pull request Aug 15, 2025

Desktop alerts when suspicious unicode characters found in Recipe #4080

Merged

ayax79 pushed a commit to ayax79/goose that referenced this pull request Aug 21, 2025

Sanitize Tags Unicode Block (block#3920)

115e483

Signed-off-by: Jack Wright <jack.wright@nike.com>

amed-xyz merged 4 commits into main from amed/unicode-tags-sanitization
3 participants