UTF-8 Decode Error in Claude SDK when Processing CLI stdout #549

@SerenNoble

Description

Summary

The Claude Agent SDK for Python throws a UnicodeDecodeError when processing stdout from the Claude CLI, due to concurrent writes in the Node.js process causing byte stream reordering.

Error Message

'utf-8' codec can't decode byte 0xbb in position 0: invalid start byte

Root Cause

The Claude CLI (Node.js) has multiple concurrent output sources writing to stdout:

  1. Main output (file processing, progress) - outputs Chinese filenames and content
  2. HTTP request logger - outputs logs like [log_b914a8] sending request

These sources interleave their output, causing UTF-8 multi-byte characters to be split across different write operations.

Evidence from Debug Logs

Analyzing the byte stream received from CLI stdout:

Block 132 end (6034 bytes):

...
1780: 50 57 53 74 61 74 69 73 74 69 63 73 20 28 e6 96
1790: 87 e4
  • e6 96 87 = "文" (complete)
  • e4 = "件" (first byte, incomplete)
  • Missing bytes: bb b6

Block 133 (5125 bytes) - HTTP log interleaved:

0000: 5b 6c 6f 67 5f 62 39 31 34 61 38 5d 20 73 65 6e
       [log_b914a8] sen
  • This is an HTTP request log that interrupted the main output

Block 134 (8192 bytes) - delayed bytes appear:

0000: bb b6 3a 20 43 6f 6d 77 61 72 65 20 4c 32 56 50
      »¶: Comware L2VP
  • bb b6 = remaining bytes of "件" (delayed!)
  • These bytes finally appear after the HTTP log
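The hex dumps above can be sanity-checked against the UTF-8 encoding of the text in question. In Python (the SDK's language), "文件" encodes to exactly the six bytes that the debug logs show split across Blocks 132 and 134:

```python
# "文件" ("file") encodes to six bytes; the debug logs show the first
# four (e6 96 87 e4) at the end of Block 132 and the last two (bb b6)
# at the start of Block 134.
encoded = "文件".encode("utf-8")
assert encoded == b"\xe6\x96\x87\xe4\xbb\xb6"

# Per character:
assert "文".encode("utf-8") == b"\xe6\x96\x87"  # complete in Block 132
assert "件".encode("utf-8") == b"\xe4\xbb\xb6"  # split across Blocks 132/134
```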

Timeline of What Happened

T1: Main output writes "文件" → bytes: E6 96 87 E4
    (gets suspended, waiting for async operation)

T2: HTTP request starts → logger outputs: [log_b914a8] sending request
    (this is Block 133, 5125 bytes)

T3: Main output resumes → writes remaining bytes: BB B6 3A ...
    (this is Block 134, 8192 bytes)

Why the SDK Fails

Code Analysis

Layer 1: OS Kernel

  • stdout pipe buffer (8KB)
  • Reads by byte count, no character boundary awareness

Layer 2: TextReceiveStream (anyio/streams/text.py)

class TextReceiveStream:
    def __post_init__(self, encoding: str, errors: str) -> None:
        decoder_class = codecs.getincrementaldecoder(encoding)
        self._decoder = decoder_class(errors=errors)  # ← errors='strict' by default

    async def receive(self) -> str:
        chunk = await self.transport_stream.receive()
        decoded = self._decoder.decode(chunk)  # ← Fails here on 0xBB
        return decoded

When the decoder encounters byte 0xBB at the start of a chunk, it sees a UTF-8 continuation byte with no lead byte: the 0xE4 it belongs to arrived in an earlier chunk and is no longer in the decoder's state. With errors='strict', the decoder raises UnicodeDecodeError.
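This failure is easy to reproduce with Python's incremental decoder. The sketch below assumes the decoder has no pending state when Block 134's delayed continuation bytes arrive (which is what the "position 0" in the reported error message suggests):

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

# A chunk that begins with the delayed continuation bytes bb b6,
# as in Block 134 of the debug logs:
try:
    decoder.decode(b"\xbb\xb6: Comware L2VP")
except UnicodeDecodeError as exc:
    print(exc)
    # 'utf-8' codec can't decode byte 0xbb in position 0: invalid start byte
```

This prints the exact error message from the report.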

Layer 3: _read_messages_impl (subprocess_cli.py:703)

async def _read_messages_impl(self) -> AsyncIterator[dict[str, Any]]:
    json_buffer = ""

    try:
        async for line in self._stdout_stream:  # ← Fails here!
            # JSON buffering logic that never gets a chance to run
            json_buffer += line
            data = json.loads(json_buffer)
            yield data
    except UnicodeDecodeError:
        # Error propagates from Layer 2
        raise

Why JSON Buffering Doesn't Help

The json_buffer mechanism (lines 708-748) is designed to handle incomplete JSON messages at the JSON layer, but the UTF-8 decode error happens earlier, at the byte-stream-to-text conversion layer, before any JSON processing runs.

Error propagation:
IncrementalDecoder.decode() → UnicodeDecodeError ❌
    ↓
TextReceiveStream.receive() → Exception propagates
    ↓
async for line in self._stdout_stream → Iterator fails
    ↓
json_buffer += json_line → Never executes ✗

Impact

  • Severity: High - breaks entire sessions
  • Frequency: Intermittent - depends on async operation timing
  • Scope: Affects any usage where CLI outputs non-ASCII text (Chinese, etc.)

Possible Solutions

Option 1: Use Tolerant Decoding (Recommended)

Change TextReceiveStream initialization to use errors='replace' or errors='ignore':

File: claude_agent_sdk/_internal/transport/subprocess_cli.py
Line: 555

# Current:
self._stdout_stream = TextReceiveStream(self._process.stdout)

# Proposed:
self._stdout_stream = TextReceiveStream(
    self._process.stdout,
    encoding='utf-8',
    errors='replace'  # or 'surrogateescape'
)

Pros:

  • Simple fix
  • Prevents crashes from byte reordering
  • surrogateescape preserves invalid bytes for potential recovery

Cons:

  • May mask actual encoding issues
  • Replaced characters (�) in output
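The trade-off between the two error handlers can be seen in a standalone example (not SDK code), using Block 134's leading bytes:

```python
import codecs

delayed = b"\xbb\xb6: Comware L2VP"  # Block 134's leading bytes

# errors='replace' substitutes U+FFFD for each undecodable byte,
# so the session survives but the split character is lost:
replace_dec = codecs.getincrementaldecoder("utf-8")(errors="replace")
print(replace_dec.decode(delayed))  # ��: Comware L2VP

# errors='surrogateescape' maps each invalid byte to a lone surrogate
# (U+DC80..U+DCFF), so the original bytes can be recovered later:
se_dec = codecs.getincrementaldecoder("utf-8")(errors="surrogateescape")
text = se_dec.decode(delayed)
assert text.encode("utf-8", errors="surrogateescape") == delayed
```

The round trip in the last line is what makes surrogateescape attractive for "potential recovery": nothing is thrown away, only deferred.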

Option 2: Fix CLI Concurrent Writes

Ensure the CLI process writes to stdout atomically, e.g. by serializing all output through a single writer so that each write emits only complete UTF-8 sequences.

Pros:

  • Fixes root cause
  • No data corruption

Cons:

  • Requires changes to CLI (Node.js)
  • May impact performance

Option 3: Add Byte Stream Recovery

Implement a recovery mechanism in the SDK to detect and handle out-of-order UTF-8 byte sequences.

Pros:

  • Handles edge cases
  • No CLI changes needed

Cons:

  • Complex implementation
  • May not catch all cases
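One possible shape for such a mechanism, as a hypothetical sketch (RecoveringDecoder is not part of the SDK): decode strictly, and on failure reset the decoder and salvage the offending chunk with replacement characters so the stream keeps flowing instead of killing the session:

```python
import codecs


class RecoveringDecoder:
    """Strict incremental UTF-8 decoding with per-chunk fallback."""

    def __init__(self) -> None:
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

    def decode(self, chunk: bytes) -> str:
        try:
            return self._decoder.decode(chunk)
        except UnicodeDecodeError:
            # Drop any pending partial character and salvage this chunk,
            # trading a few U+FFFD characters for a live session.
            self._decoder.reset()
            return chunk.decode("utf-8", errors="replace")
```

This keeps well-formed chunks lossless and degrades only the chunks affected by reordering; it cannot reassemble the split character itself, which only Option 2 can guarantee.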

Environment

  • SDK Version: claude-agent-sdk 0.1.18
  • CLI Version: claude-code 2.0.72
  • Python: 3.13
  • OS: Linux (Ubuntu)
  • Node.js: v24.3.0

Reproduction

The issue is intermittent but occurs when:

  1. CLI outputs Chinese characters (or any multi-byte UTF-8)
  2. An async operation (HTTP request, file read, etc.) triggers logging
  3. The logger output interleaves with the main output
  4. UTF-8 bytes get reordered

Expected Behavior

The SDK should handle byte stream reordering gracefully without crashing, especially when the root cause is concurrent writes in the CLI process itself.

Actual Behavior

The SDK crashes with UnicodeDecodeError, terminating the entire session.

Additional Context

This is not a traditional 8KB buffer truncation issue. The byte stream reordering happens inside the CLI process due to Node.js's async/concurrent nature, before data even reaches the SDK's receive buffer.

The issue is particularly problematic because:

  • It's intermittent (depends on timing)
  • It affects legitimate use cases (Chinese file paths, content)
  • The error message doesn't indicate the real cause (CLI concurrent writes)

References

  • Related to anyio's TextReceiveStream using errors='strict' by default
  • Similar to issues with multi-byte character handling in stream processing
