Skip to content

Conversation

@waleedlatif1
Copy link
Collaborator

Summary

encode non-ASCII headers for kb uploads

Type of Change

  • Bug fix

Testing

Tested manually.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link

vercel bot commented Oct 10, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
sim Ready Ready Preview Comment Oct 10, 2025 11:39pm
1 Skipped Deployment
Project Deployment Preview Comments Updated (UTC)
docs Skipped Skipped Oct 10, 2025 11:39pm

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Summary

This PR fixes knowledge base upload issues with non-ASCII filenames by implementing RFC 5987 encoding in the Content-Disposition header. The core fix adds encodeFilenameForHeader() which properly encodes international characters using UTF-8 encoding with a fallback to ASCII-safe names.

Key changes:

  • Added RFC 5987 encoding for Content-Disposition headers to support non-ASCII filenames (e.g., Chinese, Japanese, Arabic characters)
  • Updated file serve endpoint to use hybrid authentication (session/API key/internal JWT) instead of session-only
  • Added internal JWT authentication for document processor to serve API communication
  • Increased KB processing timeouts from 300s to 600s and embedding batch size from 50 to 500 for better performance
  • Fixed OCR file handling to exclude internal serve paths from external HTTPS detection

Confidence Score: 4/5

  • This PR is generally safe to merge with proper testing of the authentication changes
  • The core filename encoding fix is solid and follows RFC 5987 standards. However, the authentication changes from session-only to hybrid auth in the serve endpoint represent a significant security model change that needs verification. The internal JWT implementation appears sound, but the broader implications of allowing internal JWT access to all files (not just KB files) should be validated. The timeout and batch size increases are reasonable performance improvements.
  • Pay close attention to apps/sim/app/api/files/serve/[...path]/route.ts - verify that hybrid auth correctly restricts access and that internal JWT authentication is properly scoped

Important Files Changed

File Analysis

Filename Score Overview
apps/sim/app/api/files/utils.ts 5/5 Added encodeFilenameForHeader function to properly encode non-ASCII characters in Content-Disposition headers using RFC 5987 encoding, fixing KB upload issues with international characters
apps/sim/app/api/files/serve/[...path]/route.ts 4/5 Switched from session-only auth to hybrid auth (session/API key/internal JWT), removed execution file logging - changes improve auth flexibility but may have broader implications
apps/sim/lib/knowledge/documents/document-processor.ts 5/5 Added internal token authentication for internal file serve requests, fixed OCR detection to exclude internal serve paths from external HTTPS check

Sequence Diagram

sequenceDiagram
    participant C as Client
    participant U as Upload Route
    participant S as Storage
    participant F as Serve Route
    participant P as Doc Processor
    participant A as Auth

    C->>U: Upload file
    U->>S: Store file
    S-->>U: File path
    U-->>C: Success

    P->>P: Start processing
    P->>F: Fetch file
    F->>A: Verify
    A-->>F: OK
    F->>S: Get file
    S-->>F: Buffer
    F->>F: Encode header
    F-->>P: File data
    P->>P: Process
Loading

7 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@vercel vercel bot temporarily deployed to Preview – docs October 10, 2025 23:36 Inactive
@waleedlatif1 waleedlatif1 merged commit 09186bd into staging Oct 10, 2025
3 of 5 checks passed
Copy link
Collaborator Author

waleedlatif1 commented Oct 10, 2025

1 Job Failed:

CI / Test and Build / Test and Build failed on "Run tests with coverage"
[...]
Error: Error: Cannot find package '@react-email/render' imported from '/home/runner/_work/sim/sim/apps/sim/app/api/workspaces/invitations/[invitationId]/route.ts'

Error: Error: Cannot find package 'zustand' imported from '/home/runner/_work/sim/sim/apps/sim/stores/workflow-diff/store.ts'
 ❯ stores/workflow-diff/store.ts:1:1

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Serialized Error: { code: 'ERR_MODULE_NOT_FOUND' }
Caused by: Caused by: Error: Failed to load url zustand (resolved id: zustand) in /home/runner/_work/sim/sim/apps/sim/stores/workflows/subblock/store.ts. Does the file exist?
 ❯ loadAndTransform ../../node_modules/.bun/vite@7.1.7+86ae5022d96aecae/node_modules/vite/dist/node/chunks/dep-Bm2ujbhY.js:26109:33


error: script "test" exited with code 1
[ERROR] command finished with error: command (/home/runner/_work/sim/sim/apps/sim) /home/runner/.bun/bin/bun run test exited (1)

 Tasks:    1 successful, 2 total
Cached:    0 cached, 2 total
  Time:    10.102s 
Failed:    sim#test

Error: Process completed with exit code 1.

Summary: 1 failed workflow
  • CI (1 job failed)

Last updated: 2025-10-10 23:37:32 UTC

waleedlatif1 added a commit that referenced this pull request Oct 11, 2025
* improvement(kb): encode non-ASCII headers for kb uploads

* cleanup

* increase timeouts to match trigger
@waleedlatif1 waleedlatif1 deleted the improvement/kb branch October 13, 2025 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants