improvement(kb): encode non-ASCII headers for kb uploads #1595
Greptile Overview
Summary
This PR fixes knowledge base upload issues with non-ASCII filenames by implementing RFC 5987 encoding in the Content-Disposition header. The core fix adds encodeFilenameForHeader() which properly encodes international characters using UTF-8 encoding with a fallback to ASCII-safe names.
Key changes:
- Added RFC 5987 encoding for Content-Disposition headers to support non-ASCII filenames (e.g., Chinese, Japanese, Arabic characters)
- Updated file serve endpoint to use hybrid authentication (session/API key/internal JWT) instead of session-only
- Added internal JWT authentication for requests from the document processor to the file serve API
- Increased KB processing timeouts from 300s to 600s and embedding batch size from 50 to 500 for better performance
- Fixed OCR file handling to exclude internal serve paths from external HTTPS detection
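For reference, the RFC 5987 encoding described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `encodeFilenameForHeader()`; the names `encodeRfc5987` and `sketchContentDisposition` and the underscore-based ASCII fallback are assumptions made for the example.

```typescript
// Hypothetical sketch only: the real encodeFilenameForHeader() in
// apps/sim/app/api/files/utils.ts may differ in name handling and fallback rules.

// Percent-encode a value as UTF-8 for use in an RFC 5987 ext-value.
// encodeURIComponent leaves ' ( ) * unescaped, but those are not allowed
// in RFC 5987 attr-char, so they are escaped explicitly.
function encodeRfc5987(value: string): string {
  return encodeURIComponent(value).replace(
    /['()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  )
}

// Build a Content-Disposition value carrying both an ASCII-safe filename
// (for older clients) and a UTF-8 filename* parameter.
function sketchContentDisposition(
  filename: string,
  disposition: 'inline' | 'attachment' = 'attachment'
): string {
  // Assumed fallback: replace anything outside printable ASCII, plus
  // quotes and backslashes, with underscores.
  const asciiFallback = filename
    .replace(/[^\x20-\x7E]/g, '_')
    .replace(/["\\]/g, '_')
  const encoded = encodeRfc5987(filename)
  return `${disposition}; filename="${asciiFallback}"; filename*=UTF-8''${encoded}`
}
```

Clients that understand `filename*` should prefer it over the plain `filename` parameter, so the original non-ASCII name survives while the header itself stays ASCII-only.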
Confidence Score: 4/5
- This PR is generally safe to merge with proper testing of the authentication changes
- The core filename encoding fix is solid and follows RFC 5987 standards. However, the authentication changes from session-only to hybrid auth in the serve endpoint represent a significant security model change that needs verification. The internal JWT implementation appears sound, but the broader implications of allowing internal JWT access to all files (not just KB files) should be validated. The timeout and batch size increases are reasonable performance improvements.
- Pay close attention to `apps/sim/app/api/files/serve/[...path]/route.ts`: verify that hybrid auth correctly restricts access and that internal JWT authentication is properly scoped
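One way the scoping concern above could be addressed is a path-prefix check applied only to internal-JWT callers. The sketch below is purely illustrative: the `AuthResult` shape, the `internal_jwt` label, the `kb/` prefix, and the function name are all assumptions, not code from this PR.

```typescript
// Hypothetical sketch: none of these names or prefixes come from the PR.
type AuthType = 'session' | 'api_key' | 'internal_jwt'

interface AuthResult {
  success: boolean
  authType?: AuthType
}

// Assumed allow-list of storage key prefixes that internal callers may read.
const INTERNAL_ALLOWED_PREFIXES = ['kb/']

// Returns false when an internal-JWT caller requests a file outside the
// allow-list. Session and API-key callers pass this check and are left to
// whatever per-user authorization the route already performs (not shown).
function passesInternalScopeCheck(auth: AuthResult, storageKey: string): boolean {
  if (!auth.success || !auth.authType) return false
  if (auth.authType !== 'internal_jwt') return true
  // Reject traversal attempts before comparing prefixes.
  if (storageKey.split('/').includes('..')) return false
  return INTERNAL_ALLOWED_PREFIXES.some((prefix) => storageKey.startsWith(prefix))
}
```

With a check like this, the document processor could still fetch KB files with its internal token, while other file categories would remain reachable only through session or API key auth.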
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| apps/sim/app/api/files/utils.ts | 5/5 | Added encodeFilenameForHeader function to properly encode non-ASCII characters in Content-Disposition headers using RFC 5987 encoding, fixing KB upload issues with international characters |
| apps/sim/app/api/files/serve/[...path]/route.ts | 4/5 | Switched from session-only auth to hybrid auth (session/API key/internal JWT), removed execution file logging - changes improve auth flexibility but may have broader implications |
| apps/sim/lib/knowledge/documents/document-processor.ts | 5/5 | Added internal token authentication for internal file serve requests, fixed OCR detection to exclude internal serve paths from external HTTPS check |
Sequence Diagram
    sequenceDiagram
        participant C as Client
        participant U as Upload Route
        participant S as Storage
        participant F as Serve Route
        participant P as Doc Processor
        participant A as Auth
        C->>U: Upload file
        U->>S: Store file
        S-->>U: File path
        U-->>C: Success
        P->>P: Start processing
        P->>F: Fetch file
        F->>A: Verify
        A-->>F: OK
        F->>S: Get file
        S-->>F: Buffer
        F->>F: Encode header
        F-->>P: File data
        P->>P: Process
7 files reviewed, 2 comments
f3bbadb to 2957b19
1 job failed: CI / Test and Build / Test and Build failed on "Run tests with coverage"
* improvement(kb): encode non-ASCII headers for kb uploads
* cleanup
* increase timeouts to match trigger
Summary
encode non-ASCII headers for kb uploads
Type of Change
Testing
Tested manually.
Checklist