
Bug: transformers.js doesn't respect proxy settings - blocks users behind proxy/firewall #52


Summary

episodic-memory cannot index conversations when huggingface.co is not directly accessible, even when:

  • Model files are already cached locally
  • System proxy is properly configured
  • curl can access HuggingFace through proxy

The root cause is that @xenova/transformers (v2.17.2) doesn't respect standard proxy environment variables, making episodic-memory unusable for users behind corporate firewalls or in regions requiring proxy access.


Environment

  • episodic-memory version: 1.0.15
  • Node.js version: v24.12.0
  • npm version: 11.6.2
  • OS: macOS 15.3 (arm64)
  • @xenova/transformers version: ^2.17.2
  • Proxy: Clash Verge (127.0.0.1:7897)

Reproduction Steps

  1. Install episodic-memory plugin in Claude Code
  2. Set system proxy on macOS:
    networksetup -setwebproxy Wi-Fi 127.0.0.1 7897
    networksetup -setsecurewebproxy Wi-Fi 127.0.0.1 7897
  3. Try to sync/index conversations:
    episodic-memory sync
  4. Observe connection timeout to huggingface.co

Expected Behavior

Any of the following:

  1. Transformers.js should respect standard proxy environment variables (HTTP_PROXY, HTTPS_PROXY)
  2. Transformers.js should use offline mode when models are already cached locally
  3. episodic-memory should provide a way to configure proxy settings for transformers.js

Actual Behavior

Even with:

  • ✅ Model files cached in ~/.cache/huggingface/hub/models--Xenova--all-MiniLM-L6-v2/
  • ✅ System proxy properly configured
  • ✅ curl successfully accessing huggingface.co through proxy

The sync/index command still fails with:

Error: ConnectTimeoutError: Connect Timeout Error (attempted address: huggingface.co:443, timeout: 10000ms)
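
For what it's worth, the same failure reproduces outside episodic-memory with nothing but Node's built-in fetch. A minimal sketch (the file name is illustrative):

// repro.mjs: run with HTTPS_PROXY=http://127.0.0.1:7897 node repro.mjs
// Node's fetch (undici) ignores HTTP_PROXY/HTTPS_PROXY, so the request goes
// out directly and times out when huggingface.co is not reachable directly.
try {
  const res = await fetch('https://huggingface.co', { method: 'HEAD' });
  console.log('status:', res.status);
} catch (err) {
  console.error('fetch failed:', err.cause ?? err);
}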

Troubleshooting Attempts

Attempt 1: Standard Proxy Environment Variables

export HTTP_PROXY=http://127.0.0.1:7897
export HTTPS_PROXY=http://127.0.0.1:7897
export all_proxy=socks5://127.0.0.1:7897
episodic-memory sync

Result: ❌ Still tries to connect directly, ignoring the proxy settings

Attempt 2: Global Agent

export GLOBAL_AGENT_HTTP_PROXY=http://127.0.0.1:7897
export GLOBAL_AGENT_HTTPS_PROXY=http://127.0.0.1:7897
episodic-memory sync

Result: ❌ No effect; transformers.js doesn't use global-agent (which patches the http/https agents that Node's fetch bypasses anyway)

Attempt 3: Offline Mode

export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
episodic-memory sync

Result: ❌ Still attempts to connect to huggingface.co for validation (these variables are honored by the Python Hugging Face libraries; transformers.js v2 does not appear to read them)

Attempt 4: Disable System Proxy

networksetup -setwebproxystate Wi-Fi off
networksetup -setsecurewebproxystate Wi-Fi off
curl -I https://huggingface.co  # Works fine
episodic-memory sync

Result: ❌ Node.js fetch still times out, even though curl works


Verification

Model Cache Status

$ ls -lh ~/.cache/huggingface/hub/models--Xenova--all-MiniLM-L6-v2/snapshots/*/model_quantized.onnx
-rw-r--r--  22M Jan 31 00:24 model_quantized.onnx

✅ Model file (22MB) is fully downloaded and cached locally

Proxy Connectivity

$ curl -x http://127.0.0.1:7897 -I https://huggingface.co
HTTP/1.1 200 Connection established
HTTP/2 200

✅ Proxy works perfectly with curl

Direct Connectivity (without proxy)

$ curl -I https://huggingface.co
HTTP/1.1 200 Connection established
HTTP/2 200

✅ Even a direct connection works with curl, but Node.js fetch still fails


Root Cause Analysis

The issue is in the @xenova/transformers library:

  • It uses the fetch API internally to download and validate models
  • The fetch implementation in Node.js (via undici) doesn't automatically pick up proxy settings from environment variables
  • transformers.js doesn't provide a way to configure an HTTP proxy
  • Even with models cached locally, it still tries to connect to huggingface.co for validation

This is a known limitation of Node's built-in fetch rather than anything specific to episodic-memory; see the References section below.
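
For illustration, undici only routes a request through a proxy when it is told to explicitly, which transformers.js never does. A minimal sketch, assuming Node 18+ and the standalone undici package (npm install undici):

import { ProxyAgent } from 'undici';

// Succeeds behind the proxy only because a dispatcher is passed explicitly.
// transformers.js issues plain fetch() calls, so there is no place to inject one.
const res = await fetch('https://huggingface.co', {
  method: 'HEAD',
  dispatcher: new ProxyAgent('http://127.0.0.1:7897'),
});
console.log(res.status);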


Impact

Critical: Users in regions requiring proxy access (China, corporate networks, etc.) cannot use episodic-memory's search and indexing at all.

The plugin successfully syncs conversation files to the archive, but search/indexing is completely broken.


Suggested Solutions

Option 1: Add Proxy Support to transformers.js

Upstream fix needed in @xenova/transformers (see the sketch after this list):

  • Respect HTTP_PROXY and HTTPS_PROXY environment variables
  • Provide a configuration option to set proxy programmatically
  • Use proxy-agent or similar package to handle proxy connections
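
A minimal sketch of what the upstream change could look like, using undici's ProxyAgent. How reliably a standalone undici install shares the global dispatcher with Node's bundled fetch depends on the Node/undici version pair, so treat this as a starting point rather than a drop-in patch:

import { ProxyAgent, setGlobalDispatcher } from 'undici';

// Respect the standard proxy environment variables for every subsequent
// fetch() in the process, including the ones transformers.js makes internally.
const proxy = process.env.HTTPS_PROXY ?? process.env.HTTP_PROXY;
if (proxy) {
  setGlobalDispatcher(new ProxyAgent(proxy));
}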

Option 2: Offline Mode for episodic-memory

Add configuration to episodic-memory (see the sketch after this list):

  • Skip model validation when models are already cached
  • Add an --offline flag to skip network calls entirely
  • Provide explicit error messages when a network call is optional (e.g., revalidating an already-cached model)
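
transformers.js already exposes settings that episodic-memory could wire up for this. A sketch based on the documented env options; the local path is illustrative and must contain Xenova/all-MiniLM-L6-v2/ with its model files:

import { env, pipeline } from '@xenova/transformers';

// Load the model from disk only and never contact huggingface.co.
env.allowRemoteModels = false;
env.localModelPath = './models';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const output = await extractor('hello world', { pooling: 'mean', normalize: true });
console.log(output.dims); // [1, 384]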

Option 3: Alternative Embedding Strategy

  • Use a different embedding library that supports proxies
  • Allow users to provide a custom embedding function (see the sketch after this list)
  • Support external embedding services (OpenAI, etc.)
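
One possible shape for such a hook, entirely hypothetical (episodic-memory does not expose this today, and the endpoint is a placeholder):

// Hypothetical user-supplied embedding function that episodic-memory could call
// instead of transformers.js, e.g. backed by a service reachable through the proxy.
export async function embed(texts) {
  const res = await fetch('https://embeddings.example.internal/v1/embed', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ input: texts }),
  });
  if (!res.ok) throw new Error(`embedding service returned ${res.status}`);
  const { embeddings } = await res.json();
  return embeddings; // number[][], one vector per input text
}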

Additional Context

This is not a new issue; many transformer-based tools face the same problem. The transformers.js community has been discussing it for years, but no solution has been implemented yet.

For episodic-memory to be truly useful for global users, proxy support is essential, not optional.


References

  • Related transformers.js issue: to be linked (an existing upstream issue about proxy support may already cover this)
  • Similar issues in other projects that use transformers.js

Priority

🔴 High Priority: blocks search and indexing entirely for a significant user base


Workaround (Temporary)

Currently, there is no working workaround. Users behind proxies cannot use episodic-memory's search functionality.


Thank you for looking into this! I'm happy to provide additional logs or test potential fixes.
