
bug: run llama3:tensorrt-llm leads to "cortex.llamacpp engine not found" #1020

Closed
@freelerobot

Description

Describe the bug

  1. install cortex
  2. start server
  3. cortex run llama3:tensorrt-llm --chat
  4. NOTE: the tensorrt-llm branch doesn't exist in the llama3 HF repo
  5. The model "successfully" downloads, but the binary is empty; a model.yaml is still written
  6. When running the model, the following error appears:
(base) PS C:\Windows\System32> cortex run llama3:tensorrt-llm --chat
√ Dependencies loaded in 862ms
√ API server is online
√ Model found
Downloading engine...
 ████████████████████████████████████████ 100% | ETA: 0s | 100/100
× 500 status code (no body)
Last errors:
× Model loading failed
{"method":"POST","path":"/v1/models/llama3:tensorrt-llm/start","statusCode":500,"ip":"127.0.0.1","content_length":"52","user_agent":"CortexClient/JS 0.1.7","x_correlation_id":""} HTTP
- Loading model...
20240815 15:29:47.151000 UTC 10740 INFO  CPU instruction set: fpu = 1| mmx = 1| sse = 1| sse2 = 1| sse3 = 1| ssse3 = 1| sse4_1 = 1| sse4_2 = 1| pclmulqdq = 1| avx = 1| avx2 = 1| avx512_f = 1| avx512_dq = 1| avx512_ifma = 1| avx512_pf = 0| avx512_er = 0| avx512_cd = 1| avx512_bw = 1| has_avx512_vl = 1| has_avx512_vbmi = 1| has_avx512_vbmi2 = 1| avx512_vnni = 1| avx512_bitalg = 1| avx512_vpopcntdq = 1| avx512_4vnniw = 0| avx512_4fmaps = 0| avx512_vp2intersect = 0| aes = 1| f16c = 1| - server.cc:288
20240815 15:29:47.151000 UTC 10740 ERROR Could not load engine: Could not load library "C:\Users\n\cortex/engines/cortex.llamacpp/engine.dll"
The specified module could not be found.

 - server.cc:299
× Model loading failed
{"method":"POST","path":"/v1/models/llama3:tensorrt-llm/start","statusCode":500,"ip":"127.0.0.1","content_length":"52","user_agent":"CortexClient/JS 0.1.7","x_correlation_id":""} HTTP
...
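
The actual failure in the log is the missing engine library at C:\Users\n\cortex\engines\cortex.llamacpp\engine.dll. A minimal sketch (Python, path taken from the log above; adjust the user profile directory for your machine) to confirm whether the engine was ever installed:

```python
from pathlib import Path

# Path reported in the error log above; adjust the user name for your machine.
engine_dir = Path(r"C:\Users\n\cortex\engines\cortex.llamacpp")
engine_dll = engine_dir / "engine.dll"

print(f"engine dir exists:  {engine_dir.exists()}")
print(f"engine.dll exists:  {engine_dll.exists()}")

# List whatever the engine downloader actually placed there, if anything.
if engine_dir.exists():
    for f in engine_dir.iterdir():
        print(f.name, f.stat().st_size, "bytes")
```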

Turns out it somehow downloaded an empty model instead of just failing.

Ah, I see the issue: tensorrt-llm is an invalid tag (so the cortex.so/models listing is wrong). There is no HF repo branch called tensorrt-llm, yet cortex run llama3:tensorrt-llm downloaded a default empty model instead of erroring out.
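
Since the empty download comes from requesting a branch that doesn't exist, a pre-flight check against the public Hugging Face refs endpoint would catch this before anything is written to disk. A hedged sketch (Python; the assumption that the llama3 alias maps to the cortexso/llama3 repo is mine, substitute whichever repo your cortex build actually pulls from):

```python
import requests

# Assumption: the `llama3` alias resolves to this Hugging Face repo.
repo_id = "cortexso/llama3"
tag = "tensorrt-llm"

# Public HF endpoint listing all refs (branches/tags) of a model repo.
resp = requests.get(f"https://huggingface.co/api/models/{repo_id}/refs", timeout=10)
resp.raise_for_status()

branches = {b["name"] for b in resp.json().get("branches", [])}
print("available branches:", sorted(branches))

if tag not in branches:
    # This is the case hit here: the branch does not exist, so the download
    # should fail loudly instead of producing an empty model folder.
    raise SystemExit(f"branch '{tag}' not found in {repo_id}; refusing to download")
```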

(base) PS C:\Users\n\cortex\models> cat .\llama3-tensorrt-llm.yaml
files:
  - C:\Users\n\cortex\models\llama3-tensorrt-llm\.gitattributes
model: llama3:tensorrt-llm
name: llama3:tensorrt-llm
stop: []
stream: true
max_tokens: 4096
frequency_penalty: 0.7
presence_penalty: 0.7
temperature: 0.7
top_p: 0.7
ctx_len: 4096
ngl: 100
engine: cortex.llamacpp
id: llama3:tensorrt-llm
created: 1723735451386
object: model
owned_by: ''
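
The files: list in the generated model.yaml is the giveaway: only .gitattributes came down, no weights. A hypothetical post-download sanity check (Python; the list of "weight" extensions is an assumption about what a valid download should contain):

```python
from pathlib import Path

import yaml  # pip install pyyaml

# Hypothetical check: read the generated model.yaml and make sure the
# downloaded files include at least one actual weight file, not just
# repo metadata such as .gitattributes.
model_yaml = Path(r"C:\Users\n\cortex\models\llama3-tensorrt-llm.yaml")
config = yaml.safe_load(model_yaml.read_text())

WEIGHT_EXTS = {".gguf", ".bin", ".safetensors", ".engine"}  # assumed valid extensions
files = [Path(f) for f in config.get("files", [])]
weights = [f for f in files if f.suffix.lower() in WEIGHT_EXTS and f.exists()]

if not weights:
    raise SystemExit(f"{config.get('model')}: no weight files downloaded, only {files}")
print("weights found:", [f.name for f in weights])
```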

Specs:

  • Windows, RTX 4070, latest CUDA/NVIDIA drivers
  • cortex v0.5.0-44
