
@grahamking
Contributor

- Add Granite to our tokenizer
- Fix pre-processor to load context length correctly
- Add strftime_now Jinja function for prompt templates
- Update llama.cpp
- Handle trtllm errors when not using trtllm
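
The `strftime_now` function gives prompt templates access to the current date and time. For example, a template might call `{{ strftime_now("%B %d, %Y") }}` to insert today's date. The sketch below is only a rough shell analogue, not Dynamo's implementation, which lives in the Jinja template environment. It relies on the fact that `date +FORMAT` accepts the same strftime-style conversion codes. The shell function name simply mirrors the template function's name and is hypothetical.

```shell
# Hypothetical shell analogue of the strftime_now template function:
# print the current local time using strftime-style format codes.
strftime_now() {
  date +"$1"
}

# Prints something like "June 02, 2025"
strftime_now "%B %d, %Y"
```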

Granite support depends on the engine:

- `mistral.rs`, our default engine, doesn't support Granite yet.

- `llama.cpp` does and works very well:
  ```
  dynamo-run out=llamacpp ~/llms/granite-3.3-2b-instruct-Q4_K_M.gguf --context-length 16384
  ```

- `vllm` also works very well:
  ```
  dynamo-run in=http out=vllm ~/llms/granite-3.3-2b-instruct --context-length 16384
  ```

- `sglang` mostly works, but it doesn't catch the stop token, so we catch it in the HTTP ingress instead and log an error. The Text ingress doesn't catch it, because I disabled that check to make the raw echo engine work. There is still some work to do here.
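
To make the ingress-side fallback for `sglang` more concrete, here is a toy sketch. It assumes a token stream delivered one token per line and uses a hypothetical `<stop>` marker in place of the model's real stop token. The sketch forwards tokens until the stop token appears, drops it and everything after it, and logs an error to stderr, because the engine should have stopped on its own. The function name `stop_at_token` is hypothetical, and this is not how Dynamo's HTTP ingress is actually written.

```shell
# Hypothetical ingress-side stop check: copy tokens (one per line) from
# stdin to stdout until the stop token appears, then stop and log an error.
stop_at_token() {
  local stop="$1" tok
  while IFS= read -r tok; do
    if [ "$tok" = "$stop" ]; then
      # The engine emitted the stop token instead of ending the stream.
      echo "error: engine did not stop at stop token $stop; stopping in ingress" >&2
      return 0
    fi
    printf '%s\n' "$tok"
  done
}

# Prints "Hello" and "world", then logs an error to stderr
printf '%s\n' Hello world '<stop>' extra | stop_at_token '<stop>'
```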

Closes: #1245
Do not enable OpenMP by default, because it requires libgomp1 (the GNU OpenMP runtime library) at runtime. Add a feature to enable it at build time.
@grahamking changed the title from "Cherrypick fixed context length, and openmp dependency" to "fix: Cherrypick fixed context length, and openmp dependency" (Jun 2, 2025)
@github-actions bot added the fix label (Jun 2, 2025)
@grahamking enabled auto-merge (squash) (June 2, 2025 20:52)
@nv-anants disabled auto-merge (June 3, 2025 13:06)
@nv-anants merged commit 63ef24f into release/0.3.0 (Jun 3, 2025)
15 checks passed
@nv-anants deleted the gk-cp-1 branch (June 3, 2025 13:07)