
Support for exllama / self-hosted inference engines #854

Closed
sundaraa-deshaw opened this issue Nov 21, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@sundaraa-deshaw

Please describe the feature you want
The Tabby model spec, https://github.com/TabbyML/tabby/blob/main/MODEL_SPEC.md, states that it supports only the .gguf files consumed by the llama.cpp inference engine.

It would be good to support other, faster (on GPU) inference engines such as exllama[v2].

Self-hosting an LLM usually means running an HTTP- or WebSocket-based backend built on one of these engines (besides llama.cpp).

Are there currently ways to serve Tabby off such an engine? I checked the source, and it seems only "vertex-ai" and "fastchat" are supported for talking to an external API.



Please reply with a 👍 if you want this feature.

@sundaraa-deshaw sundaraa-deshaw added the enhancement New feature or request label Nov 21, 2023
@te0006

te0006 commented Nov 21, 2023

https://github.com/mudler/LocalAI would be a very useful inference backend as well. It supports tons of open-source LLMs, is compatible with the OpenAI API, is able to switch models dynamically, can be configured for CPU, GPU, and mixed execution, has official Docker images, and is MIT-licensed.
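Because LocalAI exposes an OpenAI-compatible API, a client integration only needs a plain HTTP POST to a `/v1/completions`-style endpoint. A minimal sketch (the base URL, model name, and `max_tokens` value here are illustrative placeholders, not tested against an actual LocalAI deployment):

```python
import json
from urllib import request


def build_completion_request(base_url: str, model: str, prompt: str) -> request.Request:
    """Build an OpenAI-style completion POST for an OpenAI-compatible backend.

    The returned Request would be sent with urllib.request.urlopen() against
    a running server; here we only construct it.
    """
    payload = {
        "model": model,        # model name as configured on the backend
        "prompt": prompt,
        "max_tokens": 64,      # arbitrary example limit
    }
    return request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example with placeholder endpoint and model name:
req = build_completion_request("http://localhost:8080", "some-local-model", "def fib(n):")
print(req.full_url)
print(json.loads(req.data)["model"])
```

Any inference backend speaking this wire format (LocalAI, or an exllama wrapper behind a compatible shim) could in principle be driven the same way.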

@wsxiaoys
Member

Closing as a duplicate of #795 (let’s continue the discussion there).

@wsxiaoys wsxiaoys closed this as not planned (duplicate) Nov 22, 2023
3 participants