
Vision #249

Merged
merged 16 commits into main on Nov 22, 2024

Conversation

bdashore3
Member

Bypass template:

Adds support for vision models using Exl2

Contributors: @bdashore3 @DocShotgun

bdashore3 and others added 15 commits November 11, 2024 12:10
HuggingFace separated the chat template into its own file in the newest
transformers versions.

Signed-off-by: kingbri <bdashore3@proton.me>
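In recent transformers releases, multimodal processors save their chat template to a separate chat_template.json instead of embedding it in tokenizer_config.json. A minimal sketch of reading from either location; the helper name is hypothetical:

```python
# Read a chat template that may live in either file; the filenames follow
# current transformers conventions, but this helper is illustrative.
import json
import pathlib
from typing import Optional

def find_chat_template(model_dir: str) -> Optional[str]:
    model_path = pathlib.Path(model_dir)

    # Newer transformers versions write the template to its own file
    separate = model_path / "chat_template.json"
    if separate.exists():
        return json.loads(separate.read_text()).get("chat_template")

    # Older versions embed it in tokenizer_config.json
    legacy = model_path / "tokenizer_config.json"
    if legacy.exists():
        return json.loads(legacy.read_text()).get("chat_template")

    return None
```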
Adds the ability to load vision parts of text + image models. Requires
an explicit flag in config because there isn't a way to automatically
determine whether the vision tower should be used.

Signed-off-by: kingbri <bdashore3@proton.me>
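A minimal sketch of the opt-in load, using exllamav2's ExLlamaV2VisionTower class; the surrounding loader function is an assumption, not the PR's exact code:

```python
# Gate the vision tower behind an explicit config flag; loading is
# strictly opt-in because support can't be auto-detected.
from exllamav2 import ExLlamaV2Config, ExLlamaV2VisionTower

def load_vision_tower(config: ExLlamaV2Config, use_vision: bool):
    if not use_vision:
        # No reliable way to determine from the weights alone whether
        # the vision tower should be used, so skip unless requested
        return None

    vision_model = ExLlamaV2VisionTower(config)
    vision_model.load()
    return vision_model
```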
* Support image_url inputs containing URLs or base64 strings following the OAI vision spec (see the sketch after this list)
* Use async lru cache for image embeddings
* Add generic wrapper class for multimodal embeddings
* More robust checks for OAI chat completion message lists on /v1/encode endpoint
* Added TODO to support other aspects of chat completions
* Fix an oversight where embeddings was not defined in advance on the /v1/chat/completions endpoint
* When vision is not enabled, only the first text block is kept in message.content if it is a list
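A hedged sketch of resolving an image_url into raw bytes, following the OAI data-URL convention (`data:image/png;base64,<payload>`); the function name and the use of aiohttp are assumptions:

```python
# Accept either a plain URL or an OAI-style base64 data URL and return
# the raw image bytes.
import base64

import aiohttp

async def fetch_image_bytes(url: str) -> bytes:
    if url.startswith("data:"):
        # Base64 data URL: "data:image/png;base64,<payload>"
        _, payload = url.split(",", 1)
        return base64.b64decode(payload)

    # Plain URL: download the image
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            response.raise_for_status()
            return await response.read()
```

The decoded bytes then feed the embedding path, where an async LRU cache (e.g. async_lru's alru_cache decorator) avoids re-embedding repeated images.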
Previously, the messages were a list of dicts. These are untyped
and don't provide strict hinting. Add types for chat completion
messages and reformat existing code.

Signed-off-by: kingbri <bdashore3@proton.me>
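A minimal Pydantic sketch of what typed chat completion messages can look like; class and field names are illustrative, not the exact ones from this PR:

```python
# Typed chat completion messages: content is either a plain string or a
# list of typed parts (text or image_url), per the OAI chat spec.
from typing import List, Literal, Optional, Union

from pydantic import BaseModel

class ChatCompletionImageUrl(BaseModel):
    url: str

class ChatCompletionMessagePart(BaseModel):
    type: Literal["text", "image_url"]
    text: Optional[str] = None
    image_url: Optional[ChatCompletionImageUrl] = None

class ChatCompletionMessage(BaseModel):
    role: str
    content: Union[str, List[ChatCompletionMessagePart]]
```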
Migrate the add method into the class itself. Also, a BaseModel isn't
needed here since this isn't a serialized class.

Signed-off-by: kingbri <bdashore3@proton.me>
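A sketch of the refactor, assuming the class in question is the multimodal embeddings wrapper from the earlier commit; since the object never crosses a serialization boundary, a plain class suffices:

```python
class MultimodalEmbeddingWrapper:
    """Holds image embeddings for a request. Never serialized, so
    pydantic.BaseModel adds nothing here; a plain class is enough."""

    def __init__(self):
        self.content = []

    def add(self, embedding):
        # The add method lives on the class itself rather than as a
        # standalone helper function
        self.content.append(embedding)
```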
Previously, the flow for parsing chat completion messages and rendering
from the prompt template was disconnected between endpoints. Now, create
a common function to render and handle everything appropriately afterwards.

Signed-off-by: kingbri <bdashore3@proton.me>
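A hedged sketch of the shared rendering path, assuming a Jinja2-style prompt template; the function and parameter names are assumptions:

```python
# One rendering path for every endpoint that accepts chat completion
# messages, instead of each endpoint parsing and rendering on its own.
from typing import List

import jinja2

def apply_chat_template(
    template: jinja2.Template,
    messages: List[dict],
    add_generation_prompt: bool = True,
) -> str:
    """Render chat completion messages through the prompt template.

    Both /v1/chat/completions and /v1/encode call this helper rather
    than duplicating parsing and rendering logic.
    """
    return template.render(
        messages=messages,
        add_generation_prompt=add_generation_prompt,
    )
```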
Fix a mistake in unwrapping: vision should default to false to allow
normal model loading when the flag isn't provided.

Signed-off-by: kingbri <bdashore3@proton.me>
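The corrected default, sketched with an unwrap(value, default) helper that returns the default when the value is None; the helper and config shape are assumptions:

```python
def unwrap(value, default):
    """Return value unless it's None, then fall back to the default."""
    return value if value is not None else default

model_config = {}  # e.g. parsed from config; no "vision" key present

# Vision must default to False so text-only models still load normally
# when the flag is absent
use_vision = unwrap(model_config.get("vision"), False)
assert use_vision is False
```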
The model_type internal reference was changed to an enum for
a more extendable loading process. Return the current model type
when loading a new model.

Signed-off-by: kingbri <bdashore3@proton.me>
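An illustrative sketch of the enum-based model type; member names beyond a text/vision split are assumptions:

```python
from enum import Enum

class ModelType(Enum):
    TEXT = "text"
    VISION = "vision"

def loaded_model_type(vision_loaded: bool) -> ModelType:
    # Reported back to the caller when a new model finishes loading
    return ModelType.VISION if vision_loaded else ModelType.TEXT
```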
If vision is enabled and the model doesn't support it, send an
error asking the user to reload. Also, add a method to unload the
vision tower.

Signed-off-by: kingbri <bdashore3@proton.me>
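A hedged sketch of the guard and the unload method; names are illustrative and the error text paraphrases the commit:

```python
class ModelContainer:
    """Illustrative container holding both text and vision components."""

    def __init__(self, vision_model=None):
        self.vision_model = vision_model

    def unload_vision_tower(self):
        # Free the vision tower independently of the text model
        if self.vision_model is not None:
            self.vision_model.unload()
            self.vision_model = None

def check_vision_support(supports_vision: bool, use_vision: bool):
    # Reject the request up front instead of failing mid-generation
    if use_vision and not supports_vision:
        raise ValueError(
            "The current model does not support vision. "
            "Please reload the model with vision disabled."
        )
```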
bdashore3 requested a review from DocShotgun on November 22, 2024 19:27
The strings weren't being concatenated properly. Only add the combined
text if the chat completion type is a List.

Signed-off-by: kingbri <bdashore3@proton.me>
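A sketch of the corrected concatenation: text parts are joined only when the content is a list, while plain string content passes through untouched. The function name is hypothetical:

```python
from typing import List, Union

def flatten_content(content: Union[str, List[dict]]) -> str:
    # Plain string content needs no concatenation
    if isinstance(content, str):
        return content

    # For list content, join only the text blocks into one string
    return "".join(
        part.get("text", "") for part in content
        if part.get("type") == "text"
    )
```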
bdashore3 merged commit 9c8186c into main on Nov 22, 2024
1 check passed