The Embedding Models module manages the embedding models used by the framework.
Currently, the framework supports the following embedding models:
- text-embedding-large / small (remote - OpenAI API)
- Salesforce/SFR-Embedding-Mistral (local - GPU with 32GB VRAM recommended, model size is roughly 26GB)
- intfloat/e5-mistral-7b-instruct (local - GPU with 32GB VRAM recommended, model size is roughly 26GB)
- Alibaba-NLP/gte-Qwen1.5-7B-instruct (local - GPU with 32GB VRAM recommended, model size is roughly 26GB)
- dunzhang/stella_en_1.5B_v5 (local - GPU with 12GB VRAM recommended, model size is roughly 6GB)
- dunzhang/stella_en_400M_v5 (local - GPU with 4GB VRAM recommended, model size is roughly 2GB)
The following sections describe how to instantiate individual models and how to add new models to the framework.
- Create a copy of `config_template.json` named `config.json` in the CheckEmbed folder (not necessary for local models).
- Fill in the configuration details based on the used model (see below).
- Adjust the predefined `gpt-embedding-large` or `gpt-embedding-small` configurations or create a new configuration with a unique key.
| Key | Value |
|---|---|
| model_id | Model name based on the OpenAI model overview. |
| name | Name used for CheckEmbed output files. We suggest using the default names for local models. |
| token_cost | Price per 1000 tokens based on OpenAI pricing, used for calculating the cumulative cost per LLM instance. |
| encoding | String indicating the format in which to return the embeddings; either "float" or "base64". More information can be found in the OpenAI API reference. |
| dimension | Number indicating the output dimension of the embedding model. More information can be found in the OpenAI model overview. |
| organization | Organization to use for the API requests (may be empty). |
| api_key | Personal API key that will be used to access the OpenAI API. |
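For reference, a filled-in `config.json` entry for the predefined `gpt-embedding-large` key could look as follows. The concrete values (model id, price, dimension) are illustrative; check them against the current OpenAI model overview and pricing pages:

```json
{
    "gpt-embedding-large": {
        "model_id": "text-embedding-3-large",
        "name": "gpt-embedding-large",
        "token_cost": 0.00013,
        "encoding": "float",
        "dimension": 3072,
        "organization": "",
        "api_key": "<your-api-key>"
    }
}
```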
- Instantiate the embedding model based on the selected configuration key (predefined / custom). `max_concurrent_requests` defaults to 10; adjust the value based on your tier rate limits.
```python
embedding_lm = language_models.EmbeddingGPT(
    config_path,
    model_name = <configuration-key>,
    cache = <False | True>,
    max_concurrent_requests = <int number>
)
```
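As a concrete usage sketch, assuming the `gpt-embedding-large` configuration from the table above has been filled in and that `generate_embedding` (described at the end of this page) is the inference entry point:

```python
# Hypothetical usage sketch: instantiate the predefined model and
# embed a single string via the generate_embedding interface.
embedding_lm = language_models.EmbeddingGPT(
    "CheckEmbed/config.json",
    model_name = "gpt-embedding-large",
    cache = False,
    max_concurrent_requests = 10
)
embedding = embedding_lm.generate_embedding("CheckEmbed verifies LLM outputs.")
```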
The framework currently supports the following local models: Salesforce/SFR-Embedding-Mistral, intfloat/e5-mistral-7b-instruct, Alibaba-NLP/gte-Qwen1.5-7B-instruct, dunzhang/stella_en_1.5B_v5 and dunzhang/stella_en_400M_v5.
- Instantiate the embedding model based on the device you own; a usage sketch follows the code blocks below.
- The device can be specified in the `Scheduler`; see the Scheduler documentation for details.
```python
sfrEmbeddingMistral = language_models.SFREmbeddingMistral(
    model_name = "Salesforce/SFR-Embedding-Mistral",
    cache = False,
    batch_size = 64,
)
```

```python
e5mistral7b = language_models.E5Mistral7b(
    model_name = "intfloat/e5-mistral-7b-instruct",
    cache = False,
    batch_size = 64,
)
```

```python
gteQwen157bInstruct = language_models.GteQwenInstruct(
    model_name = "Alibaba-NLP/gte-Qwen1.5-7B-instruct",
    cache = False,
    access_token = "", # Add your Hugging Face access token here.
    batch_size = 1, # Unless you have more than 32GB of GPU VRAM at your disposal, use 1.
)
```

```python
stella_en_15B_v5 = embedding_models.Stella(
    model_name = "dunzhang/stella_en_1.5B_v5",
    cache = False,
    batch_size = 64,
)
```

```python
stella_en_400M_v5 = embedding_models.Stella(
    model_name = "dunzhang/stella_en_400M_v5",
    cache = False,
    batch_size = 64,
)
```
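A local model can then be queried in the same way as a remote one; a minimal usage sketch, assuming `generate_embedding` (described at the end of this page) is the inference entry point:

```python
# Hypothetical usage sketch for a local model. The first call downloads
# the model weights from Hugging Face (roughly 2GB for stella_en_400M_v5).
stella_en_400M_v5 = embedding_models.Stella(
    model_name = "dunzhang/stella_en_400M_v5",
    cache = False,
    batch_size = 64,
)
embeddings = stella_en_400M_v5.generate_embedding(["first text", "second text"])
```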
More embedding models can be added by following these steps:
- Create a new class as a subclass of `AbstractEmbeddingModel`.
- Use the constructor for loading the configuration and instantiating the embedding model (if needed).
```python
from typing import Dict, List, Union

# AbstractEmbeddingModel is provided by the CheckEmbed package.

class CustomLanguageModel(AbstractEmbeddingModel):
    def __init__(
        self,
        config_path: str = "",
        model_name: str = "text-embedding-large",
        name: str = "CustomLanguageModel",
        cache: bool = False
    ) -> None:
        super().__init__(config_path, model_name, name, cache)
        self.config: Dict = self.config[model_name]

        # Load data from the configuration into variables if needed.

        # Instantiate the model if needed.
```
- Implement the `generate_embedding` abstract method, which is used to get a list of embeddings from the model (remote API call or local model inference).
```python
def generate_embedding(
    self,
    input: Union[List[str], str]
) -> List[float]:
    # Call the model and retrieve an embedding.
    # Return the model response.
    ...
```
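Putting both steps together, the following is a minimal, self-contained sketch of a custom local embedding model built on the sentence-transformers library. The class name, the chosen checkpoint, and the use of `SentenceTransformer.encode` are illustrative assumptions, not part of the CheckEmbed API:

```python
from typing import List, Union

from sentence_transformers import SentenceTransformer  # assumed extra dependency

class MiniLMEmbedding(AbstractEmbeddingModel):  # hypothetical example class
    def __init__(
        self,
        config_path: str = "",
        model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
        name: str = "minilm-embedding",
        cache: bool = False
    ) -> None:
        super().__init__(config_path, model_name, name, cache)
        # Local model: no configuration file is needed, instantiate directly.
        self.model = SentenceTransformer(model_name)

    def generate_embedding(
        self,
        input: Union[List[str], str]
    ) -> List[float]:
        # encode accepts a single string or a list of strings and returns
        # a numpy array; convert it to plain Python lists.
        return self.model.encode(input).tolist()
```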