
Bloomz Inference Server

This project implements an inference server for Bloomz Large Language Models (LLMs). The server allows for efficient and scalable inference of Bloomz models, making it suitable for various NLP applications.

Features

  • Flexible: Easily configurable to support different model sizes and configurations.
  • REST API: Provides a RESTful API for easy integration with other applications.

Installation

Prerequisites

  • Python 3.10 or higher
  • poetry package manager

Clone the Repository

git clone https://github.com/CreditMutuelArkea/llm-inference.git
cd llm-inference

Install Dependencies

poetry install

Usage

Configuration

Before running the server, you need to configure it and set some environment variables.

HUGGING_FACE_HUB_TOKEN="<YOUR HF HUB TOKEN>"

Running the Server

Each task uses a dedicated model. Several open-source models based on the Bloomz architecture are available; you will find the latest ones in Crédit Mutuel Arkéa's Hugging Face ODQA collection.

If you are using PyCharm, run configurations are provided in .run/**.yaml and should appear directly in the IDE; they use the smallest models.

Embedding server

Used to vectorize documents. Look for *-retriever models, then start the inference server as follows (smallest model):

python -m llm_inference --task EMBEDDING --port 8081 --model cmarkea/bloomz-560m-retriever-v2

Then go to http://localhost:8081/docs.
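
As a rough sketch, an embedding request might look like the Python snippet below. The /embed route and the inputs field are assumptions for illustration only; the authoritative request and response schemas are published at http://localhost:8081/docs.

import requests

# NOTE: the route and payload shape below are assumptions; check
# http://localhost:8081/docs for the actual schema of your server version.
response = requests.post(
    "http://localhost:8081/embed",
    json={"inputs": ["What is the capital of France?"]},
)
response.raise_for_status()
print(response.json())  # one embedding vector per input text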

Reranking / Scoring server

Used to rank several contexts against a given query. Look for *-reranking models, then start the inference server as follows (smallest model):

python -m llm_inference --task SCORING --port 8082 --model cmarkea/bloomz-560m-reranking

Then go to http://localhost:8082/docs.

Be sure to check the examples in the model card of the model you use to understand the meaning of the output labels. For instance, for cmarkea/bloomz-560m-reranking, a LABEL1 score close to 1 means that the context is highly similar to the query, as described in the model card.
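
As an illustrative sketch, a reranking request could look like the snippet below; the /score route and the field names are assumptions, so check http://localhost:8082/docs for the actual schema.

import requests

# NOTE: route and payload are placeholders; the real contract is documented
# at http://localhost:8082/docs.
response = requests.post(
    "http://localhost:8082/score",
    json={
        "query": "What are the branch opening hours?",
        "contexts": [
            "Our branches are open Monday to Friday, 9am to 5pm.",
            "The annual report is available as a PDF download.",
        ],
    },
)
response.raise_for_status()
print(response.json())  # per-context scores/labels; see the model card for their meaning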

Guardrail

Used to detect toxic responses, for instance: insult, obscene, sexual_explicit, identity_attack... Our guardrail models are published as *-guardrail models; start the inference server as follows (smallest model):

python -m llm_inference --task GUARDRAIL --port 8083 --model cmarkea/bloomz-560m-guardrail

Then go to http://localhost:8083/docs.
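
Again as a hedged sketch only, a guardrail request might resemble the snippet below; the /guardrail route and the inputs field are assumed, and the real endpoint is described at http://localhost:8083/docs.

import requests

# NOTE: route and payload are assumptions; consult http://localhost:8083/docs
# for the endpoints actually exposed by the GUARDRAIL task.
response = requests.post(
    "http://localhost:8083/guardrail",
    json={"inputs": ["An obviously harmless test sentence."]},
)
response.raise_for_status()
print(response.json())  # toxicity scores per category (insult, obscene, ...)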

API Endpoints

You can access the server's API documentation through the /docs endpoint.

Testing

To run the tests, use the following command:

pytest tests/

Contributing

We welcome contributions to this project! Please open an issue or submit a pull request on GitHub.

Guidelines

  • Fork the repository.
  • Create a new branch for your feature or bugfix.
  • Make your changes.
  • Ensure all tests pass.
  • Submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgments

Contact

For any inquiries or support, open an issue on this repository.
