This project implements an inference server for Bloomz Large Language Models (LLMs). The server allows for efficient and scalable inference of Bloomz models, making it suitable for various NLP applications.
- Flexible: Easily configurable to support different model sizes and configurations.
- REST API: Provides a RESTful API for easy integration with other applications.
- Python 3.10 or higher
- Poetry package manager
git clone <github.com>
cd llm-inference
poetry install
Before running the server, you need to configure it and set some environment variables.
HUGGING_FACE_HUB_TOKEN="<YOUR HF HUB TOKEN>"
Each task requires a specific model. Below are some open-source models based on the Bloomz architecture that you can use; you will find the latest models in Crédit Mutuel Arkéa's Hugging Face ODQA collection.
If you are using PyCharm, run configurations are provided in .run/**.yaml and should appear directly in PyCharm. These configurations use the smallest models.
Used to vectorize documents. Search for *-retriever
models, then start the inference server like this (smallest model):
python -m llm_inference --task EMBEDDING --port 8081 --model cmarkea/bloomz-560m-retriever-v2
Then go to http://localhost:8081/docs.
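Once the embedding server is up, you can query it from any HTTP client. The sketch below is illustrative only: the `/embed` endpoint path and the JSON payload shape are assumptions, so check http://localhost:8081/docs for the actual schema. The cosine helper shows how returned vectors are typically compared for retrieval:

```python
import json
import urllib.request
from math import sqrt

def embed(texts, base_url="http://localhost:8081"):
    """Request embeddings from the inference server.

    The endpoint path and payload shape here are assumptions --
    consult the server's /docs page for the real schema.
    """
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/embed",  # hypothetical endpoint name
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Retrieval then reduces to embedding the query once and ranking documents by cosine similarity against their stored vectors.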
Used to rank several contexts according to a specific query. Search for *-reranking
models, then start the inference server like this (smallest model):
python -m llm_inference --task SCORING --port 8082 --model cmarkea/bloomz-560m-reranking
Then go to http://localhost:8082/docs.
Be sure to check the examples in the model card of the model you use to understand the meaning of the output labels.
For instance, for cmarkea/bloomz-560m-reranking, a LABEL1 score close to 1 means that the context is very similar to the query, as described in the model card.
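To illustrate how reranking scores are consumed downstream, here is a sketch that sorts contexts by a similarity score. The scores are made up and the variable names are ours; the real output format is described in the model card and on the server's /docs page:

```python
# Hypothetical scores as a reranker might return them: one similarity
# score per (query, context) pair, where close to 1 means very similar.
contexts = ["context A", "context B", "context C"]
scores = [0.12, 0.87, 0.45]  # made-up values for illustration

# Order the contexts from most to least relevant to the query.
contexts_reranked = [
    ctx
    for ctx, _ in sorted(zip(contexts, scores), key=lambda p: p[1], reverse=True)
]
print(contexts_reranked)  # ['context B', 'context C', 'context A']
```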
Used to detect responses that would be toxic, for instance: insult, obscene, sexual_explicit, identity_attack, etc. Our guardrail models are published under *-guardrail
models. Start the inference server like this (smallest model):
python -m llm_inference --task GUARDRAIL --port 8083 --model cmarkea/bloomz-560m-guardrail
Then go to http://localhost:8083/docs.
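Downstream, guardrail scores are typically used to block or flag a generated response. A minimal sketch with made-up scores and an arbitrary threshold (the actual label names and score semantics are defined in the model card):

```python
# Hypothetical per-label toxicity scores for one generated response;
# label names and values are illustrative only.
scores = {
    "insult": 0.02,
    "obscene": 0.01,
    "sexual_explicit": 0.00,
    "identity_attack": 0.91,
}

THRESHOLD = 0.5  # arbitrary cut-off chosen for this sketch

# Collect the labels that exceed the threshold; an empty dict means
# the response passed the guardrail.
flagged = {label: s for label, s in scores.items() if s >= THRESHOLD}
is_safe = not flagged
print(is_safe, flagged)  # False {'identity_attack': 0.91}
```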
You can access the server documentation through the /docs endpoint.
To run the tests, use the following command:
pytest tests/
We welcome contributions to this project! Please open an issue or submit a pull request on GitHub.
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Make your changes.
- Ensure all tests pass.
- Submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for more details.
- BigScience for providing the pre-trained models.
- Crédit Mutuel Arkéa for supporting this project.
For any inquiries or support, open an issue on this repository.