Description
At the moment, the process by which a user can build a custom Python engine to deploy a model via Cortex is neither straightforward in code nor clearly documented. The plan is:
- improve the process for building a custom python engine
- remove unnecessary parameters in the `model.yml` config
- improve the documentation
- add examples of different custom engines to the docs
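As a sketch of the direction, a trimmed `model.yml` might keep only the fields a Python engine actually needs. The field names below are illustrative assumptions, not the final schema:

```yaml
# Hypothetical trimmed model.yml for a custom Python engine.
# Field names are illustrative, not the final schema.
model: my-vllm-model
engine: python-engine
entrypoint: main.py   # script that starts the engine process
port: 8080
```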
Goals
- vLLM is a Python library that runs on Linux machines
- vLLM can run LLMs
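To make the goals concrete, a custom Python engine could be as small as a class exposing a load/generate interface that Cortex invokes. The interface below is a hypothetical sketch, not Cortex's actual engine API, and the dummy backend stands in for a real vLLM-backed implementation:

```python
from abc import ABC, abstractmethod


class PythonEngine(ABC):
    """Hypothetical interface a custom Python engine might implement.

    This is an illustrative sketch, not Cortex's actual engine API.
    """

    @abstractmethod
    def load(self, model_path: str) -> None:
        """Load the model from the given path."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Produce a completion for the prompt."""


class EchoEngine(PythonEngine):
    """Dummy engine standing in for a real vLLM-backed implementation."""

    def load(self, model_path: str) -> None:
        self.model_path = model_path

    def generate(self, prompt: str) -> str:
        # A real engine would call into vLLM here instead of echoing.
        return f"[{self.model_path}] {prompt}"


if __name__ == "__main__":
    engine = EchoEngine()
    engine.load("my-model")
    print(engine.generate("hello"))
```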
Tasks
Obstacle
- maintaining this in Cortex is a huge ongoing effort
Out of scope
- Audio