An API that mocks Llama.cpp's HTTP server to enable support for Codestral Mamba with the Continue Visual Studio Code extension.
As of the time of writing and to my knowledge, this is the only way to use Codestral Mamba with VSCode locally. To make it work, we implement the `/completion` REST API from Llama.cpp's HTTP server and configure Continue for VSCode to use our server instead of Llama.cpp's. This way we handle all inference requests from Continue instead of Llama.cpp. When we get a request, we simply pass it off to mistral-inference, which runs Continue's request with Codestral Mamba. Platform support is available wherever mistral-inference can be run.
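To make the idea concrete, here is a minimal sketch of such a mock server (not the project's actual `llamacpp_mock_api.py`): a Flask app exposing a Llama.cpp-style `/completion` endpoint that hands the prompt to the model. The `run_codestral()` helper, the port, and the simplified request/response fields are illustrative assumptions, not details taken from this repository.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_codestral(prompt: str, max_tokens: int) -> str:
    # Hypothetical helper: this is where the real script would call
    # mistral-inference to generate a completion with Codestral Mamba.
    raise NotImplementedError

@app.route("/completion", methods=["POST"])
def completion():
    body = request.get_json(force=True)
    prompt = body.get("prompt", "")
    max_tokens = body.get("n_predict", 128)  # llama.cpp-style parameter name
    text = run_codestral(prompt, max_tokens)
    # Reply with a "content" field, as Llama.cpp's server does, so Continue
    # can read the generated text back.
    return jsonify({"content": text})

if __name__ == "__main__":
    # Continue is pointed at this address instead of a real Llama.cpp server.
    app.run(port=8080)
```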
Now let's get started!
Prerequisites:
- Download and run Codestral Mamba with mistral-inference (Ref 1 & 2 & 3)
- Install the Continue VSCode extension
After you are able to use both independently, we will glue them together with Codestral Mamba for VSCode.
Steps:
- Install Flask into your mistral-inference environment with `pip install flask`.
- Run `llamacpp_mock_api.py` with `python llamacpp_mock_api.py <path_to_codestral_folder_here>` under your mistral-inference environment.
- Click the settings button at the bottom right of Continue's UI in VSCode and edit `config.json` so it looks like this [archive] (a sketch is shown after these steps). Replace `MODEL_NAME` with `mistral-8x7b`.
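For reference, a `config.json` entry along the following lines should match that setup; this is a sketch assuming Continue's Llama.cpp provider and the mock server listening on localhost port 8080. Only the `mistral-8x7b` model name comes from the step above; the title, provider string, and port are assumptions.

```json
{
  "models": [
    {
      "title": "Codestral Mamba",
      "provider": "llama.cpp",
      "model": "mistral-8x7b",
      "apiBase": "http://localhost:8080"
    }
  ]
}
```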
Restart VSCode or reload the Continue extension and you should now be able to use Codestral Mamba for VSCode!