All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Fixed an issue where static builds in Docker were not working.
- Integrated CUDA functionality from llama.cpp upstream, which accelerates inference for long prompts.
- Added multi-threaded server support, which should prevent health checks aimed at `GET /` from failing during prediction.
- Separated the autocomplete lambda into a separate C++ function so that it can be bound to `/v1/completions`, `/v1/engines/copilot-codex/completions`, and `/v1/engines/codegen/completions` (see the example request below).
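
As an illustration of the endpoints above, here is a minimal sketch that checks the health endpoint and then requests a completion. It assumes the server is listening on `localhost:18080` and accepts an OpenAI-style JSON body with `prompt` and `max_tokens`; the port and field names are assumptions, not confirmed by this changelog.

```python
import json
import urllib.request

BASE = "http://localhost:18080"  # assumed default port; adjust to your setup

# Health check: the multi-threaded server should answer GET / even mid-prediction.
with urllib.request.urlopen(f"{BASE}/") as resp:
    print("health:", resp.status)

# Completion request: body shape assumes an OpenAI-style API (prompt/max_tokens).
body = json.dumps({"prompt": "def fibonacci(n):", "max_tokens": 32}).encode("utf-8")
req = urllib.request.Request(
    f"{BASE}/v1/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))
```
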
- Removed `model` from the completion input as a required parameter, which stops the official Copilot plugin from freaking out.
- Integrated the latest changes from upstream GGML, including some fixes for ARM NEON processors.
- Added macOS builds as part of CI.
- Support for a fork of vscode-fauxpilot with a progress indicator is now available (a PR is open upstream; please react/vote for it).
- Added the 350M parameter CodeGen model to the Google Drive folder.
- Added multi-arch Docker images so that users can now run directly on Apple Silicon and even Raspberry Pi.
- Added support for pre-tokenized inputs passed into the API from a Python tokenizer, as sketched below (thanks to @thakkarparth007 for their PR ravenscroftj/ggml#2).
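
For context, here is a minimal sketch of producing token IDs client-side with a Python tokenizer, using the Hugging Face `transformers` CodeGen tokenizer. The name of the request field for submitting pre-tokenized input is not specified in this changelog, so `tokens` below is a hypothetical key for illustration only.

```python
from transformers import AutoTokenizer

# Tokenize the prompt client-side with the CodeGen tokenizer.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
token_ids = tokenizer("def fibonacci(n):").input_ids
print(token_ids)  # a list of integer token IDs

# Hypothetical request body: the actual field name accepted by the API
# is not documented here, so "tokens" is an assumption for illustration.
payload = {"tokens": token_ids, "max_tokens": 32}
```
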
- The project now builds on macOS (thanks to @Dimitrije-V for their PR ravenscroftj/ggml#1 and @dabdine for contributing some clearer Mac build instructions).
- Fixed an inability to load `vocab.json` when converting the 16B model, caused by the file's encoding not being set, by @swanserquack in #5 (see the sketch below).
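
To illustrate the class of bug fixed here (this is a sketch, not the actual patch): reading a JSON vocabulary without an explicit encoding lets Python fall back to the platform default, which can fail to decode the file on non-UTF-8 locales. The file path is hypothetical.

```python
import json

# Opening without an explicit encoding uses the platform default
# (e.g. cp1252 on Windows), which can fail to decode vocab.json.
# Passing encoding="utf-8" makes the read deterministic everywhere.
with open("vocab.json", "r", encoding="utf-8") as f:
    vocab = json.load(f)
print(len(vocab), "vocabulary entries")
```
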
- Improved model performance by incorporating changes to the GGML library from @ggerganov.
- Turbopilot is born!