
koboldcpp-1.39.1

  • Fix SSE streaming to handle headers correctly during abort (Credits: @duncannah)
  • Bugfix for --blasbatchsize values -1 and 1024 (fixes the alloc blocks error)
  • Added experimental support for --blasbatchsize 2048 (note: buffers are doubled when this is selected, using considerably more memory)
  • Added support for 12k and 16k --contextsize options (see the example launch command below). Please let me know if you encounter issues.
  • Pulled upstream improvements, including further CUDA speedups for MMQ mode across all quant types.
  • Fix for some LLAMA 65B models being detected as LLAMA2 70B models.
  • Revert to upstream approach for CUDA pool malloc (1.39.1 - done only for MMQ).
  • Updated Lite, adding support for importing Tavern V2 card formats with world info (character book), plus clearer settings edit boxes.
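
For example, a launch that uses the new limits could look like the command below. The model filename is just a placeholder, and the exact values accepted by --contextsize and --blasbatchsize are listed in --help:

    koboldcpp.exe yourmodel.ggml.bin --contextsize 16384 --blasbatchsize 2048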

To use, download and run koboldcpp.exe, which is a one-file pyinstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect with your browser at the address below (or use the full KoboldAI client):
http://localhost:5001
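
If you prefer to script against the server rather than use the browser UI, you can also call the KoboldAI-compatible generate endpoint directly. The request below is only an illustration; the field names follow the usual KoboldAI API, and you should check --help and the API documentation for the exact paths and parameters supported by your build:

    curl http://localhost:5001/api/v1/generate -H "Content-Type: application/json" -d "{\"prompt\": \"Hello\", \"max_length\": 50}"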

For more information, be sure to run the program from command line with the --help flag.