Releases: Liquid4All/on-prem-stack

LFM-3B-JP v0.0.1

18 Dec 03:16

Summary

This is the stack for LFM-3B-JP.

How to run for the first time

  • Download Source code (zip) below.
  • Unzip the file into an empty folder.
  • Run launch.sh.
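
Concretely, a first run might look like this (the archive and folder names here are assumptions; use whatever the downloaded asset is actually called):

```sh
# Unpack the release into a fresh folder and start the stack.
mkdir on-prem-stack
unzip on-prem-stack.zip -d on-prem-stack
cd on-prem-stack
./launch.sh
```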

Models

Currently, each on-prem stack can run only one model at a time. The launch script runs lfm-3b-jp by default. To switch models, change MODEL_NAME and MODEL_IMAGE in the .env file according to the table below, then run ./launch.sh again; a sketch of the resulting .env entries follows the table.

Model Name        Model Image
lfm-3b-jp         liquidai/lfm-3b-jp:0.0.1-e
lfm-3b-ichikara   liquidai/lfm-3b-ichikara:0.0.1-e
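
For example, switching to lfm-3b-ichikara means editing these two entries (assuming the stack's .env uses the usual KEY=value dotenv syntax):

```sh
# .env — switch from the default lfm-3b-jp to lfm-3b-ichikara
MODEL_NAME=lfm-3b-ichikara
MODEL_IMAGE=liquidai/lfm-3b-ichikara:0.0.1-e
```

Then run ./launch.sh again to pull the new image and restart the stack.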

Update

To update the stack, change STACK_VERSION and MODEL_IMAGE in the .env file and run the launch script again.

How to test

  • After running launch.sh, wait up to 2 min for model initialization, and run test-api.sh.
    • This script will trigger a smoke test to verify that the inference server is running correctly.
  • Visit 0.0.0.0:3000 and chat with the model in a web UI.
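
test-api.sh is the supported smoke test, but if you want to probe the API by hand, a request along these lines should work, assuming the stack exposes a vLLM-style OpenAI-compatible endpoint (the port, path, and model name below are assumptions; check .env and docker-compose.yaml for the real values):

```sh
# Hypothetical manual smoke test against an OpenAI-compatible endpoint.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "lfm-3b-jp",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```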

LFM-1B-6GB @ v0.0.1

11 Dec 09:35

Summary

This is the stack for LFM-1B, which can run within 6 GB of GPU memory.

How to run for the first time

  • Download Source code (zip) below.
  • Unzip the file into an empty folder.
  • Run launch.sh.

How to upgrade

In .env, make these updates:

Variable        Value
STACK_VERSION   2685ff757d
MODEL_IMAGE     liquidai/lfm-1be:0.0.1
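
Applied to the file, those updates would look like this (assuming the usual KEY=value dotenv syntax):

```sh
# .env — pin the upgraded stack and the 6 GB 1B model image
STACK_VERSION=2685ff757d
MODEL_IMAGE=liquidai/lfm-1be:0.0.1
```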

In docker-compose.yaml, make these changes:

Argument                   Value
--max-model-len            "2048"
--max-seq-len-to-capture   "2048"
--max-num-seqs             "100"
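
In context, those arguments would sit in the model container's command list, roughly like this (the service name and surrounding layout are assumptions about how the shipped docker-compose.yaml is organized):

```yaml
# docker-compose.yaml — cap context length and batch size to fit 6 GB
services:
  model:
    command:
      - --max-model-len
      - "2048"
      - --max-seq-len-to-capture
      - "2048"
      - --max-num-seqs
      - "100"
```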

Then run launch.sh.

If the model container throws an out-of-memory error, decrease these arguments further, keep --max-seq-len-to-capture equal to --max-model-len, and run launch.sh to retry.
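
For instance, halving the caps is one plausible next step (1024 and 50 are illustrative values, not recommendations from these notes):

```yaml
# Tighter limits if 2048/100 still exceeds available GPU memory.
      - --max-model-len
      - "1024"
      - --max-seq-len-to-capture   # keep equal to --max-model-len
      - "1024"
      - --max-num-seqs
      - "50"
```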

How to test

  • After running launch.sh, wait up to 2 min for model initialization, and run test-api.sh.
    • This script will trigger a smoke test to verify that the inference server is running correctly.
  • Visit 0.0.0.0:3000 and chat with the model in a web UI.

v0.0.3

26 Nov 10:30

How to run for the first time

  • Download Source code (zip) below.
  • Unzip the file into an empty folder.
  • Run launch.sh.

How to upgrade

  • Download Source code (zip) below.
  • Unzip the file into the current deployment folder, overwriting all existing files.
  • Make sure to keep the existing .env file.
  • In the .env file, update STACK_VERSION to 2b3f969864 and MODEL_IMAGE to liquidai/lfm-3be:0.0.6 (see the sketch after this list).
    • Please note that all previous versions have been removed.
  • Run launch.sh.
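
For example, the two .env edits can be applied from the deployment folder like this (assuming KEY=value dotenv syntax; GNU sed shown, so edit by hand on macOS or if your file differs):

```sh
# Pin the new stack version and model image, then relaunch.
sed -i 's/^STACK_VERSION=.*/STACK_VERSION=2b3f969864/' .env
sed -i 's#^MODEL_IMAGE=.*#MODEL_IMAGE=liquidai/lfm-3be:0.0.6#' .env
./launch.sh
```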

How to test

  • After running launch.sh, wait up to 2 min for model initialization, and run test-api.sh.
    • This script will trigger a smoke test to verify that the inference server is running correctly.
  • Visit 0.0.0.0:3000 and chat with the model in a web UI.

What's Changed

  • More robust Python backend.
  • Updated 3B model.

Full Changelog: 0.0.2...0.0.3

v0.0.2

02 Nov 21:35

How to run for the first time

  • Download Source code (zip) below.
  • Unzip the file into an empty folder.
  • Run launch.sh.

How to upgrade

  • Download Source code (zip) below.
  • Unzip the file into the current deployment folder, overwriting all existing files.
  • Make sure to keep the existing .env file.
  • Update the STACK_VERSION in the .env file to df7136eba0.
  • Run "launch.sh".

How to test

  • After running launch.sh, wait up to 2 min for model initialization, and run test-api.sh.
    • This script will trigger a smoke test to verify that the inference server is running correctly.
  • Visit 0.0.0.0:3000 and chat with the model in a web UI.

What's Changed

  • The 3B model can now run on a single A100 GPU with 40 GB of memory.
    • Previously, it required an H100 GPU with 80 GB of memory, due to incorrect vLLM parameters.

Full Changelog: 0.0.1...0.0.2

v0.0.1

31 Oct 05:55

How to run for the first time

  • Download the source code zip file.
  • Unzip the file into an empty folder.
  • Run launch.sh.
  • Wait 2 min, then run test-api.sh. This script will trigger a smoke test to verify that the inference server is running correctly.

How to upgrade

  • Download the source code zip file.
  • Unzip the file into the current deployment folder, overwriting all existing files.
  • Make sure to keep the existing .env file.
  • Update the STACK_VERSION in the .env file to e72afb3ab9.
  • Run "launch.sh".
  • Wait for 2 min, and run "test-api.sh". This script will trigger a smoke test to verify that the inference server is running correctly.