# Releases: Liquid4All/on-prem-stack
## LFM-3B-JP v0.0.1

### Summary

This is the stack for LFM-3B-JP.

### How to run for the first time

- Download `Source code (zip)` below.
- Unzip the file into an empty folder.
- Run `launch.sh`.
### Models

Currently, each on-prem stack can only run one model at a time. The launch script runs `lfm-3b-jp` by default. To switch models, change `MODEL_NAME` and `MODEL_IMAGE` in the `.env` file according to the table below, then run `./launch.sh` again.
| Model Name | Model Image |
|---|---|
| `lfm-3b-jp` | `liquidai/lfm-3b-jp:0.0.1-e` |
| `lfm-3b-ichikara` | `liquidai/lfm-3b-ichikara:0.0.1-e` |
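For example, to switch to `lfm-3b-ichikara`, the relevant `.env` lines would look like this (a sketch assuming the stack's usual `KEY=value` dotenv syntax; leave any other variables untouched):

```sh
MODEL_NAME=lfm-3b-ichikara
MODEL_IMAGE=liquidai/lfm-3b-ichikara:0.0.1-e
```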
### Update

To update the stack, change `STACK_VERSION` and `MODEL_IMAGE` in the `.env` file and run the launch script again.
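An update is therefore a pair of `.env` edits of this shape, followed by `./launch.sh`. The values below are placeholders, to be replaced with the ones published for the release you are moving to:

```sh
# Placeholder values; substitute the version and image tag from the release notes.
STACK_VERSION=<new-stack-version>
MODEL_IMAGE=liquidai/lfm-3b-jp:<new-tag>
```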
### How to test

- After running `launch.sh`, wait up to 2 minutes for model initialization, then run `test-api.sh`.
  - This script will trigger a smoke test to verify that the inference server is running correctly.
- Visit `0.0.0.0:3000` and chat with the model in a web UI.
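If you want to probe the API by hand rather than through `test-api.sh`, a request along these lines should work, assuming the stack exposes a vLLM-style OpenAI-compatible endpoint. The port (`8000`) and the model name here are assumptions, so check your `.env` and `docker-compose.yaml` for the actual values:

```sh
# Hypothetical manual smoke test; port and model name are assumptions.
curl -s http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "lfm-3b-jp",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```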
## LFM-1B-6GB @ v0.0.1

### Summary

This is the stack for LFM-1B, which can run within 6 GB of GPU memory.

### How to run for the first time

- Download `Source code (zip)` below.
- Unzip the file into an empty folder.
- Run `launch.sh`.
### How to upgrade

In `.env`, make these updates:

| Variable | Value |
|---|---|
| `STACK_VERSION` | `2685ff757d` |
| `MODEL_IMAGE` | `liquidai/lfm-1be:0.0.1` |
In `docker-compose.yaml`, make these changes:

| Argument | Value |
|---|---|
| `--max-model-len` | `"2048"` |
| `--max-seq-len-to-capture` | `"2048"` |
| `--max-num-seqs` | `"100"` |
Then run `launch.sh`.
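As a sketch, these arguments end up in the model service's command list. The service name and surrounding structure below are illustrative only, so match them to the `docker-compose.yaml` shipped with the stack:

```yaml
# Illustrative fragment; the real service name and remaining arguments
# come from the docker-compose.yaml shipped with the stack.
services:
  model:
    command:
      - "--max-model-len"
      - "2048"
      - "--max-seq-len-to-capture"
      - "2048"
      - "--max-num-seqs"
      - "100"
```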
If the model container throws an out-of-memory error, decrease these arguments further, keep `--max-seq-len-to-capture` equal to `--max-model-len`, and run `launch.sh` to retry.
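For example, a plausible next step down would be the following; the exact values are illustrative, not prescribed:

```yaml
# Hypothetical lower settings for a retry after an out-of-memory error;
# --max-seq-len-to-capture stays equal to --max-model-len.
services:
  model:
    command:
      - "--max-model-len"
      - "1024"
      - "--max-seq-len-to-capture"
      - "1024"
      - "--max-num-seqs"
      - "50"
```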
### How to test

- After running `launch.sh`, wait up to 2 minutes for model initialization, then run `test-api.sh`.
  - This script will trigger a smoke test to verify that the inference server is running correctly.
- Visit `0.0.0.0:3000` and chat with the model in a web UI.
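Since initialization time varies with hardware, you can also poll instead of waiting a fixed 2 minutes. This sketch assumes `test-api.sh` exits with a nonzero status while the server is still coming up:

```sh
# Retry the smoke test every 10 seconds, up to ~2 minutes.
for _ in $(seq 1 12); do
  if ./test-api.sh; then
    echo "Inference server is up."
    break
  fi
  echo "Not ready yet; retrying in 10s..."
  sleep 10
done
```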
## v0.0.3

### How to run for the first time

- Download `Source code (zip)` below.
- Unzip the file into an empty folder.
- Run `launch.sh`.
### How to upgrade

- Download `Source code (zip)` below.
- Unzip the file into the current deployment folder, overwriting all existing files.
- Make sure to keep the existing `.env` file.
- In the `.env` file, update `STACK_VERSION` to `2b3f969864` and `MODEL_IMAGE` to `liquidai/lfm-3be:0.0.6`.
  - Please note that all previous versions have been removed.
- Run `launch.sh`.
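After that edit, the two updated lines in `.env` should read:

```sh
STACK_VERSION=2b3f969864
MODEL_IMAGE=liquidai/lfm-3be:0.0.6
```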
### How to test

- After running `launch.sh`, wait up to 2 minutes for model initialization, then run `test-api.sh`.
  - This script will trigger a smoke test to verify that the inference server is running correctly.
- Visit `0.0.0.0:3000` and chat with the model in a web UI.
### What's Changed

- More robust Python backend.
- Updated 3B model.

**Full Changelog**: 0.0.2...0.0.3
## v0.0.2

### How to run for the first time

- Download `Source code (zip)` below.
- Unzip the file into an empty folder.
- Run `launch.sh`.
### How to upgrade

- Download `Source code (zip)` below.
- Unzip the file into the current deployment folder, overwriting all existing files.
- Make sure to keep the existing `.env` file.
- Update the `STACK_VERSION` in the `.env` file to `df7136eba0`.
- Run `launch.sh`.
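If you prefer to script the edit, a one-liner like this works, assuming `.env` uses plain `KEY=value` lines and GNU `sed` is available; editing the file by hand is equally fine:

```sh
# Bump STACK_VERSION in place; back up .env first if you want to be safe.
sed -i 's/^STACK_VERSION=.*/STACK_VERSION=df7136eba0/' .env
```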
### How to test

- After running `launch.sh`, wait up to 2 minutes for model initialization, then run `test-api.sh`.
  - This script will trigger a smoke test to verify that the inference server is running correctly.
- Visit `0.0.0.0:3000` and chat with the model in a web UI.
### What's Changed

- The 3B model can now run on one A100 GPU with 40 GB memory.
- Previously, it required one H100 GPU with 80 GB memory due to incorrect vLLM parameters.

**Full Changelog**: 0.0.1...0.0.2
## v0.0.1

### How to run for the first time

- Download the source code zip file.
- Unzip the file into an empty folder.
- Run `launch.sh`.
- Wait for 2 minutes, then run `test-api.sh`. This script will trigger a smoke test to verify that the inference server is running correctly.
### How to upgrade

- Download the source code zip file.
- Unzip the file into the current deployment folder, overwriting all existing files.
- Make sure to keep the existing `.env` file.
- Update the `STACK_VERSION` in the `.env` file to `e72afb3ab9`.
- Run `launch.sh`.
- Wait for 2 minutes, then run `test-api.sh`. This script will trigger a smoke test to verify that the inference server is running correctly.