update readme
tjluyao committed Jul 9, 2024
1 parent 4766eab commit 2c28c21
Showing 1 changed file with 19 additions and 4 deletions.
23 changes: 19 additions & 4 deletions README.md
@@ -47,6 +47,25 @@ make install
`Dockerfile_kvrun` is a script for building a Docker image. Pre-built Docker images will be provided shortly.

## Usage
#### Deploy services
```shell
text-generation-launcher --model-id tjluyao/llama-3-8b
```
Pass `--disable-flashinfer` to force classic TGI serving.
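
Before sending queries, a script may want to wait until the launcher is reachable. The following is a minimal sketch, not part of this repository: `wait_for_port` is a hypothetical helper, port 3000 is taken from the `curl` example in this README, and it assumes the server only starts accepting TCP connections once it is up.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=3000, timeout=300.0, interval=2.0):
    """Return True once host:port accepts TCP connections, False on timeout.

    Hypothetical helper: host, port, and timings are assumptions, not
    values defined by text-generation-launcher.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` after starting the launcher blocks until the port opens or the timeout elapses. It checks reachability only, not whether the model has finished loading.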

#### Query the model
You can query the model either through `curl`:
```shell
curl 127.0.0.1:3000/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"lora_id": "tjluyao/llama-3-8b-math", "max_new_tokens":20}}' -H 'Content-Type: application/json'
```
or using the Python client. Please refer to [README.md](clients/python/README.md).
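
If installing the client is not an option, the same request as the `curl` example can be sent with Python's standard library. This is a minimal sketch, not the repository's client: `build_generate_request` and `generate` are hypothetical helper names, and the URL, `lora_id`, and `max_new_tokens` values are copied from the `curl` command above. The response is returned as parsed JSON without assuming its fields.

```python
import json
import urllib.request


def build_generate_request(prompt, lora_id=None, max_new_tokens=20,
                           base_url="http://127.0.0.1:3000"):
    """Build the POST request for the /generate endpoint (hypothetical helper)."""
    parameters = {"max_new_tokens": max_new_tokens}
    if lora_id is not None:
        # Same field name as in the curl example.
        parameters["lora_id"] = lora_id
    body = json.dumps({"inputs": prompt, "parameters": parameters}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the decoded JSON response unchanged."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `generate("What is Deep Learning?", lora_id="tjluyao/llama-3-8b-math")` issues the same request as the `curl` command.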

#### Local API tests
```shell
cd server/examples && python test_local_api.py
@@ -58,10 +77,6 @@ python server/examples/test_ui.py
```
[demo.mp4](https://github.com/mlsys-io/kv.run/assets/12567967/977b09fb-bd90-4757-85ab-e5fc2a58cd93)

#### Using quantized models
Add `--quantize [Method]` to the `text-generation-launcher` command above, for example:
```shell