Server fails to run with prefix-caching option #599

### System Info

- ghcr.io/predibase/lorax:a8ca5cb 
- Ubuntu 20.04
- GPU A10G

### Information

- [X] Docker
- [ ] The CLI directly

### Tasks

- [X] An officially supported command
- [ ] My own modifications

### Reproduction

```
docker run --gpus 1 -v ./data:/data -p 8005:80 ghcr.io/predibase/lorax:a8ca5cb \
  --prefix-caching true \
  --port 80 \
  --model-id Open-Orca/Mistral-7B-OpenOrca \
  --cuda-memory-fraction 0.8 \
  --sharded false \
  --max-waiting-tokens 20 \
  --max-input-length 4096 \
  --max-total-tokens 8192 \
  --hostname 0.0.0.0 \
  --max-concurrent-requests 512 \
  --max-best-of 1 \
  --max-batch-prefill-tokens 4096 \
  --max-active-adapters 10 \
  --adapter-source local \
  --adapter-cycle-time-s 2 \
  --json-output \
  --disable-custom-kernels \
  --dtype float16
```

### Expected behavior

The server starts successfully and prefix caching works as expected.
