chore: use collapsible readme #413

Merged · 2 commits · Jul 6, 2023
2 changes: 1 addition & 1 deletion Makefile
@@ -13,7 +13,7 @@ dev:
	cargo build
	@mkdir -p mosec/bin
	@cp ./target/debug/mosec mosec/bin/
-	pip install -e .
+	pip install -e .[dev,doc,mixin]

test: dev
	echo "Running tests for the main logic"
14 changes: 11 additions & 3 deletions README.md
@@ -60,6 +60,9 @@ pip install --upgrade diffusers[torch] transformers

### Write the server

+<details>
+<summary>Click me for server codes with explanations.</summary>

Firstly, we import the libraries and set up a basic logger to better observe what happens.

```python
@@ -127,9 +130,13 @@ if __name__ == "__main__":
    server.append_worker(StableDiffusion, num=1, max_batch_size=4, max_wait_time=10)
    server.run()
```
+</details>

### Run the server

+<details>
+<summary>Click me to see how to run and query the server.</summary>

The above snippets are merged in our example file. You may directly run at the project root level. We first have a look at the _command line arguments_ (explanations [here](https://mosecorg.github.io/mosec/reference/arguments.html)):

```shell
@@ -157,6 +164,7 @@ curl http://127.0.0.1:8000/metrics
```

That's it! You have just hosted your **_stable-diffusion model_** as a service! 😉
+</details>

## Examples

@@ -179,8 +187,8 @@ More ready-to-use examples can be found in the [Example](https://mosecorg.github.
- `max_batch_size` and `max_wait_time (millisecond)` are configured when you call `append_worker`.
- Make sure inference with the `max_batch_size` value won't cause the out-of-memory in GPU.
- Normally, `max_wait_time` should be less than the batch inference time.
-- If enabled, it will collect a batch either when it reaches either `max_batch_size` or the `max_wait_time`. The service will only benefit from this feature when traffic is high.
-- Check the [arguments doc](https://mosecorg.github.io/mosec/reference/arguments.html).
+- If enabled, it will collect a batch either when the number of accumulated requests reaches `max_batch_size` or when `max_wait_time` has elapsed. The service will benefit from this feature when the traffic is high.
+- Check the [arguments doc](https://mosecorg.github.io/mosec/reference/arguments.html) for other configurations.
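For readers skimming this hunk, a minimal sketch of how these two knobs are passed to `append_worker`, assuming the `Server`/`Worker` API shown in the snippets above (the `Inference` worker and the values below are illustrative, not part of this PR):

```python
from mosec import Server, Worker


class Inference(Worker):
    """Illustrative batched worker; see the README snippets above for a real model."""

    def forward(self, data: list) -> list:
        # With dynamic batching enabled, `data` is a list of decoded requests
        # (at most max_batch_size items); return one result per request.
        return [{"echo": item} for item in data]


if __name__ == "__main__":
    server = Server()
    # Flush a batch when 8 requests have accumulated or 10 ms have passed
    # since the first request arrived, whichever happens first.
    server.append_worker(Inference, num=1, max_batch_size=8, max_wait_time=10)
    server.run()
```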

## Deployment

@@ -198,7 +206,7 @@ More ready-to-use examples can be found in the [Example](https://mosecorg.github.
## Performance tuning

- Find out the best `max_batch_size` and `max_wait_time` for your inference service. The metrics will show the histograms of the real batch size and batch duration. Those are the key information to adjust these two parameters.
-- Try to split the whole inference process into separate CPU and GPU stages (ref [DistilBERT](https://mosecorg.github.io/mosec/examples/pytorch.html#natural-language-processing)). Different stages will be run in a [data pipeline](https://en.wikipedia.org/wiki/Pipeline_(software)), which will keep the GPU busy.
+- Try to split the whole inference process into separate CPU and GPU stages (ref [DistilBERT](https://mosecorg.github.io/mosec/examples/pytorch.html#natural-language-processing)). Different stages will be run in a [data pipeline](https://en.wikipedia.org/wiki/Pipeline_(software)), which will keep the GPU busy.
- You can also adjust the number of workers in each stage. For example, if your pipeline consists of a CPU stage for preprocessing and a GPU stage for model inference, increasing the number of CPU-stage workers can help to produce more data to be batched for model inference at the GPU stage; increasing the GPU-stage workers can fully utilize the GPU memory and computation power. Both ways may contribute to higher GPU utilization, which consequently results in higher service throughput.
- For multi-stage services, note that the data passing through different stages will be serialized/deserialized by the `serialize_ipc/deserialize_ipc` methods, so extremely large data might make the whole pipeline slow. The serialized data is passed to the next stage through rust by default, you could enable shared memory to potentially reduce the latency (ref [RedisShmIPCMixin](https://mosecorg.github.io/mosec/examples/ipc.html#redis-shm-ipc-py)).
- You should choose appropriate `serialize/deserialize` methods, which are used to decode the user request and encode the response. By default, both are using JSON. However, images and embeddings are not well supported by JSON. You can choose msgpack which is faster and binary compatible (ref [Stable Diffusion](https://mosecorg.github.io/mosec/examples/stable_diffusion.html)).
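To make the multi-stage advice above concrete, a rough sketch under the same `Server`/`Worker` API (the `Preprocess`/`Inference` classes and worker counts are made up for illustration):

```python
from mosec import Server, Worker


class Preprocess(Worker):
    """CPU-bound stage: clean up the incoming request."""

    def forward(self, data: dict) -> str:
        return data["prompt"].strip().lower()


class Inference(Worker):
    """GPU-bound stage: batched model call (model loading omitted)."""

    def forward(self, data: list) -> list:
        return [f"generated output for: {prompt}" for prompt in data]


if __name__ == "__main__":
    server = Server()
    # Workers appended in order form the pipeline stages; several CPU workers
    # keep the single batched GPU worker fed, improving GPU utilization.
    server.append_worker(Preprocess, num=4)
    server.append_worker(Inference, num=1, max_batch_size=16, max_wait_time=10)
    server.run()
```

If JSON encoding becomes the bottleneck for binary payloads, only the worker definitions need to change (e.g. a msgpack-based serializer as in the Stable Diffusion example linked above), not this wiring.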
8 changes: 4 additions & 4 deletions examples/stable_diffusion/server.py
@@ -30,17 +30,17 @@ def __init__(self):
            "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
        )
        device = "cuda" if torch.cuda.is_available() else "cpu"
-        self.pipe = self.pipe.to(device)
+        self.pipe = self.pipe.to(device)  # type: ignore
        self.example = ["useless example prompt"] * 4  # warmup (bs=4)

    def forward(self, data: List[str]) -> List[memoryview]:
        logger.debug("generate images for %s", data)
-        res = self.pipe(data)
+        res = self.pipe(data)  # type: ignore
        logger.debug("NSFW: %s", res[1])
        images = []
-        for img in res[0]:
+        for img in res[0]:  # type: ignore
            dummy_file = BytesIO()
-            img.save(dummy_file, format="JPEG")
+            img.save(dummy_file, format="JPEG")  # type: ignore
            images.append(dummy_file.getbuffer())
        return images

2 changes: 1 addition & 1 deletion requirements/dev.txt
@@ -2,7 +2,7 @@ setuptools_scm>=7
pytest>=6
pytest-mock>=3.5
mypy>=0.910
-pyright>=1.1.290,<=1.1.308
+pyright>=1.1.290
pylint>=2.13.8
pydocstyle>=6.1.1
black>=20.8.0
2 changes: 1 addition & 1 deletion setup.py
@@ -93,5 +93,5 @@ def build_extension(self, ext: Extension):
    },
    zip_safe=False,
    ext_modules=ext_modules,  # type: ignore
-    cmdclass={"build_ext": RustBuildExt},
+    cmdclass={"build_ext": RustBuildExt},  # type: ignore
)