forked from mosecorg/mosec
feat: support multi-route with shared workers (mosecorg#423)
* fix dockerfile path
* refactor py args to rs
* finish the basic multiroute rust part
* fix the openapi
* add protocol state
* add runtime register
* Apply suggestions from code review
* fix a deadlock
* add test
* add doc
* bump version
* combine ingress & egress to state enum

---------

Signed-off-by: Keming <kemingy94@gmail.com>
Co-authored-by: zclzc <38581401+lkevinzc@users.noreply.github.com>
Showing 40 changed files with 1,110 additions and 997 deletions.
@@ -0,0 +1,33 @@
# Multi-Route

This example shows how to use the multi-route feature.

You will need this feature if you want to:

- Serve multiple models in one service on different endpoints.
  - e.g. register `/embedding` & `/classify` with different models
- Serve one model on multiple endpoints in one service.
  - e.g. register LLaMA with `/inference` and `/v1/chat/completions` to make it compatible with the OpenAI API
- Share a worker across different routes.
  - The shared worker collects its dynamic batch from multiple previous stages.
  - If you want multiple separate runtimes that use the same worker class, declare multiple runtime instances with that class.

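The shared-worker behaviour can be pictured with a small, self-contained model. This is a simplified sketch, not mosec's scheduler: it assumes each route's previous stage feeds its own queue, and `collect_batch` and `shared_forward` are hypothetical helpers. The batch is drained round-robin (an assumption chosen so one busy route cannot starve another), and each item keeps its route tag so its result can be returned to the right endpoint.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def collect_batch(
    queues: Dict[str, Deque[str]], max_batch_size: int
) -> List[Tuple[str, str]]:
    """Drain up to `max_batch_size` (route, item) pairs from per-route queues.

    Hypothetical helper: queues are visited round-robin so one busy route
    cannot starve the others. Real dynamic batching would also stop after a
    maximum wait time; that is omitted here.
    """
    batch: List[Tuple[str, str]] = []
    while len(batch) < max_batch_size and any(queues.values()):
        for route, queue in queues.items():
            if queue and len(batch) < max_batch_size:
                batch.append((route, queue.popleft()))
    return batch


def shared_forward(batch: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Hypothetical shared model: handles the whole mixed batch in one call."""
    # The route tag travels with each result so it can be sent back correctly.
    return [(route, item.upper()) for route, item in batch]


queues = {
    "/v1/inference": deque(["a", "b"]),  # output of the first route's previous stage
    "/inference": deque(["c"]),  # output of the second route's previous stage
}
print(shared_forward(collect_batch(queues, max_batch_size=4)))
```

Items from both routes end up in one batch, which is the point of sharing a worker: the model sees larger batches than either route would produce alone.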
The worker definitions are the same as for a single route. The only difference is how you register the workers with the server.

Here we introduce a new [concept](../reference/concept.md) called [`Runtime`](mosec.runtime.Runtime).

Create a `Runtime` for each worker and register them on the server with a `{endpoint: [Runtime]}` dictionary, where each endpoint maps to the ordered list of runtimes that form its pipeline.

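To make the shape of the `{endpoint: [Runtime]}` dictionary concrete without importing mosec, here is a sketch in which `FakeRuntime`, `Preprocess`, `Inference`, and `shared_runtimes` are all hypothetical stand-ins, not mosec APIs. It assumes, as the text above describes, that reusing one runtime object across routes is what makes it shared, while two instances built from the same worker class stay separate. The endpoints mirror the two routes called by the demo client.

```python
from dataclasses import dataclass
from typing import Dict, List


class Preprocess:
    """Hypothetical worker class (stands in for a mosec Worker subclass)."""


class Inference:
    """Hypothetical worker class (stands in for a mosec Worker subclass)."""


@dataclass(eq=False)  # eq=False keeps identity-based hashing: one object = one runtime
class FakeRuntime:
    """Hypothetical stand-in for mosec's Runtime, not its real signature."""

    worker: type
    max_batch_size: int = 1


def shared_runtimes(
    routes: Dict[str, List[FakeRuntime]],
) -> Dict[FakeRuntime, List[str]]:
    """Return runtime objects referenced more than once, with the endpoints using them."""
    usage: Dict[FakeRuntime, List[str]] = {}
    for endpoint, pipeline in routes.items():
        for runtime in pipeline:
            usage.setdefault(runtime, []).append(endpoint)
    return {runtime: eps for runtime, eps in usage.items() if len(eps) > 1}


# One Inference object reused by both routes: shared, so it batches across them.
shared_inference = FakeRuntime(Inference, max_batch_size=16)

routes = {
    # Each list is the ordered pipeline of stages for that endpoint.
    "/v1/inference": [FakeRuntime(Preprocess), shared_inference],
    # A second Preprocess object from the same class: a separate runtime.
    "/inference": [FakeRuntime(Preprocess), shared_inference],
}

for runtime, endpoints in shared_runtimes(routes).items():
    print(runtime.worker.__name__, endpoints)
```

Only the `Inference` runtime is reported as shared; the two `Preprocess` runtimes are distinct objects even though they wrap the same class.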
See the complete demo code below.

## Server

```{include} ../../../examples/multi_route/server.py
:code: python
```

## Client

```{include} ../../../examples/multi_route/client.py
:code: python
```
@@ -0,0 +1,40 @@
# Copyright 2023 MOSEC Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
from http import HTTPStatus

import httpx
import msgpack  # type: ignore

# Request body for the typed route; it is msgpack-encoded before sending.
typed_req = {
    "bin": b"hello mosec with type check",
    "name": "type check",
}

# Typed route: send a msgpack payload and decode the msgpack response.
print(">> requesting for the typed route with msgpack serde")
resp = httpx.post(
    "http://127.0.0.1:8000/v1/inference", content=msgpack.packb(typed_req)
)
if resp.status_code == HTTPStatus.OK:
    print(f"OK: {msgpack.unpackb(resp.content)}")
else:
    print(f"err[{resp.status_code}] {resp.text}")

# Untyped route: send raw bytes and decode the response body as JSON.
print(">> requesting for the untyped route with json serde")
resp = httpx.post("http://127.0.0.1:8000/inference", content=b"hello mosec")
if resp.status_code == HTTPStatus.OK:
    print(f"OK: {json.loads(resp.content)}")
else:
    print(f"err[{resp.status_code}] {resp.text}")