You can then open your browser at http://127.0.0.1:3000 and interact with the service through Swagger UI.
### Containers
We also provide two pre-built containers to run on CPU and GPU respectively.
This requires a container engine such as Docker or Podman.
You can then quickly try out the service by running the container:
```bash
# cpu
docker run -p 3000:3000 ghcr.io/bentoml/transformers-nlp-service:cpu

# gpu
docker run --gpus all -p 3000:3000 ghcr.io/bentoml/transformers-nlp-service:gpu
```
> Note that to run with GPU, you will need to have [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) set up.
### Python API

One can also use the BentoML Python API to serve their models.
Run the following to build a Bento and save it to the Bento Store:
```bash
bentoml build
```

Then, start a server with `bentoml.HTTPServer`:
```python
import bentoml

# Retrieve Bento from Bento Store
bento = bentoml.get("transformers-nlp-service")

server = bentoml.HTTPServer(bento, port=3000)
server.start(blocking=True)
```
### gRPC

If you wish to use gRPC, this project also includes gRPC support. Run the following:
```bash
bentoml serve-grpc
```
To run the container with gRPC, do:

```bash
docker run -p 3000:3000 -p 3001:3001 ghcr.io/bentoml/transformers-nlp-service:cpu serve-grpc
```
To find more information about gRPC with BentoML, refer to [our documentation](https://docs.bentoml.org/en/latest/guides/grpc.html).
## 🌐 Interacting with the Service 🌐

The default mode of serving with BentoML is via an HTTP server. Here, we showcase a few examples of how one can interact with the service:

### cURL

The following example shows how to send a summarization request to the service via cURL:
```bash
curl -X 'POST' \
  'http://127.0.0.1:3000/summarize' \
  -H 'Content-Type: text/plain' \
  -d 'A quill strapped across her chest, Schafer let us know she is still writing her narrative — and defining herself on her own terms. There'\''s an entire story contained in those two garments. As De Saint Sernin said in the show notes: "Thirty-six looks, each one a heartfelt sentence."
The powerful ensemble may become one of Law Roach'\''s last celebrity styling credits. Roach announced over social media on Tuesday that he would be retiring from the industry after 14 years of creating conversation-driving looks for the likes of Zendaya, Bella Hadid, Anya Taylor-Joy, Ariana Grande and Megan Thee Stallion.'
```
### Via BentoClient 🐍

To send requests in Python, one can use `bentoml.client.Client`:
```python
import bentoml

# Create a client for the running service
client = bentoml.client.Client.from_url("http://localhost:3000")

result = client.summarize("Some long text to summarize...")
```

Run `python client.py` to see it in action.

> Check out the [`client.py`](./client.py) file for more details.
Note that all API endpoints defined in `service.py` can be accessed through the client via its sync and async methods. For example, [`service.py`](./service.py) contains three endpoints, `/summarize`, `/categorize` and `/make_analysis`, hence the following works:

```python
result = client.summarize("Try to summarize this text")
```
You can add more tasks and models by editing the `download_model.py` file.
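For a sense of what that could look like, here is a rough sketch that registers one more pipeline with the model store; the task name, model ID, and saved-model name are assumptions for illustration, not what the repo ships:

```python
# Hypothetical addition to download_model.py: save one more Transformers
# pipeline to the BentoML model store. Task, model ID, and name are assumptions.
import bentoml
import transformers

translator = transformers.pipeline("translation_en_to_de", model="t5-small")
bentoml.transformers.save_model("translation-pipeline", translator)
```

Any pipeline saved this way can then be loaded in `service.py` and wrapped in a runner.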
### Where can I add API logic?
Pre/post-processing logic can be added in the `service.py` file.
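To illustrate where that logic could live, below is a minimal sketch of an endpoint with pre- and post-processing around a Transformers runner; the runner tag, service name, and output shape are assumptions rather than the repo's actual `service.py`:

```python
import bentoml
from bentoml.io import JSON, Text

# Assumed model tag; the actual tags are created by download_model.py.
summarizer_runner = bentoml.transformers.get("summarization-pipeline:latest").to_runner()

svc = bentoml.Service("transformers-nlp-service", runners=[summarizer_runner])

@svc.api(input=Text(), output=JSON())
async def summarize(text: str) -> dict:
    # Pre-processing: normalize whitespace before inference.
    cleaned = " ".join(text.split())
    raw = await summarizer_runner.async_run(cleaned)
    # Post-processing: reshape the pipeline output into a small JSON payload.
    return {"summary": raw[0]["summary_text"]}
```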
### Where can I find more docs about Transformers and BentoML?

BentoML supports Transformers models out of the box. You can find more details in BentoML's [framework documentation](https://docs.bentoml.org/en/latest/frameworks/transformers.html) for [Transformers](https://huggingface.co/docs/transformers/index).
## 🚀 Bringing it to Production 🚀

BentoML offers a number of options for deploying and hosting online ML services in production. Learn more in the [Deploying Bento docs](https://docs.bentoml.org/en/latest/concepts/deploy.html).
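For instance, one common path is to containerize the Bento built earlier and ship that image. The sketch below assumes a BentoML release that exposes the `bentoml.container` Python API (the `bentoml containerize` CLI is the equivalent) and an assumed `latest` tag:

```python
import bentoml

# Build an OCI image for the Bento created by `bentoml build`.
# The tag and the Docker backend are assumptions; adjust to your setup.
bentoml.container.build("transformers-nlp-service:latest", backend="docker")
```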