docs/deployments/realtime-api/predictors.md

content=data, media_type="text/plain")
return response
```
## Chaining APIs
It is possible to make requests from one API to another within a Cortex cluster. All running APIs are accessible from within the predictor at `http://api-<api_name>:8888/predict`, where `<api_name>` is the name of the API you are making a request to.
For example, if there is an API named `text-generator` running in the cluster, you could make a request to it from a different API by using:
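As a minimal sketch (the `requests` library, the `PythonPredictor` class shape, and the `{"text": ...}` payload field are assumptions for illustration; only the `http://api-<api_name>:8888/predict` address comes from the docs above):

```python
import requests


def chain_url(api_name: str) -> str:
    # Every running API in the cluster is reachable at this in-cluster address
    return f"http://api-{api_name}:8888/predict"


class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        # Forward part of the payload to the text-generator API
        # (the payload shape here is a hypothetical example)
        response = requests.post(chain_url("text-generator"), json={"text": payload["text"]})
        return response.text
```

A plain HTTP POST is all that is required, since each API exposes an ordinary `/predict` endpoint inside the cluster.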
Note that the autoscaling configuration (i.e. `target_replica_concurrency`) of the API that makes the request should be set with the understanding that a request remains "in-flight" with the first API for as long as it is being fulfilled by the second API (during which time it is also counted as "in-flight" with the second API). See the [autoscaling docs](autoscaling.md) for more details.
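For instance, the requesting API might raise `target_replica_concurrency` to account for requests that spend most of their time waiting on the chained API. This fragment is a sketch, not a complete Cortex configuration; the API name and predictor path are hypothetical:

```yaml
- name: my-api  # the API that chains to text-generator
  predictor:
    type: python
    path: predictor.py
  autoscaling:
    # each request stays in-flight here for the full duration of the
    # chained call, so allow more concurrent in-flight requests per replica
    target_replica_concurrency: 2
```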