
Commit 84e55ef

vishalbollu authored and deliahu committed

Add key features and architecture diagram (#2239)

(cherry picked from commit cea16f1)

1 parent f26f269 commit 84e55ef

File tree

5 files changed: +55 −11 lines changed


Diff for: docs/overview.md

+4 lines changed

@@ -32,3 +32,7 @@ Cortex uses a collection of containers, referred to as a pod, as the atomic unit
 * Task

 Visit the workload-specific documentation for more details.
+
+## Architecture Diagram
+
+![](https://user-images.githubusercontent.com/4365343/121231768-ce62e200-c85e-11eb-84b1-3d5d4b999c12.png)

Diff for: docs/workloads/async/async.md

+17 −3 lines changed

@@ -4,12 +4,26 @@ Async APIs are designed for asynchronous workloads in which the user submits an

 Async APIs are a good fit for users who want to submit longer workloads (such as video, audio, or document processing), and do not need the result immediately or synchronously.

+**Key features**
+
+* asynchronously process requests
+* retrieve status and response via HTTP endpoint
+* autoscale based on queue length
+* avoid cold starts
+* scale to 0
+* perform rolling updates
+* automatically recover from failures and spot instance termination
+
 ## How it works

-When you deploy an AsyncAPI, Cortex creates an SQS queue, a pool of Async Gateway workers, and a pool of workers running your containers.
+When you deploy an AsyncAPI, Cortex creates an SQS queue, a pool of Async Gateway workers, and a pool of worker pods. Each worker pod runs a dequeuer sidecar alongside your containers.
+
+Upon receiving a request, the Async Gateway saves the request payload to S3, enqueues the request ID onto an SQS FIFO queue, and responds with the request ID.
+
+The dequeuer sidecar in the worker pod pulls the request from the SQS queue, downloads the request's payload from S3, and makes a POST request to your containers. After the dequeuer receives a response, the corresponding request payload is deleted from S3, and the response is saved in S3 for 7 days.

-The Async Gateway is responsible for submitting the workloads to the queue and for retrieving workload statuses and results. Cortex fully implements and manages the Async Gateway and the queue.
+You can fetch the result by making a GET request to the AsyncAPI endpoint with the request ID; the Async Gateway responds with the status and, if the request has completed, the result.

 The pool of workers running your containers autoscales based on the average number of messages in the queue and can scale down to 0 (if configured to do so).

-![](https://user-images.githubusercontent.com/7456627/111491999-9b67f100-873c-11eb-87f0-effcf4aab01b.png)
+![](https://user-images.githubusercontent.com/4365343/121231833-e470a280-c85e-11eb-8be7-ad0a7cf9bce3.png)
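The request lifecycle added in this hunk (gateway saves payload, enqueues the request ID, dequeuer pulls and POSTs, result fetched by ID) can be sketched with an in-memory model. This is a hypothetical illustration, not Cortex's actual code: a dict stands in for S3, a deque for the SQS FIFO queue, and a plain function call for the POST to your container.

```python
import uuid
from collections import deque

# In-memory stand-ins for the AWS services (illustrative only):
s3 = {}          # plays the role of the S3 bucket
queue = deque()  # plays the role of the SQS FIFO queue
statuses = {}

def gateway_submit(payload: bytes) -> str:
    """Async Gateway: save the payload, enqueue the request ID, return it."""
    request_id = str(uuid.uuid4())
    s3[f"payloads/{request_id}"] = payload
    queue.append(request_id)
    statuses[request_id] = "in_queue"
    return request_id

def dequeuer_step(handler) -> None:
    """Dequeuer sidecar: pull one request ID, fetch its payload, call the
    user container (modeled as `handler`), then replace payload with response."""
    request_id = queue.popleft()
    payload = s3.pop(f"payloads/{request_id}")  # download, then delete payload
    response = handler(payload)                 # stands in for the POST request
    s3[f"responses/{request_id}"] = response    # Cortex keeps this for 7 days
    statuses[request_id] = "completed"

def gateway_get(request_id: str) -> dict:
    """GET on the AsyncAPI endpoint: status, plus the result if completed."""
    return {
        "id": request_id,
        "status": statuses[request_id],
        "result": s3.get(f"responses/{request_id}"),
    }

rid = gateway_submit(b"video-to-transcribe")
dequeuer_step(lambda payload: payload.upper())
print(gateway_get(rid)["status"])  # completed
```

Note how the payload is deleted from the payload prefix and the response written under a separate prefix, mirroring the deletion/retention behavior the section describes.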

Diff for: docs/workloads/batch/batch.md

+13 −4 lines changed

@@ -4,16 +4,25 @@ Batch APIs run distributed and fault-tolerant batch processing jobs on demand.

 Batch APIs are a good fit for users who want to break up their workloads and distribute them across a dedicated pool of workers (for example, running inference on a set of images).

-## How it works
+**Key features**
+
+* distribute a batch job across multiple workers
+* scale to 0 (when there are no batch jobs)
+* trigger `/on-job-complete` hook once all batches have been processed
+* attempt all batches at least once
+* reroute failed batches to a dead letter queue
+* automatically recover from failures and spot instance termination

-When you deploy a Batch API, Cortex creates an endpoint to receive job submissions.
+## How it works

-Upon job submission, Cortex responds with a Job ID, and asynchronously triggers a Batch Job.
+When you deploy a Batch API, Cortex creates an endpoint to receive job submissions. Upon submitting a job, Cortex responds with a Job ID and asynchronously triggers a Batch Job.

-First, Cortex deploys an enqueuer, which breaks up the data in the job into batches and pushes them onto an SQS FIFO queue.
+A Batch Job begins with the deployment of an enqueuer process, which breaks up the data in the job into batches and pushes them onto an SQS FIFO queue.

 After enqueuing is complete, Cortex initializes the requested number of worker pods and attaches a dequeuer sidecar to each pod. The dequeuer is responsible for retrieving batches from the queue and making an HTTP request to your pod for each batch.

 After the worker pods have emptied the queue, the job is marked as complete, and Cortex will terminate the worker pods and delete the SQS queue.

 You can make GET requests to the BatchAPI endpoint to get the status of the job and metrics such as the number of batches completed and failed.
+
+![](https://user-images.githubusercontent.com/4365343/121231862-ed617400-c85e-11eb-96fb-84b10c211131.png)
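The batch lifecycle this hunk describes (enqueuer splits the job into batches, dequeuers drain the queue, then the `/on-job-complete` hook fires) can be sketched as a minimal single-worker simulation. The function names and the in-memory deque are hypothetical stand-ins, not the Cortex implementation.

```python
from collections import deque
from typing import Callable, List

def enqueuer(items: List, batch_size: int) -> deque:
    """Enqueuer: break the job's data into batches and push each batch onto
    the queue (a deque stands in for the SQS FIFO queue)."""
    q = deque()
    for i in range(0, len(items), batch_size):
        q.append(items[i:i + batch_size])
    return q

def run_job(items: List, batch_size: int,
            handle_batch: Callable, on_job_complete: Callable) -> int:
    """Sketch of the job lifecycle: enqueue, drain the queue with one worker,
    then fire the completion hook once the queue is empty."""
    q = enqueuer(items, batch_size)
    completed = 0
    while q:                   # dequeuer: pull batches until the queue is empty
        batch = q.popleft()
        handle_batch(batch)    # stands in for the HTTP request to your pod
        completed += 1
    on_job_complete()          # stands in for the /on-job-complete hook
    return completed

calls = []
n = run_job(["a", "b", "c", "d", "e"], 2, calls.append, lambda: calls.append("done"))
print(n)  # 3
```

In the real system the batches are pulled concurrently by the requested number of worker pods; the single loop above only illustrates the ordering of enqueue, drain, and completion hook.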

Diff for: docs/workloads/realtime/realtime.md

+11 lines changed

@@ -4,8 +4,19 @@ Realtime APIs respond to requests synchronously and autoscale based on in-flight

 Realtime APIs are a good fit for users who want to run stateless containers as a scalable microservice (for example, deploying machine learning models as APIs).

+**Key features**
+
+* respond to requests synchronously
+* autoscale based on request volume
+* avoid cold starts
+* perform rolling updates
+* automatically recover from failures and spot instance termination
+
+* perform A/B tests and canary deployments
+
 ## How it works

 When you deploy a Realtime API, Cortex initializes a pool of worker pods and attaches a proxy sidecar to each of the pods.

 The proxy is responsible for receiving incoming requests, queueing them (if necessary), and forwarding them to your pod when it is ready. Autoscaling is based on aggregate in-flight request volume, which is published by the proxy sidecars.
+
+![](https://user-images.githubusercontent.com/4365343/121231921-fe11ea00-c85e-11eb-9813-6ee114f9a3fc.png)
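The proxy sidecar's role (queue requests when the pod is busy, forward when ready, publish the in-flight count that autoscaling reads) can be modeled with a small toy class. This is a hypothetical sketch of the behavior described above, not Cortex's proxy code.

```python
from collections import deque

class ProxySidecar:
    """Toy model of the proxy sidecar: tracks requests being processed by the
    pod, queues the overflow, and exposes the in-flight metric that
    autoscaling aggregates across pods."""

    def __init__(self, max_concurrency: int):
        self.max_concurrency = max_concurrency
        self.active = 0          # requests currently forwarded to the pod
        self.backlog = deque()   # requests queued because the pod is busy

    def receive(self, request) -> None:
        """Accept an incoming request and forward it if the pod has capacity."""
        self.backlog.append(request)
        self._drain()

    def finish(self) -> None:
        """The pod finished one request; forward the next queued one, if any."""
        self.active -= 1
        self._drain()

    def _drain(self) -> None:
        while self.backlog and self.active < self.max_concurrency:
            self.backlog.popleft()
            self.active += 1     # forwarded to the pod

    def in_flight(self) -> int:
        # published metric: queued + currently processing
        return self.active + len(self.backlog)
```

With `max_concurrency=2`, a third concurrent request sits in the backlog rather than reaching the pod, yet still counts toward `in_flight()`, which is what lets Cortex scale up on queued demand.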

Diff for: docs/workloads/task/task.md

+10 −4 lines changed

@@ -4,12 +4,18 @@ Task APIs provide a lambda-style execution of containers. They are useful for ru

 Task APIs are a good fit when you need to trigger container execution via an HTTP request. They can be used to run tasks (e.g. training models), and can be configured as task runners for orchestrators (such as Airflow).

-## How it works
+**Key features**
+
+* run containers on-demand
+* scale to 0 (when there are no tasks)
+* automatically recover from failures and spot instance termination

-When you deploy a Task API, an endpoint is created to receive task submissions.
+## How it works

-Upon submitting a Task, Cortex will respond with a Task ID and will asynchronously trigger the execution of a Task.
+When you deploy a Task API, an endpoint is created to receive task submissions. Upon submitting a Task, Cortex responds with a Task ID and asynchronously triggers the execution of the Task.

 Cortex will initialize a worker pod based on your API specification. After the worker pod runs to completion, the Task is marked as completed and the pod is terminated.

-You can make GET requests to the Task API endpoint to retreive the status of the Task.
+You can make GET requests to the Task API endpoint to retrieve the status of the Task.
+
+![](https://user-images.githubusercontent.com/4365343/121231738-c30fb680-c85e-11eb-886f-dc4d9bf3ef17.png)
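The Task lifecycle in this section (submit returns a Task ID immediately, a worker later runs to completion, and status is read via GET) can be illustrated with a minimal in-memory class. All names here are hypothetical; this is a sketch of the described flow, not Cortex's API.

```python
import uuid

class TaskAPI:
    """Minimal sketch of the Task API lifecycle: submission returns a Task ID
    right away, the worker runs asynchronously, and GET reads the status."""

    def __init__(self):
        self.tasks = {}

    def submit(self, fn, *args) -> str:
        """Endpoint receiving a task submission: respond with a Task ID and
        record the work to be triggered asynchronously."""
        task_id = str(uuid.uuid4())
        self.tasks[task_id] = {"status": "pending", "fn": fn, "args": args}
        return task_id

    def run_worker(self, task_id: str) -> None:
        """Stands in for the worker pod running to completion."""
        task = self.tasks[task_id]
        try:
            task["result"] = task["fn"](*task["args"])
            task["status"] = "completed"
        except Exception:
            task["status"] = "failed"

    def get_status(self, task_id: str) -> str:
        """Stands in for a GET request to the Task API endpoint."""
        return self.tasks[task_id]["status"]

api = TaskAPI()
tid = api.submit(lambda x: x * 2, 21)
print(api.get_status(tid))  # pending
api.run_worker(tid)
print(api.get_status(tid))  # completed
```

The key point the sketch mirrors is the split between submission (synchronous, returns an ID) and execution (asynchronous, observed only through status polling).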

0 commit comments