

install | documentation | examples | community

Deploy machine learning models to production

Cortex is an open source platform for deploying, managing, and scaling machine learning in production.


Model serving infrastructure

  • Supports deploying TensorFlow, PyTorch, scikit-learn, and other models as realtime or batch APIs.
  • Ensures high availability across availability zones with automated instance restarts.
  • Runs inference on spot instances with on-demand backups.
  • Autoscales to handle production workloads.

Configure Cortex

# cluster.yaml

region: us-east-1
instance_type: g4dn.xlarge
min_instances: 10
max_instances: 100
spot: true

Spin up Cortex on your AWS account

$ cortex cluster up --config cluster.yaml

○ configuring autoscaling ✓
○ configuring networking ✓
○ configuring logging ✓

cortex is ready!

Reproducible deployments

  • Package dependencies, code, and configuration for reproducible deployments.
  • Configure compute, autoscaling, and networking for each API.
  • Integrate with your data science platform or CI/CD system.
  • Test locally before deploying to your cluster (see the smoke test after the predictor below).

Implement a predictor

# predictor.py

from transformers import pipeline

class PythonPredictor:
  def __init__(self, config):
    self.model = pipeline(task="text-generation")

  def predict(self, payload):
    return self.model(payload["text"])[0]
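Since the predictor is plain Python, you can smoke-test it locally before involving Cortex at all. A minimal sketch (the empty config dict works here because this predictor ignores its config):

# test_predictor.py

from predictor import PythonPredictor

# instantiate and call the predictor directly, no cluster required
predictor = PythonPredictor(config={})
print(predictor.predict({"text": "hello world"}))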

Configure an API

api_spec = {
  "name": "text-generator",
  "kind": "RealtimeAPI",
  "predictor": {
    "type": "python",
    "path": "predictor.py"
  },
  "compute": {
    "gpu": 1,
    "mem": "8Gi",
  },
  "autoscaling": {
    "min_replicas": 1,
    "max_replicas": 10
  },
  "networking": {
    "api_gateway": "public"
  }
}

Scalable machine learning APIs

  • Scale to handle production workloads with request-based autoscaling.
  • Stream performance metrics and logs to any monitoring tool.
  • Serve many models efficiently with multi model caching.
  • Configure traffic splitting for A/B testing (sketched below).
  • Update APIs without downtime.
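Traffic splitting is configured by deploying a TrafficSplitter spec that weights requests across existing APIs. A minimal sketch, where text-generator-a and text-generator-b are hypothetical variants you would have deployed separately:

# traffic_splitter.py

traffic_splitter = {
  "name": "text-generator",
  "kind": "TrafficSplitter",
  "apis": [
    {"name": "text-generator-a", "weight": 80},  # 80% of requests
    {"name": "text-generator-b", "weight": 20},  # 20% of requests
  ]
}

It deploys like any other API spec (e.g. with the cx.deploy() call shown below).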

Deploy to your cluster

import cortex

cx = cortex.client("aws")
cx.deploy(api_spec, project_dir=".")

# creating https://example.com/text-generator

Consume your API

import requests

endpoint = "https://example.com/text-generator"
payload = {"text": "hello world"}
prediction = requests.post(endpoint, json=payload).json()

Get started

$ pip install cortex

See the installation guide for next steps.
