Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Update docs #29

Merged
merged 1 commit into from
Sep 3, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
121 changes: 35 additions & 86 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,114 +1,63 @@
# elastic-jupyter-operator: Elastic Jupyter on Kubernetes
# elastic-jupyter-operator

Kubernetes 原生的弹性 Jupyter 即服务
Elastic Jupyter Notebooks on Kubernetes

## 介绍
## Motivation

为用户**按需**提供弹性的 Jupyter Notebook 服务。elastic-jupyter-operator 提供以下特性:
Jupyter is a free, open-source, interactive web tool known as a computational notebook, which researchers can use to combine software code, computational output, explanatory text, and multimedia resources in a single document.

- GPU 空闲时自动释放资源到 Kubernetes 集群
- 资源延迟申请,在使用时按需申请对应 CPU/内存/GPU 资源
- 多 Jupyter 共享资源池,提高资源利用率
For data scientists and machine learning engineers, Jupyter has emerged as a de facto standard. At the same time, there has been growing criticism that the way notebooks are being used leads to low resource utilization.

<p align="center"><img src="docs/images/overview.png" width="300"></p>
GPU and other hardware resources will be bound to the specified notebooks even if the data scientists do not need them currently. This project proposes some Kubernetes CRDs to solve these problems.

## 部署
## Introduction

```bash
kubectl apply -f ./hack/enterprise_gateway/prepare.yaml
make deploy
```

## 架构

elastic-jupyter-operator 的架构如图所示,`JupyterGateway` 和 `JupyterNotebook` 是两个 CRD。其中 Notebook 是 Jupyter Notebook 的前端服务,负责面向用户提供用户界面,并且与后端服务通过 HTTPS 和 Websocket 进行通信,处理用户的计算请求。

Gateway 是对应的后端服务。它负责处理来自 Notebook CR 的请求,通过调用 Kubernetes 的 API 按需创建出真正负责处理用户计算任务的 Kernel。

<p align="center"><img src="docs/images/arch.png" width="500"></p>

### 多用户

多个用户可以创建多个 Notebook CR,连接同一个 Gateway CR,以起到复用资源的作用。Gateway 会根据不同 Jupyter Notebook 的请求按需创建出对应的 Kernel。在某个用户暂时不进行任务时,会将对应 Kernel Pod 回收删除,释放资源。

<p align="center"><img src="docs/images/multiuser.png" width="300"></p>

## 对比

elastic-jupyter-operator 部分参考了 Kubeflow jupyter-controller 和 Jupyter enterprise gateway 的设计,这里介绍了与两者的不同。
elastic-jupyter-operator provides elastic Jupyter notebook services with these features:

### 与 Kubeflow Jupyter Controller 的比较
- Provide users the out-of-box Jupyter notebooks on Kubernetes.
- Autoscale Jupyter kernels when the kernels are not used within the given time frame to increase the resource utilization.
- Customize the kernel configuration in runtime without restarting the notebook.

Kubeflow Jupyter Controller 也提供了在 Kubernetes 上部署 Jupyter Notebook 的解决方案。相比于 elastic-jupyter-operator,其最大的问题在于它在一个 Pod 中运行了 Notebook 前端与负责计算的 Kernel,导致无法在 Kernel 空闲时回收资源。
<p align="center"><img src="docs/images/elastic.png" width="300"></p>
Figure 1. elastic-jupyter-operator

<p align="center"><img src="docs/images/kubeflow.png" width="300"></p>
<p align="center"><img src="docs/images/jupyter.png" width="300"></p>
Figure 2. Other Jupyter on Kubernetes solutions

### 与 Jupyter Enterprise Gateway 的比较

Jupyter Enterprise Gateway 提供了弹性 Jupyter 服务的基础,elastic-jupyter-operator 也是基于它来设计和实现的。与 Jupyter Enterprise Gateway 相比,elastic-jupyter-operator 提供了云原生的实现。Jupyter Enterprise Gateway 需要用户自行部署和维护 Gateway 和 Notebook,而 elastic-jupyter-operator 则简化了在 Kubernetes 上运维的复杂度。

## 使用

首先,创建一个 Jupyter Gateway CR:
## Deploy

```bash
kubectl apply -f ./config/samples/kubeflow.tkestack.io_v1alpha1_jupytergateway.yaml
```

```yaml
apiVersion: kubeflow.tkestack.io/v1alpha1
kind: JupyterGateway
metadata:
name: jupytergateway-sample
spec:
cullIdleTimeout: 3600
kubectl apply -f ./hack/enterprise_gateway/prepare.yaml
make deploy
```

其中 `cullIdleTimeout` 是一个配置项,在 Kernel 空闲指定 `cullIdleTimeout` 秒内,会由 Gateway 回收对应 Kernel 以释放资源。
## Quick start

其次需要创建一个 Jupyter Notebook CR 实例,并且指定对应的 Gateway CR:
You can follow the [quick start](./docs/quick-start.md) to create the notebook server and kernel in Kubernetes like this:

```bash
kubectl apply -f ./config/samples/kubeflow.tkestack.io_v1alpha1_jupyternotebook.yaml
NAME READY STATUS RESTARTS AGE
jovyan-fd191444-b08c-4668-ba4e-3748a54a0ac1-5789574d66-tb5cm 1/1 Running 0 146m
jupytergateway-sample-858bbc8d5c-xds44 1/1 Running 0 3h46m
jupyternotebook-sample-5bf7d9d9fb-pdv9b 1/1 Running 10 77d
```

```yaml
apiVersion: kubeflow.tkestack.io/v1alpha1
kind: JupyterNotebook
metadata:
name: jupyternotebook-sample
spec:
gateway:
name: jupytergateway-sample
namespace: default
```

集群上所有资源如下所示:
There are three pods running in the demo:

```
NAME READY STATUS RESTARTS AGE
pod/jupytergateway-sample-6d5d97949c-p8bj6 1/1 Running 2 11d
pod/jupyternotebook-sample-5bf7d9d9fb-nq9b8 1/1 Running 2 11d
- `jupyternotebook-sample-5bf7d9d9fb-pdv9b` is the notebook server
- `jupytergateway-sample-858bbc8d5c-xds44` is the jupyter gatrway to support remote kernels
- `jovyan-fd191444-b08c-4668-ba4e-3748a54a0ac1-5789574d66-tb5cm` is the remote kernel

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jupytergateway-sample ClusterIP 10.96.138.111 <none> 8888/TCP 11d
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 31d
The kernel will be deleted if the notebook does not use it in 10 mins. And it will be recreated if there is any new run in the notebook.

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/jupytergateway-sample 1/1 1 1 11d
deployment.apps/jupyternotebook-sample 1/1 1 1 11d
## Design

NAME DESIRED CURRENT READY AGE
replicaset.apps/jupytergateway-sample-6d5d97949c 1 1 1 11d
replicaset.apps/jupyternotebook-sample-5bf7d9d9fb 1 1 1 11d
```
Please refer to [design doc](docs/design.md)

随后可以通过 NodePort、`kubectl port-forward`、ingress 等方式将 Notebook CR 对外暴露提供服务,这里以 `kubectl port-forward` 为例:
## API Documentation

```
kubectl port-forward jupyternotebook-sample-5bf7d9d9fb-nq9b8 8888
```
Please refer to [API doc](docs/api/generated.asciidoc)

## API 文档
## Special Thanks

请见 [API 文档](docs/api/generated.asciidoc)
- [jupyter/enterprise_gateway](https://github.com/jupyter/enterprise_gateway) which implements the logic to run kernels remotely
13 changes: 13 additions & 0 deletions docs/design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
There are 5 CRDs defined in elastic-jupyter-operator:

- jupytergateways.kubeflow.tkestack.io
- jupyterkernels.kubeflow.tkestack.io
- jupyterkernelspecs.kubeflow.tkestack.io
- jupyterkerneltemplates.kubeflow.tkestack.io
- jupyternotebooks.kubeflow.tkestack.io

elastic-jupyter-operator 的架构如图所示,`JupyterGateway` 和 `JupyterNotebook` 是两个 CRD。其中 Notebook 是 Jupyter Notebook 的前端服务,负责面向用户提供用户界面,并且与后端服务通过 HTTPS 和 Websocket 进行通信,处理用户的计算请求。

Gateway 是对应的后端服务。它负责处理来自 Notebook CR 的请求,通过调用 Kubernetes 的 API 按需创建出真正负责处理用户计算任务的 Kernel。

<p align="center"><img src="docs/images/arch.png" width="500"></p>
Binary file added docs/images/elastic.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/jupyter.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
61 changes: 61 additions & 0 deletions docs/quick-start.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
## 使用

首先,创建一个 Jupyter Gateway CR:

```bash
kubectl apply -f ./config/samples/kubeflow.tkestack.io_v1alpha1_jupytergateway.yaml
```

```yaml
apiVersion: kubeflow.tkestack.io/v1alpha1
kind: JupyterGateway
metadata:
name: jupytergateway-sample
spec:
cullIdleTimeout: 3600
```

其中 `cullIdleTimeout` 是一个配置项,在 Kernel 空闲指定 `cullIdleTimeout` 秒内,会由 Gateway 回收对应 Kernel 以释放资源。

其次需要创建一个 Jupyter Notebook CR 实例,并且指定对应的 Gateway CR:

```bash
kubectl apply -f ./config/samples/kubeflow.tkestack.io_v1alpha1_jupyternotebook.yaml
```

```yaml
apiVersion: kubeflow.tkestack.io/v1alpha1
kind: JupyterNotebook
metadata:
name: jupyternotebook-sample
spec:
gateway:
name: jupytergateway-sample
namespace: default
```

集群上所有资源如下所示:

```
NAME READY STATUS RESTARTS AGE
pod/jupytergateway-sample-6d5d97949c-p8bj6 1/1 Running 2 11d
pod/jupyternotebook-sample-5bf7d9d9fb-nq9b8 1/1 Running 2 11d

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jupytergateway-sample ClusterIP 10.96.138.111 <none> 8888/TCP 11d
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 31d

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/jupytergateway-sample 1/1 1 1 11d
deployment.apps/jupyternotebook-sample 1/1 1 1 11d

NAME DESIRED CURRENT READY AGE
replicaset.apps/jupytergateway-sample-6d5d97949c 1 1 1 11d
replicaset.apps/jupyternotebook-sample-5bf7d9d9fb 1 1 1 11d
```

随后可以通过 NodePort、`kubectl port-forward`、ingress 等方式将 Notebook CR 对外暴露提供服务,这里以 `kubectl port-forward` 为例:

```
kubectl port-forward jupyternotebook-sample-5bf7d9d9fb-nq9b8 8888
```