Merge pull request #611 from typhoonzero/refine_readme
Refine readme
typhoonzero authored Feb 24, 2018
2 parents 002506f + cf3f80c commit 735177d
Showing 12 changed files with 289 additions and 145 deletions.
201 changes: 56 additions & 145 deletions README.md
@@ -2,149 +2,60 @@

[![Build Status](https://travis-ci.org/PaddlePaddle/cloud.svg?branch=develop)](https://travis-ci.org/PaddlePaddle/cloud)

PaddlePaddle Cloud is a distributed deep-learning cloud platform for both cloud
providers and enterprises.

PaddlePaddle Cloud uses [Kubernetes](https://kubernetes.io) as its backend for job
dispatching and cluster resource management, and [PaddlePaddle](https://github.com/PaddlePaddle/Paddle.git)
as the deep-learning framework. Users can submit deep-learning training jobs
remotely through web pages or command-line tools to make use of the power of
large-scale GPU clusters.

## Using Command-line To Submit Cloud Training Jobs

[Chinese Manual (中文手册)](./doc/usage_cn.md)

English tutorials (coming soon...)

## Deploy PaddlePaddle Cloud

### Pre-Requirements
- PaddlePaddle Cloud uses Kubernetes as its backend core. Deploy a Kubernetes cluster
using [Sextant](https://github.com/k8sp/sextant) or any tool you like.


### Run on minikube
Please see [here](https://github.com/PaddlePaddle/cloud/blob/develop/doc/run_on_minikube.md)

### Run on Kubernetes
- Build Paddle Cloud Docker Image

```bash
# build docker image
git clone https://github.com/PaddlePaddle/cloud.git
cd cloud/paddlecloud
docker build -t [your_docker_registry]/pcloud .
# push to registry so that we can submit paddlecloud to kubernetes
docker push [your_docker_registry]/pcloud
```

- We use [volume](https://kubernetes.io/docs/concepts/storage/volumes/) to mount MySQL data,
cert files, and settings. The `k8s/` folder contains samples showing how to mount
stand-alone files and settings using [hostpath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath). Here's
a good tutorial on creating Kubernetes certs: https://coreos.com/kubernetes/docs/latest/getting-started.html

- Create a data folder on a Kubernetes node, for example:

```bash
mkdir -p /home/pcloud/data/mysql
mkdir -p /home/pcloud/data/certs
```
- Copy the Kubernetes CA files (ca.pem, ca-key.pem) to the `/home/pcloud/data/certs` folder.
- Copy the Kubernetes admin user key (admin.pem, admin-key.pem) to the `/home/pcloud/data/certs` folder (if it is not on the Kubernetes worker node, you'll find it on the Kubernetes master node).
- Optional: copy the CephFS key file (admin.secret) to the `/home/pcloud/data/certs` folder.
- Copy the `paddlecloud/settings.py` file to the `/home/pcloud/data` folder.

- Configure `cloud_deployment.yaml`
  - `spec.template.spec.containers[0].volumes`: change the `hostPath` to match your data folder.
  - `spec.template.spec.nodeSelector`: edit the `kubernetes.io/hostname` value to the host that holds the data folder. You can use `kubectl get nodes` to list all the Kubernetes nodes.
- Configure `settings.py`
- Add your domain name (or the paddle cloud server's hostname or ip address) to `ALLOWED_HOSTS`.
- Configure `DATACENTERS` to point to your backend storage; CephFS and HostPath are currently supported.
You can use HostPath mode to make use of shared file systems such as NFS.
If you use hostPath, modify the `DATACENTERS` field in `settings.py` as follows:

```
DATACENTERS = {
"<MY_DATA_CENTER_NAME_HERE>":{
"fstype": "hostpath",
"host_path": "/home/pcloud/data/public/",
"mount_path": "/pfs/%s/home/%s/" # mount_path % ( dc, username )
}
}
```
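For illustration, here is a minimal Python sketch of how the `mount_path` template appears to be expanded, per the `mount_path % (dc, username)` comment above. The datacenter name `datacenter1` and the username `alice` are hypothetical example values, not values from this project:

```python
# Hedged sketch: expand the DATACENTERS mount_path template.
# "datacenter1" and "alice" are hypothetical example values.
mount_path = "/pfs/%s/home/%s/"  # mount_path % (dc, username)

dc = "datacenter1"
username = "alice"
print(mount_path % (dc, username))  # -> /pfs/datacenter1/home/alice/
```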
- Configure `cloud_ingress.yaml` if your Kubernetes cluster uses [ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/)
to proxy HTTP traffic (if you need to use the Jupyter notebook, you have to configure the ingress controller),
or configure `cloud_service.yaml` to use [NodePort](https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport).
  - If using ingress, configure `spec.rules[0].host` to your domain name.
- Deploy MySQL on Kubernetes first if you don't have it in your cluster, and modify the MySQL endpoint in `settings.py`:
  - `kubectl create -f ./mysql_deployment.yaml` (fill in the `nodeSelector` field with your node's hostname or IP first)
  - `kubectl create -f ./mysql_service.yaml`
- Deploy cloud on Kubernetes:
  - `kubectl create -f k8s/cloud_deployment.yaml` (fill in the `nodeSelector` field with your node's hostname or IP first)
  - `kubectl create -f k8s/cloud_service.yaml`
  - `kubectl create -f k8s/cloud_ingress.yaml` (optional if you don't need the Jupyter notebook)

To test or visit the website, find the Kubernetes ingress IP address or the
NodePort, then open your browser and visit `http://<ingress-ip-address>` or
`http://<any-node-ip-address>:<NodePort>`.
- Prepare the public dataset.
  You can create a Kubernetes Job to prepare the public cloud dataset as RecordIO files. Modify the YAML file for your environment:
  - `<DATACENTER>`: your cluster datacenter
  - `<MONITOR_ADDR>`: the Ceph monitor address
```bash
kubectl create -f k8s/prepare_dataset.yaml
```

### Run locally without Docker

- You still need a Kubernetes cluster when running locally.
- Make sure you have `Python > 2.7.10` installed.
- Python needs to be built against `OpenSSL 1.0.2` or later. To check, simply run:
```python
>>> import ssl
>>> ssl.OPENSSL_VERSION
'OpenSSL 1.0.2k 26 Jan 2017'
```
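The same check can be scripted so it fails early. A minimal sketch — the `(1, 0, 2)` threshold is an assumption based on the example version shown above:

```python
# Hedged sketch: fail early if Python's OpenSSL is older than 1.0.2.
# The (1, 0, 2) threshold is an assumption based on the example above.
import ssl

if ssl.OPENSSL_VERSION_INFO[:3] < (1, 0, 2):
    raise RuntimeError("OpenSSL too old: %s" % ssl.OPENSSL_VERSION)
print(ssl.OPENSSL_VERSION)
```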
- Make sure you are using a virtual environment of some sort (e.g. `virtualenv` or
`pyenv`).

```
virtualenv paddlecloudenv
# enable the virtualenv
source paddlecloudenv/bin/activate
```
To run for the first time, you need to:
```
cd paddlecloud
npm install
pip install -r requirements.txt
./manage.py migrate
./manage.py loaddata sites
npm run dev
```
Browse to http://localhost:8000/

If you are starting the server for the second time, just run:
```
./manage.py runserver
```
PaddlePaddle Cloud is a combination of PaddlePaddle and Kubernetes. It
supports fault-recoverable and fault-tolerant large-scale distributed
deep learning, and can be deployed on public clouds and on-premise
clusters.

PaddlePaddle Cloud includes the following components:

- paddlectl: A command-line tool that talks to paddlecloud and
paddle-fs.
- paddlecloud: An HTTP server that exposes Kubernetes as a Web
service.
- paddle-fs: An HTTP server that exposes the CephFS distributed
filesystem as a Web service.
- EDL (elastic deep learning): A Kubernetes controller that supports
elastic scheduling of deep learning jobs and other jobs.
- Fault-tolerant distributed deep learning: This part is in
the [Paddle](https://github.com/PaddlePaddle/paddle) repo.

## Tutorials

- [Quick Start (快速开始)](./doc/tutorial_cn.md)
- [Chinese Manual (中文手册)](./doc/usage_cn.md)


## How To

- [Build PaddlePaddle Cloud](./doc/howto/build.md)
- [Deploy PaddlePaddle Cloud](./doc/howto/deploy.md)
- [Elastic Deep Learning using EDL](./doc/howto/edl.md)
- [PaddlePaddle Cloud on Minikube](./doc/howto/minikube.md)

## Directory structure

```
.
├── demo: distributed version of https://github.com/PaddlePaddle/book programs
├── doc: documents
├── docker: scripts to build the Docker image for running distributed PaddlePaddle
├── go
│   ├── cmd
│   │   ├── edl: entry of EDL controller binary
│   │   ├── paddlecloud: the command line client of PaddlePaddle Cloud (will be deprecated)
│   │   ├── paddlectl: the command line client of PaddlePaddle Cloud
│   │   └── pfsserver: entry of PaddleFS binary
│   ├── edl: EDL implementation
│   ├── filemanager: PaddleFS implementation
│   ├── paddlecloud: command-line client implementation (will be deprecated)
│   ├── paddlectl: command-line client implementation
│   ├── scripts: scripts for Go code generation
├── k8s: YAML files to create different components of PaddlePaddle Cloud
│   ├── edl: TPR definition and EDL controller for the TrainingJob resource
│   │   ├── autoscale_job: A sample TrainingJob that can scale
│   │   └── autoscale_load: A sample cluster job demonstrating a common workload
│   ├── minikube: YAML files to deploy on a local Minikube environment
│   └── raw_job: A demo job that demonstrates how to run PaddlePaddle jobs in a cluster
└── python: PaddlePaddle Cloud REST API server
```
### Configure Email Sending
If you want to use the `mail` command to send confirmation emails, change the settings below:
```
EMAIL_BACKEND = 'django_sendmail_backend.backends.EmailBackend'
```
You may need to use `hostNetwork` for your pod when using the `mail` command.
Alternatively, you can use Django's SMTP bindings; refer to https://docs.djangoproject.com/en/1.11/topics/email/
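If you go the SMTP route, a minimal sketch of the relevant Django settings follows — the host, port, and credentials are placeholders, not values from this project:

```python
# settings.py fragment — hedged example; all values are placeholders.
EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'
EMAIL_HOST = 'smtp.example.com'        # your SMTP server
EMAIL_PORT = 587
EMAIL_USE_TLS = True
EMAIL_HOST_USER = 'noreply@example.com'
EMAIL_HOST_PASSWORD = 'change-me'
```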
87 changes: 87 additions & 0 deletions doc/build.md
@@ -0,0 +1,87 @@
# Build Components

This article contains instructions on building all the components
of PaddlePaddle Cloud and packing them into Docker images
so that the server-side components can run in the Kubernetes cluster.

- Server-side components:
  - Cloud Server (written in Python; only needs to be packed into an image)
  - EDL Controller
  - PaddleFS (PFS) Server
  - PaddlePaddle Cloud job runtime Docker image
- Client-side component:
  - Command-line client

Before starting, you have to set up the [Go development environment](https://golang.org/doc/install#install) and install
[glide](https://github.com/Masterminds/glide).

## Build EDL Controller

Run the following commands to finish the build:

```bash
cd go
glide install --strip-vendor
cd cmd/edl
go build
```

The above steps generate a binary named `edl`, which should
run as a daemon process in the Kubernetes cluster.


## Build paddlectl client

Run the following command to build the paddlectl binary:

```bash
cd go/cmd/paddlectl
go build
```

The `paddlectl` binary will then be generated in the current directory.


# Build Docker Images for the Server Side

## EDL Controller Image

After you've built the `edl` binary, run the following command to build the
corresponding Docker image.

```bash
cd go/cmd/edl
docker build -t [your image tag] .
```

## Cloud Server Image

This image is used to start the Cloud Server in a Kubernetes cluster. To
build it, just run:

```bash
cd python/paddlecloud
docker build -t [your image tag] .
```

## PaddleFS (PFS) Server Image

To build PaddleFS image, just run:

```bash
cd docker/pfs
sh build.sh
```

## Cloud Job runtime Docker image

To build the job runtime image, which performs the actual cloud job execution, run:

```bash
cd docker
sh build_docker.sh [base paddlepaddle image] [target image]
```

- The base PaddlePaddle image is a PaddlePaddle Docker runtime image, such as
  paddlepaddle/paddle:latest-gpu.
- The target image is the name of the cloud job image you want to build.