Merge pull request #611 from typhoonzero/refine_readme
Refine readme
typhoonzero authored Feb 24, 2018
2 parents 002506f + cf3f80c commit 735177d
Showing 12 changed files with 289 additions and 145 deletions.
201 changes: 56 additions & 145 deletions README.md
@@ -2,149 +2,60 @@

[![Build Status](https://travis-ci.org/PaddlePaddle/cloud.svg?branch=develop)](https://travis-ci.org/PaddlePaddle/cloud)

PaddlePaddle Cloud is a distributed deep-learning cloud platform for both cloud
providers and enterprises.

PaddlePaddle Cloud uses [Kubernetes](https://kubernetes.io) as its backend for job
dispatching and cluster resource management, and [PaddlePaddle](https://github.com/PaddlePaddle/Paddle.git)
as the deep-learning framework. Users can submit deep-learning training jobs
remotely through web pages or command-line tools to make use of the power of
large-scale GPU clusters.

## Using Command-line To Submit Cloud Training Jobs

[Chinese Manual (中文手册)](./doc/usage_cn.md)

English tutorials (coming soon...)

## Deploy PaddlePaddle Cloud

### Pre-Requirements
- PaddlePaddle Cloud uses Kubernetes as its backend core. Deploy a Kubernetes cluster
using [Sextant](https://github.com/k8sp/sextant) or any tool you like.


### Run on minikube
Please see [here](https://github.com/PaddlePaddle/cloud/blob/develop/doc/run_on_minikube.md)

### Run on Kubernetes
- Build Paddle Cloud Docker Image

```bash
# build docker image
git clone https://github.com/PaddlePaddle/cloud.git
cd cloud/paddlecloud
docker build -t [your_docker_registry]/pcloud .
# push to registry so that we can submit paddlecloud to kubernetes
docker push [your_docker_registry]/pcloud
```

- We use [volume](https://kubernetes.io/docs/concepts/storage/volumes/) to mount MySQL data,
cert files, and settings. The `k8s/` folder contains samples showing how to mount
stand-alone files and settings using [hostpath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath). Here's
a good tutorial on creating Kubernetes certs: https://coreos.com/kubernetes/docs/latest/getting-started.html

- Create a data folder on a Kubernetes node, for example:

```bash
mkdir -p /home/pcloud/data/mysql
mkdir -p /home/pcloud/data/certs
```
- Copy the Kubernetes CA files (ca.pem, ca-key.pem) to the `/home/pcloud/data/certs` folder.
- Copy the Kubernetes admin user key (admin.pem, admin-key.pem) to the `/home/pcloud/data/certs` folder (if it is not on the Kubernetes worker node, you'll find it on the Kubernetes master node).
- Optional: copy the CephFS key file (admin.secret) to the `/home/pcloud/data/certs` folder.
- Copy the `paddlecloud/settings.py` file to the `/home/pcloud/data` folder.

- Configure `cloud_deployment.yaml`
  - `spec.template.spec.containers[0].volumes`: change the `hostPath` to match your data folder.
  - `spec.template.spec.nodeSelector`: edit the `kubernetes.io/hostname` value to the host that holds the data folder. You can use `kubectl get nodes` to list all the Kubernetes nodes.
- Configure `settings.py`
- Add your domain name (or the paddle cloud server's hostname or ip address) to `ALLOWED_HOSTS`.
- Configure `DATACENTERS` to point to your backend storage; CephFS and HostPath are currently supported.
You can use HostPath mode to make use of shared file systems such as NFS.
If you use hostPath, modify the `DATACENTERS` field in `settings.py` as follows:

```
DATACENTERS = {
"<MY_DATA_CENTER_NAME_HERE>":{
"fstype": "hostpath",
"host_path": "/home/pcloud/data/public/",
"mount_path": "/pfs/%s/home/%s/" # mount_path % ( dc, username )
}
}
```
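For illustration, here is a minimal Python sketch of how the `mount_path` template appears to be expanded, per the `mount_path % (dc, username)` comment above. The datacenter name `datacenter1` and the username `alice` are hypothetical example values, not values from this project:

```python
# Hedged sketch: expand the DATACENTERS mount_path template.
# "datacenter1" and "alice" are hypothetical example values.
mount_path = "/pfs/%s/home/%s/"  # mount_path % (dc, username)

dc = "datacenter1"
username = "alice"
print(mount_path % (dc, username))  # -> /pfs/datacenter1/home/alice/
```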
- Configure `cloud_ingress.yaml` if your Kubernetes cluster uses [ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/)
to proxy HTTP traffic (if you need to use the Jupyter notebook, you have to configure the ingress controller),
or configure `cloud_service.yaml` to use [NodePort](https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport).
  - If using ingress, configure `spec.rules[0].host` to your domain name.
- Deploy MySQL on Kubernetes first if you don't have it in your cluster, and modify the MySQL endpoint in `settings.py`:
  - `kubectl create -f ./mysql_deployment.yaml` (fill in the `nodeSelector` field with your node's hostname or IP first)
  - `kubectl create -f ./mysql_service.yaml`
- Deploy cloud on Kubernetes:
  - `kubectl create -f k8s/cloud_deployment.yaml` (fill in the `nodeSelector` field with your node's hostname or IP first)
  - `kubectl create -f k8s/cloud_service.yaml`
  - `kubectl create -f k8s/cloud_ingress.yaml` (optional if you don't need the Jupyter notebook)

To test or visit the website, find the Kubernetes ingress IP address or the
NodePort, then open your browser and visit `http://<ingress-ip-address>` or
`http://<any-node-ip-address>:<NodePort>`.
- Prepare the public dataset.
  You can create a Kubernetes Job to prepare the public cloud dataset as RecordIO files. Modify the YAML file for your environment:
  - `<DATACENTER>`: your cluster datacenter
  - `<MONITOR_ADDR>`: the Ceph monitor address
```bash
kubectl create -f k8s/prepare_dataset.yaml
```

### Run locally without Docker

- You still need a Kubernetes cluster when running locally.
- Make sure you have `Python > 2.7.10` installed.
- Python needs to be built against `OpenSSL 1.0.2` or later. To check, simply run:
```python
>>> import ssl
>>> ssl.OPENSSL_VERSION
'OpenSSL 1.0.2k 26 Jan 2017'
```
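The same check can be scripted so it fails early. A minimal sketch — the `(1, 0, 2)` threshold is an assumption based on the example version shown above:

```python
# Hedged sketch: fail early if Python's OpenSSL is older than 1.0.2.
# The (1, 0, 2) threshold is an assumption based on the example above.
import ssl

if ssl.OPENSSL_VERSION_INFO[:3] < (1, 0, 2):
    raise RuntimeError("OpenSSL too old: %s" % ssl.OPENSSL_VERSION)
print(ssl.OPENSSL_VERSION)
```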
- Make sure you are using a virtual environment of some sort (e.g. `virtualenv` or
`pyenv`).

```
virtualenv paddlecloudenv
# enable the virtualenv
source paddlecloudenv/bin/activate
```
To run for the first time, you need to:
```
cd paddlecloud
npm install
pip install -r requirements.txt
./manage.py migrate
./manage.py loaddata sites
npm run dev
```
Browse to http://localhost:8000/

If you are starting the server for the second time, just run:
```
./manage.py runserver
```
PaddlePaddle Cloud is a combination of PaddlePaddle and Kubernetes. It
supports fault-recoverable and fault-tolerant large-scale distributed
deep learning, and can be deployed on public clouds and on-premise
clusters.

PaddlePaddle Cloud includes the following components:

- paddlectl: A command-line tool that talks to paddlecloud and
paddle-fs.
- paddlecloud: An HTTP server that exposes Kubernetes as a Web
service.
- paddle-fs: An HTTP server that exposes the CephFS distributed
filesystem as a Web service.
- EDL (elastic deep learning): A Kubernetes controller that supports
elastic scheduling of deep learning jobs and other jobs.
- Fault-tolerant distributed deep learning: This part is in
the [Paddle](https://github.com/PaddlePaddle/paddle) repo.

## Tutorials

- [Quick Start (快速开始)](./doc/tutorial_cn.md)
- [Chinese Manual (中文手册)](./doc/usage_cn.md)


## How To

- [Build PaddlePaddle Cloud](./doc/howto/build.md)
- [Deploy PaddlePaddle Cloud](./doc/howto/deploy.md)
- [Elastic Deep Learning using EDL](./doc/howto/edl.md)
- [PaddlePaddle Cloud on Minikube](./doc/howto/minikube.md)

## Directory structure

```
.
├── demo: distributed version of https://github.com/PaddlePaddle/book programs
├── doc: documents
├── docker: scripts to build the Docker image for running distributed PaddlePaddle
├── go
│   ├── cmd
│   │   ├── edl: entry of EDL controller binary
│   │   ├── paddlecloud: the command line client of PaddlePaddle Cloud (will be deprecated)
│   │   ├── paddlectl: the command line client of PaddlePaddle Cloud
│   │   └── pfsserver: entry of PaddleFS binary
│   ├── edl: EDL implementation
│   ├── filemanager: PaddleFS implementation
│   ├── paddlecloud: command-line client implementation (will be deprecated)
│   ├── paddlectl: command-line client implementation
│   ├── scripts: scripts for Go code generation
├── k8s: YAML files to create different components of PaddlePaddle Cloud
│   ├── edl: TPR definition and EDL controller for the TrainingJob resource
│   │   ├── autoscale_job: A sample TrainingJob that can scale
│   │   └── autoscale_load: A sample cluster job demonstrating a common workload
│   ├── minikube: YAML files to deploy on a local Minikube environment
│   └── raw_job: A demo job that demonstrates how to run PaddlePaddle jobs in a cluster
└── python: PaddlePaddle Cloud REST API server
```
### Configure Email Sending
If you want to use the `mail` command to send confirmation emails, change the settings below:
```
EMAIL_BACKEND = 'django_sendmail_backend.backends.EmailBackend'
```
You may need to use `hostNetwork` for your pod when using the `mail` command.
Alternatively, you can use Django's SMTP bindings; refer to https://docs.djangoproject.com/en/1.11/topics/email/
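If you go the SMTP route, a minimal sketch of the relevant Django settings follows — the host, port, and credentials are placeholders, not values from this project:

```python
# settings.py fragment — hedged example; all values are placeholders.
EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'
EMAIL_HOST = 'smtp.example.com'        # your SMTP server
EMAIL_PORT = 587
EMAIL_USE_TLS = True
EMAIL_HOST_USER = 'noreply@example.com'
EMAIL_HOST_PASSWORD = 'change-me'
```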
87 changes: 87 additions & 0 deletions doc/build.md
@@ -0,0 +1,87 @@
# Build Components

This article contains instructions on building all the components
of PaddlePaddle Cloud and packing them into Docker images
so that the server-side components can run in the Kubernetes cluster.

- Server-side components:
  - Cloud Server (written in Python; only needs to be packed into an image)
  - EDL Controller
  - PaddleFS (PFS) Server
  - PaddlePaddle Cloud job runtime Docker image
- Client-side component:
  - Command-line client

Before starting, you have to set up the [Go development environment](https://golang.org/doc/install#install) and install
[glide](https://github.com/Masterminds/glide).

## Build EDL Controller

Run the following commands to finish the build:

```bash
cd go
glide install --strip-vendor
cd cmd/edl
go build
```

The above steps generate a binary named `edl`, which should
run as a daemon process in the Kubernetes cluster.


## Build paddlectl client

Run the following command to build the paddlectl binary:

```bash
cd go/cmd/paddlectl
go build
```

The `paddlectl` binary will then be generated in the current directory.


# Build Docker Images for the Server Side

## EDL Controller Image

After you've built the `edl` binary, run the following command to build the
corresponding Docker image.

```bash
cd go/cmd/edl
docker build -t [your image tag] .
```

## Cloud Server Image

This image is used to start the Cloud Server in a Kubernetes cluster. To
build it, just run:

```bash
cd python/paddlecloud
docker build -t [your image tag] .
```

## PaddleFS (PFS) Server Image

To build PaddleFS image, just run:

```bash
cd docker/pfs
sh build.sh
```

## Cloud Job runtime Docker image

To build the job runtime image, which performs the actual cloud job execution, run:

```bash
cd docker
sh build_docker.sh [base paddlepaddle image] [target image]
```

- The base PaddlePaddle image is a PaddlePaddle Docker runtime image, such as
  paddlepaddle/paddle:latest-gpu.
- The target image is the name of the cloud job image you want to build.