This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Dedicated vc support #2960

Merged: 35 commits, merged Jun 19, 2019
Commits:

| Commit | Message | Author | Date |
|---|---|---|---|
| 26608f5 | add dedicated vc | fenghuajia | May 29, 2019 |
| 0c94942 | Merge branch 'master' of https://github.com/microsoft/pai into add_dedvc | fenghuajia | May 29, 2019 |
| 127e26a | add dedicated vc(2) | fenghuajia | May 30, 2019 |
| b30dbfb | add dedicated vc(3) | fenghuajia | May 31, 2019 |
| 44cbd4a | add description | mzmssg | Jun 3, 2019 |
| d5bca04 | [exporter] add yarn node label support (#2869) | mzmssg | Jun 3, 2019 |
| e2230f4 | Merge branch 'zimiao/add_vc_description' into dedicated_vc | mzmssg | Jun 4, 2019 |
| b7ca9aa | no | fenghuajia | Jun 5, 2019 |
| 58f3c46 | Merge branch 'dedicated_vc' of https://github.com/microsoft/pai into … | fenghuajia | Jun 5, 2019 |
| fb47a19 | add dedicated vc(4) | fenghuajia | Jun 5, 2019 |
| e969ded | add cmdline tool link | fenghuajia | Jun 5, 2019 |
| fa1549c | common to shared | mzmssg | Jun 5, 2019 |
| 8809bec | typo | mzmssg | Jun 5, 2019 |
| 2cc0398 | add dedicated vc description spacing | fenghuajia | Jun 6, 2019 |
| e1dc2e7 | add dedicated vc description spacing | fenghuajia | Jun 6, 2019 |
| 3d6b1a4 | change vc style | fenghuajia | Jun 6, 2019 |
| 8b245af | add vc title info icon | fenghuajia | Jun 10, 2019 |
| 12f4796 | virtual clusters tab bar | fenghuajia | Jun 11, 2019 |
| 9eabf1e | remove yarn.lock | mzmssg | Jun 12, 2019 |
| b87dde1 | virtual clusters tab bar(3) | fenghuajia | Jun 12, 2019 |
| 7d3fde1 | remove yarn.lock | mzmssg | Jun 12, 2019 |
| 0214b55 | Revert "remove yarn.lock" | mzmssg | Jun 12, 2019 |
| 7db0a94 | revert package.json & yarn.lock | mzmssg | Jun 12, 2019 |
| 06a6bf2 | readd office-ui-fabric-js | Gerhut | Jun 12, 2019 |
| 8b45c7e | [dedicated vc] rest api (#2921) | mzmssg | Jun 13, 2019 |
| 0e6b1e0 | virtual clusters tab bar(4) | fenghuajia | Jun 13, 2019 |
| 7fc2861 | Merge branch 'dedicated_vc' of https://github.com/microsoft/pai into … | fenghuajia | Jun 13, 2019 |
| 85b4f88 | return to the vc style with the icon | fenghuajia | Jun 13, 2019 |
| 3965f58 | Revert "return to the vc style with the icon" | fenghuajia | Jun 13, 2019 |
| 030aee7 | return to the vc style with the icon | fenghuajia | Jun 13, 2019 |
| 3383d8d | [dedicated vc] management tool (#2923) | mzmssg | Jun 14, 2019 |
| 7216179 | [webportal]remove the vc page information icon (#2936) | fenghuajia | Jun 18, 2019 |
| 98b7823 | remove package-lock | mzmssg | Jun 18, 2019 |
| 1634c1f | removed files | mzmssg | Jun 18, 2019 |
| 4e1cc28 | doc ifx | mzmssg | Jun 18, 2019 |
17 changes: 14 additions & 3 deletions docs/rest-server/API.md
````diff
@@ -781,9 +781,9 @@ GET /api/v1/virtual-clusters/:vcName
 Status: 200
 
 {
-//capacity percentage this virtual cluster can use of entire cluster
+// capacity percentage this virtual cluster can use of entire cluster
 "capacity":50,
-//max capacity percentage this virtual cluster can use of entire cluster
+// max capacity percentage this virtual cluster can use of entire cluster
 "maxCapacity":100,
 // used capacity percentage this virtual cluster can use of entire cluster
 "usedCapacity":0,
@@ -795,7 +795,18 @@ Status: 200
 "vCores":0,
 "GPUs":0
 },
-"state":"running"
+"resourcesTotal":{
+"memory":0,
+"vCores":0,
+"GPUs":0
+},
+"dedicated": true/false,
+// available node list for this virtual cluster
+"nodeList": [node1, node2, ...],
+// RUNNING: vc is enabled
+// STOPPED: vc is disabled; it has neither new jobs nor running jobs.
+// DRAINING: intermediate state from RUNNING to STOPPED, waiting for existing jobs to finish.
+"status":"RUNNING"/"STOPPED"/"DRAINING"
 }
 ```
 
````
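As a rough illustration of how a client might consume the new `dedicated`, `nodeList` and `status` fields, here is a minimal sketch. The helper names and the `rest_server` base URL are hypothetical; it assumes the `true/false` placeholder is returned as a JSON boolean and, reading the state comments above, that only a `RUNNING` VC accepts new jobs.

```python
import json
import urllib.parse
import urllib.request


def fetch_vc(rest_server: str, vc_name: str) -> dict:
    """GET /api/v1/virtual-clusters/:vcName (hypothetical helper; base URL is caller-supplied)."""
    url = f"{rest_server.rstrip('/')}/api/v1/virtual-clusters/{urllib.parse.quote(vc_name)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def accepts_new_jobs(vc_info: dict) -> bool:
    # Assumption drawn from the state comments: DRAINING only waits for
    # existing jobs and STOPPED has none, so only RUNNING takes new jobs.
    return vc_info.get("status") == "RUNNING"


def summarize_vc(vc_info: dict) -> str:
    # Assumes `dedicated` is a JSON boolean and `nodeList` is a list of node names/IPs.
    kind = "dedicated" if vc_info.get("dedicated") else "shared"
    nodes = ", ".join(vc_info.get("nodeList", []))
    return f"{kind} VC, status={vc_info.get('status')}, nodes=[{nodes}]"
```

Only `fetch_vc` touches the network; the other two helpers work on an already-decoded response dict, which keeps them easy to test.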
98 changes: 98 additions & 0 deletions docs/tools/dedicated_vc.md
@@ -0,0 +1,98 @@
# dedicated_vc

## Overview

Unlike shared Virtual Clusters (VCs), which share cluster nodes, a dedicated VC is bound to one or more physical nodes.
Once a node is assigned to a dedicated VC, shared VCs can no longer use its resources.
The cluster's resources are split as follows:

```
Cluster Resource
├── Shared Resource:
│   ├── DEFAULT: capacity
│   ├── Shared VC_1: capacity
│   └── Shared VC_2: capacity
└── Dedicated Resource:
    ├── Dedicated VC_1: node1, node2
    └── Dedicated VC_2: node3, node4

shared_vc_resource = shared_resource * shared_vc_capacity
dedicated_vc_resource = sum(dedicated_vc_nodes)
```
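The two formulas can be made concrete with a small calculation. This is a minimal sketch, not PAI code: the `Resource` record and both function names are hypothetical, capacities are treated as percentages (as in the REST API's `capacity` field), and the demo assumes the two nodes of `dedicated_2` in the `get` example are identical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource record; units mirror the `get` output (CPUs, MB, GPUs)."""
    cpus: float = 0.0
    memory_mb: float = 0.0
    gpus: float = 0.0

    def __add__(self, other: "Resource") -> "Resource":
        return Resource(self.cpus + other.cpus,
                        self.memory_mb + other.memory_mb,
                        self.gpus + other.gpus)

    def scale(self, factor: float) -> "Resource":
        return Resource(self.cpus * factor, self.memory_mb * factor, self.gpus * factor)


def shared_vc_resource(shared_resource: Resource, capacity_percent: float) -> Resource:
    # shared_vc_resource = shared_resource * shared_vc_capacity (capacity as a percentage)
    return shared_resource.scale(capacity_percent / 100.0)


def dedicated_vc_resource(nodes: Iterable[Resource]) -> Resource:
    # dedicated_vc_resource = sum(dedicated_vc_nodes)
    total = Resource()
    for node in nodes:
        total = total + node
    return total


if __name__ == "__main__":
    # Two identical nodes (an assumption) reproduce the dedicated_2 totals.
    node = Resource(cpus=12, memory_mb=104448, gpus=2)
    print(dedicated_vc_resource([node, node]))
    # A shared VC with 50% capacity of a 40-GPU shared pool gets 20 GPUs.
    print(shared_vc_resource(Resource(cpus=160, memory_mb=1_000_000, gpus=40), 50).gpus)
```

The key asymmetry is visible in the signatures: a shared VC's share depends on the size of the whole shared pool, while a dedicated VC's share depends only on its own nodes.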

A job submitted to a shared VC may be scheduled on any shared node;
by contrast, a job submitted to a dedicated VC can only be scheduled on that VC's dedicated nodes.
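The placement rule can be stated as a small function. This is an abstract sketch of the rule only, not how PAI enforces it; `candidate_nodes` and its inputs are hypothetical, and it assumes every node not bound to a dedicated VC belongs to the shared pool.

```python
from typing import Dict, List, Set


def candidate_nodes(vc_name: str, dedicated_vcs: Dict[str, Set[str]],
                    all_nodes: Set[str]) -> List[str]:
    """Return the nodes a job submitted to `vc_name` may be scheduled on.

    `dedicated_vcs` maps each dedicated VC name to its nodes (hypothetical model).
    """
    if vc_name in dedicated_vcs:
        # Dedicated VC: only its own nodes are eligible.
        return sorted(dedicated_vcs[vc_name])
    # Shared VC (including DEFAULT): any node not claimed by a dedicated VC.
    dedicated_nodes: Set[str] = set().union(*dedicated_vcs.values())
    return sorted(all_nodes - dedicated_nodes)
```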


Currently, shared VCs can be configured through the web UI, but dedicated VCs can only be managed with a command-line tool.
This document describes that tool in more detail.


## Commands

`node_maintain.py` provides `get`, `add` and `remove` operations for dedicated VCs; run it from the `pai/src/tools` working directory.
```bash
python node_maintain.py dedicated-vc {get,add,remove}
```
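When these commands are scripted, it can help to build the argument list in one place. The sketch below is hypothetical (`dedicated_vc_cmd`, `run_dedicated_vc` and the `tools_dir` default are not part of PAI) and uses only the flags documented here: `-m`, `-v` and `-n`. The syntax for passing several nodes to `-n` is not spelled out in this doc, so `nodes` is forwarded verbatim as one string.

```python
import subprocess
from typing import List, Optional


def dedicated_vc_cmd(action: str, master_ip: str, vc: Optional[str] = None,
                     nodes: Optional[str] = None) -> List[str]:
    """Build argv for `node_maintain.py dedicated-vc` using only the documented flags."""
    if action not in ("get", "add", "remove"):
        raise ValueError(f"unknown action: {action}")
    cmd = ["python", "node_maintain.py", "dedicated-vc", action, "-m", master_ip]
    if action in ("add", "remove"):
        if vc is None:
            raise ValueError(f"'{action}' requires a VC name (-v)")
        cmd += ["-v", vc]
        if nodes is not None:
            # Passed through unchanged; multi-node syntax is an open assumption.
            cmd += ["-n", nodes]
    return cmd


def run_dedicated_vc(action: str, master_ip: str, vc: Optional[str] = None,
                     nodes: Optional[str] = None, tools_dir: str = "pai/src/tools") -> str:
    # Assumes a local checkout of the PAI repository; runs from pai/src/tools as documented.
    result = subprocess.run(dedicated_vc_cmd(action, master_ip, vc, nodes),
                            cwd=tools_dir, capture_output=True, text=True, check=True)
    return result.stdout
```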

### Get dedicated-vc

```bash
python node_maintain.py dedicated-vc get -m {master_ip}
```
This command prints the name, nodes and resources of each dedicated VC.

#### Examples:

```
$ python node_maintain.py dedicated-vc get -m 10.0.0.1
dedicated_1:
Nodes:
Resource: <CPUs:0.0, Memory:0.0MB, GPUs:0.0>
dedicated_2:
Nodes: 10.0.0.2, 10.0.0.3
Resource: <CPUs:24.0, Memory:208896.0MB, GPUs:4.0>
```
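To feed this output into other scripts, a small parser is enough. A minimal sketch, assuming the output has exactly the shape of the example above (a `name:` line followed by `Nodes:` and `Resource:` lines); the real tool may indent or format lines differently, so the parser strips whitespace and fails loudly on anything unexpected. `parse_dedicated_vc_get` is a hypothetical name.

```python
import re
from typing import Dict, Optional

# Matches e.g. "Resource: <CPUs:24.0, Memory:208896.0MB, GPUs:4.0>"
RESOURCE_RE = re.compile(
    r"Resource:\s*<CPUs:([\d.]+),\s*Memory:([\d.]+)MB,\s*GPUs:([\d.]+)>")


def parse_dedicated_vc_get(output: str) -> Dict[str, dict]:
    """Parse `dedicated-vc get` text into {vc_name: {"nodes": [...], "resource": {...}}}."""
    vcs: Dict[str, dict] = {}
    current: Optional[str] = None
    for raw in output.splitlines():
        line = raw.strip()  # tolerate indentation the real output may use
        if not line or line.startswith("$"):
            continue  # skip blank lines and a pasted shell prompt
        if line.startswith("Nodes:") or line.startswith("Resource:"):
            if current is None:
                raise ValueError(f"field before any VC name: {line!r}")
            if line.startswith("Nodes:"):
                names = line[len("Nodes:"):].split(",")
                vcs[current]["nodes"] = [n.strip() for n in names if n.strip()]
            else:
                match = RESOURCE_RE.match(line)
                if match is None:
                    raise ValueError(f"unrecognized resource line: {line!r}")
                cpus, memory_mb, gpus = (float(x) for x in match.groups())
                vcs[current]["resource"] = {"cpus": cpus, "memory_mb": memory_mb, "gpus": gpus}
        elif line.endswith(":"):
            current = line[:-1]  # a new VC section starts
            vcs[current] = {"nodes": [], "resource": None}
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return vcs
```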


### Add dedicated-vc

```bash
python node_maintain.py dedicated-vc add -m {master_ip} -v {added_vc_name} [-n {added_nodes}]
```
This command adds {added_nodes} to {added_vc_name}; if {added_vc_name} does not exist, the command creates it first.
Dedicated VC resources are allocated from the shared VC pool and subtracted from the DEFAULT VC's quota.
The capacities of the remaining shared VCs are then recalculated so that their **GPU** quotas stay constant.
If the DEFAULT VC does not have enough quota, the allocation fails with an error.
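The recalculation can be illustrated as follows. This is a minimal sketch of the rule described above, not the tool's implementation: `reallocate_after_add` is hypothetical, capacities are percentages of the shared pool, and the DEFAULT VC name is a parameter. Each non-DEFAULT shared VC keeps its absolute GPU quota, DEFAULT absorbs the whole reduction, and the quotas are re-expressed as percentages of the smaller pool.

```python
from typing import Dict


def reallocate_after_add(capacities: Dict[str, float], shared_gpus: float,
                         moved_gpus: float, default_vc: str = "default") -> Dict[str, float]:
    """Recompute shared-VC capacities (percent) after `moved_gpus` GPUs leave
    the shared pool for a dedicated VC."""
    # Convert percentages to absolute GPU quotas on the current shared pool.
    quotas = {vc: shared_gpus * cap / 100.0 for vc, cap in capacities.items()}
    default_quota = quotas.get(default_vc, 0.0)
    if default_quota < moved_gpus:
        raise ValueError(f"not enough {default_vc} quota: "
                         f"{default_quota} GPUs < {moved_gpus} GPUs")
    # DEFAULT pays for the new dedicated nodes; other quotas are untouched.
    quotas[default_vc] = default_quota - moved_gpus
    remaining = shared_gpus - moved_gpus
    if remaining <= 0:
        # The shared pool is empty, so every shared VC ends up with zero capacity.
        return {vc: 0.0 for vc in quotas}
    # Re-express the quotas as percentages of the smaller shared pool.
    return {vc: 100.0 * q / remaining for vc, q in quotas.items()}
```

For example, with a 40-GPU shared pool split 50/25/25 between `default`, `vc1` and `vc2`, moving an 8-GPU node out leaves 32 GPUs: `vc1` and `vc2` keep 10 GPUs each (now 31.25%), and `default` drops from 20 to 12 GPUs (37.5%).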

#### Examples:

```
# Add an empty dedicated_3
$ python node_maintain.py dedicated-vc add -m 10.0.0.1 -v dedicated_3

# Add 10.0.0.4 to dedicated_3
$ python node_maintain.py dedicated-vc add -m 10.0.0.1 -v dedicated_3 -n 10.0.0.4
```

### Remove dedicated-vc

```bash
python node_maintain.py dedicated-vc remove -m {master_ip} -v {removed_vc_name} [-n {removed_nodes}]
```
This command removes {removed_nodes} from {removed_vc_name}; if {removed_nodes} is omitted, it deletes the whole VC.
The freed resources return to the shared VC pool, specifically to the DEFAULT VC.

#### Examples:

```
# Remove 10.0.0.2 from dedicated_2
$ python node_maintain.py dedicated-vc remove -m 10.0.0.1 -v dedicated_2 -n 10.0.0.2

# Remove dedicated_2 and free all nodes
$ python node_maintain.py dedicated-vc remove -m 10.0.0.1 -v dedicated_2
```




2 changes: 1 addition & 1 deletion src/dev-box/build/dev-box.dockerfile
```diff
@@ -52,7 +52,7 @@ RUN apt-get -y update && \
 net-tools && \
 mkdir -p /cluster-configuration &&\
 git clone https://github.com/Microsoft/pai.git &&\
-pip install python-etcd docker kubernetes GitPython jsonschema
+pip install python-etcd docker kubernetes GitPython jsonschema attrs dicttoxml beautifulsoup4
 
 WORKDIR /tmp
 
```