[WIP] PoC ResourceClass #3

Open

wants to merge 3 commits into master
Conversation

vikaschoudhary16
Owner

No description provided.

@vikaschoudhary16 vikaschoudhary16 force-pushed the resource_class branch 2 times, most recently from 90af11e to 99325c2 Compare May 30, 2017 05:56
@@ -0,0 +1,14 @@
---
kind: ResourceClass
metadata:
Owner Author


[root@dell-r620-01 kubernetes]# kubectl get resourceclass my.rc2 -o yaml
apiVersion: v1
kind: ResourceClass
metadata:
  creationTimestamp: 2017-05-30T05:27:48Z
  name: my.rc2
  resourceVersion: "15222"
  selfLink: /api/v1/namespaces/default/resourceclasses/my.rc2
  uid: b820fe43-44f8-11e7-94f7-bc305bf4e400
spec:
  resourceSelector:
  - matchExpressions:
    - key: Type
      operator: In
      values:
      - nvidia-gpu
status: {}
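For readers skimming the selector above: each matchExpressions entry acts like a label-selector requirement, so this class matches any device whose Type label value is in the listed set. A minimal standalone sketch of the In operator (function and variable names here are illustrative, not the PR's actual code):

```go
package main

import "fmt"

// matchesIn reports whether the device's label value for key is one of values.
// It mirrors the matchExpressions "In" operator from the ResourceClass
// selector above; this is a sketch, not the real selector implementation.
func matchesIn(deviceLabels map[string]string, key string, values []string) bool {
	got, ok := deviceLabels[key]
	if !ok {
		return false
	}
	for _, v := range values {
		if v == got {
			return true
		}
	}
	return false
}

func main() {
	device := map[string]string{"Type": "nvidia-gpu", "model": "k80"}
	fmt.Println(matchesIn(device, "Type", []string{"nvidia-gpu"})) // true
}
```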

metadata:
  name: nvidia-tesla-gpu
  labels:
    type: nvidia-gpu
Owner Author


[root@dell-r620-01 kubernetes]# kubectl get node mynode -o yaml

apiVersion: v1
kind: Node
metadata:
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: 2017-05-24T16:32:26Z
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
  uid: 927af813-409e-11e7-80f5-bc305bf4e400
spec:
  externalID: dell-r620-01.perf.lab.eng.rdu.redhat.com
status:
  addresses:
  - address: 10.12.20.41
    type: InternalIP
  - address: dell-r620-01.perf.lab.eng.rdu.redhat.com
    type: Hostname
  - address: 10.12.20.41
    type: LegacyHostIP
  allocatable:
    cpu: "16"
    memory: 131714184Ki
    pods: "110"
  capacity:
    cpu: "16"
    memory: 131816584Ki
    pods: "110"
  conditions:
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  deviceAllocatable:
  - metadata:
      creationTimestamp: null
      labels:
        compute-ability: "3.7"
        ecc: "true"
        family: tesla
        memory: 10Gi
        model: k80
        nvlink: "true"
        quantity: "8"
        type: nvidia-gpu
      name: nvidia-tesla-gpu
  deviceCapacity:
  - metadata:
      creationTimestamp: null
      labels:
        compute-ability: "3.7"
        ecc: "true"
        family: tesla
        memory: 10Gi
        model: k80
        nvlink: "true"
        quantity: "8"
        type: nvidia-gpu
      name: nvidia-tesla-gpu
  images:
  - names:
    - kube-build:build-748641720b-5-v1.8.1-2
    sizeBytes: 3158843585
  - names:
    - "@"
    - ":"
    sizeBytes: 3158843585
  - names:
    - gcr.io/google_containers/kube-cross@sha256:ac1327a97b1b3e01d2ded9a781bb5675cd657af3c9d1ff56b97fcc3f2f2cd59c
    - gcr.io/google_containers/kube-cross:v1.8.1-2
    sizeBytes: 2357991939
  - names:
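Note that in the node status above the GPU count travels as a string-valued quantity label inside deviceAllocatable/deviceCapacity, rather than as a typed field. A quick sketch of how a consumer might read it (helper name and error handling are illustrative assumptions, not the PR's code):

```go
package main

import (
	"fmt"
	"strconv"
)

// deviceQuantity pulls the numeric count out of the string-valued
// "quantity" label seen in deviceAllocatable/deviceCapacity above.
func deviceQuantity(labels map[string]string) (int64, error) {
	raw, ok := labels["quantity"]
	if !ok {
		return 0, fmt.Errorf("no quantity label")
	}
	return strconv.ParseInt(raw, 10, 64)
}

func main() {
	labels := map[string]string{"type": "nvidia-gpu", "quantity": "8"}
	n, err := deviceQuantity(labels)
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 8
}
```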

@aveshagarwal

@vikaschoudhary16 I will review it today

@aveshagarwal

In the first commit, the idea we had discussed was to read from one JSON file that has a list of devices. So let's discuss how to simplify the changes in pkg/util/node/node.go. I am going through the other commits.
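The single-JSON-file approach being suggested could look something like this (file format, struct fields, and function names are assumptions for illustration, not whatever the PR or review settles on):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Device is a hypothetical on-disk record; the real field names would come
// from whatever format the kubelet change adopts.
type Device struct {
	Name   string            `json:"name"`
	Labels map[string]string `json:"labels"`
}

// parseDeviceList decodes a single JSON document holding a list of devices,
// as suggested above, instead of one file per device.
func parseDeviceList(data []byte) ([]Device, error) {
	var devices []Device
	if err := json.Unmarshal(data, &devices); err != nil {
		return nil, err
	}
	return devices, nil
}

func main() {
	raw := []byte(`[{"name": "nvidia-tesla-gpu", "labels": {"type": "nvidia-gpu", "quantity": "8"}}]`)
	devices, err := parseDeviceList(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(devices[0].Name, devices[0].Labels["type"])
}
```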

Basic testing of resource class creation and node status updates done.
Templates in this commit are used for testing resourceclass creation and
device updates in node status. Adding fixes for the issues encountered
during this testing.
@vikaschoudhary16 vikaschoudhary16 force-pushed the resource_class branch 3 times, most recently from b491d69 to 1cf0ec5 Compare June 12, 2017 10:23
Very minimal testing has been done so far.
Testing of overlapping resource classes is in progress.

Design doc will be shared soon.
Early reviews on design/approach are welcome.

To be implemented:
- Resource class delete and node delete handling

@jaypipes left a comment


Hi Vikas! A few questions inline... but mostly, from reading through this, I get the feeling there's a bit too much coupling between the device plugin subsystem and resource classes (as well as a concern that there is similar coupling between node and resource class).

Best,
-jay

@@ -327,12 +328,15 @@ func ClusterRoles() []rbac.ClusterRole {
rbac.NewRule(Read...).Groups(legacyGroup).Resources("nodes", "pods").RuleOrDie(),
rbac.NewRule("create").Groups(legacyGroup).Resources("pods/binding", "bindings").RuleOrDie(),
rbac.NewRule("update").Groups(legacyGroup).Resources("pods/status").RuleOrDie(),
rbac.NewRule("patch").Groups(legacyGroup).Resources("pods").RuleOrDie(),


Is this related to your patch? Maybe copy/pasted by accident?

@@ -45,10 +45,11 @@ func (pfactory *PredicateMetadataFactory) GetMetadata(pod *v1.Pod, nodeNameToInf
if err != nil {
return nil
}
resReq, _ := GetResourceRequest(pod)


This and line 52 seem to be unrelated changes?

fmt.Printf("\n%s c.List %p \n", file_line(), c.List)
allResClasses, err := c.List(labels.Everything())
if err != nil {
return nil, fmt.Errorf("error retrieving rcList from cache: %v", err)


probably worth spelling out resourceclassList instead of rcList

// C2:
// CPU: 1
// Memory: 1G
//
// Result: CPU: 3, Memory: 3G
func GetResourceRequest(pod *v1.Pod) *schedulercache.Resource {
// Result: CPU: 3, Memory: 3G, ['nvidia-gpu': 2, 'solarflare-40gig': 1]


This seems specific to your environment. What's the plan for making this more generic?

// Result: CPU: 3, Memory: 3G
func GetResourceRequest(pod *v1.Pod) *schedulercache.Resource {
// Result: CPU: 3, Memory: 3G, ['nvidia-gpu': 2, 'solarflare-40gig': 1]
func GetResourceRequest(pod *v1.Pod) (*schedulercache.Resource, *map[string]int32) {
result := schedulercache.Resource{}


Just curious on this... did you consider modifying the schedulercache.Resource struct to include information about the new resource classes? Or is this just proof-of-concept stuff? I'm wondering because this kind of gets at my comment on the spec PR about the resource classes being special snowflakes compared to other resources.
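The alternative being raised here could be sketched roughly as follows: fold the class quantities into the resource struct itself rather than returning them in a second map. The struct and field names below are toy stand-ins, not the real schedulercache.Resource:

```go
package main

import "fmt"

// resource is a toy stand-in for schedulercache.Resource; the extra
// ResourceClasses field shows the shape of the suggestion, i.e. class
// requests travel with the struct instead of in a second return value.
type resource struct {
	MilliCPU        int64
	Memory          int64
	ResourceClasses map[string]int32
}

// add accumulates another container's request, class quantities included,
// so callers like GetResourceRequest would need only one return value.
func (r *resource) add(other resource) {
	r.MilliCPU += other.MilliCPU
	r.Memory += other.Memory
	if r.ResourceClasses == nil {
		r.ResourceClasses = map[string]int32{}
	}
	for class, n := range other.ResourceClasses {
		r.ResourceClasses[class] += n
	}
}

func main() {
	var total resource
	total.add(resource{MilliCPU: 1000, Memory: 1 << 30, ResourceClasses: map[string]int32{"nvidia-gpu": 2}})
	total.add(resource{MilliCPU: 2000, Memory: 2 << 30, ResourceClasses: map[string]int32{"solarflare-40gig": 1}})
	fmt.Println(total.MilliCPU, total.ResourceClasses["nvidia-gpu"]) // 3000 2
}
```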

rc.resClass = rClass
for _, info := range cache.nodes {
fmt.Printf("\n%s cache.nodes %+v\n", file_line(), cache.nodes)
rcPerNodeInfo, err := info.AddResourceClass(rClass, info.node)


This smells a little too much like hard coupling between nodes and resource classes. Or am I missing something non-obvious here?
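One way to loosen that coupling, sketched here with made-up types rather than the scheduler cache's real ones, is to keep class-to-device matching a pure function of the selector and the device labels, so the cache only iterates nodes and records results instead of owning the matching logic:

```go
package main

import "fmt"

// requirement and device are simplified stand-ins for the ResourceClass
// selector requirements and node device entries; the real types live in
// the PR's scheduler cache and node status.
type requirement struct {
	Key      string
	Operator string // "In" or "Exists" in this sketch
	Values   []string
}

type device struct {
	Name   string
	Labels map[string]string
}

// classMatchesDevice is a pure function: no node or cache state involved,
// which keeps resource-class logic testable independently of nodes.
func classMatchesDevice(reqs []requirement, d device) bool {
	for _, r := range reqs {
		v, ok := d.Labels[r.Key]
		switch r.Operator {
		case "Exists":
			if !ok {
				return false
			}
		case "In":
			if !ok {
				return false
			}
			found := false
			for _, want := range r.Values {
				if want == v {
					found = true
				}
			}
			if !found {
				return false
			}
		default:
			return false
		}
	}
	return true
}

func main() {
	gpu := device{Name: "nvidia-tesla-gpu", Labels: map[string]string{"Type": "nvidia-gpu"}}
	reqs := []requirement{{Key: "Type", Operator: "In", Values: []string{"nvidia-gpu"}}}
	fmt.Println(classMatchesDevice(reqs, gpu)) // true
}
```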
