[WIP] PoC ResourceClass #3
base: master
Conversation
Force-pushed from 90af11e to 99325c2.
@@ -0,0 +1,14 @@
---
kind: ResourceClass
metadata:
[root@dell-r620-01 kubernetes]# kubectl get resourceclass my.rc2 -o yaml
apiVersion: v1
kind: ResourceClass
metadata:
  creationTimestamp: 2017-05-30T05:27:48Z
  name: my.rc2
  resourceVersion: "15222"
  selfLink: /api/v1/namespaces/default/resourceclasses/my.rc2
  uid: b820fe43-44f8-11e7-94f7-bc305bf4e400
spec:
  resourceSelector:
  - matchExpressions:
    - key: Type
      operator: In
      values:
      - nvidia-gpu
status: {}
metadata:
  name: nvidia-tesla-gpu
  labels:
    type: nvidia-gpu
[root@dell-r620-01 kubernetes]# kubectl get node mynode -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: 2017-05-24T16:32:26Z
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
  uid: 927af813-409e-11e7-80f5-bc305bf4e400
spec:
  externalID: dell-r620-01.perf.lab.eng.rdu.redhat.com
status:
  addresses:
  - address: 10.12.20.41
    type: InternalIP
  - address: dell-r620-01.perf.lab.eng.rdu.redhat.com
    type: Hostname
  - address: 10.12.20.41
    type: LegacyHostIP
  allocatable:
    cpu: "16"
    memory: 131714184Ki
    pods: "110"
  capacity:
    cpu: "16"
    memory: 131816584Ki
    pods: "110"
  conditions:
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  deviceAllocatable:
  - metadata:
      creationTimestamp: null
      labels:
        compute-ability: "3.7"
        ecc: "true"
        family: tesla
        memory: 10Gi
        model: k80
        nvlink: "true"
        quantity: "8"
        type: nvidia-gpu
      name: nvidia-tesla-gpu
  deviceCapacity:
  - metadata:
      creationTimestamp: null
      labels:
        compute-ability: "3.7"
        ecc: "true"
        family: tesla
        memory: 10Gi
        model: k80
        nvlink: "true"
        quantity: "8"
        type: nvidia-gpu
      name: nvidia-tesla-gpu
  images:
  - names:
    - kube-build:build-748641720b-5-v1.8.1-2
    sizeBytes: 3158843585
  - names:
    - gcr.io/google_containers/kube-cross@sha256:ac1327a97b1b3e01d2ded9a781bb5675cd657af3c9d1ff56b97fcc3f2f2cd59c
    - gcr.io/google_containers/kube-cross:v1.8.1-2
    sizeBytes: 2357991939
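For illustration only (not code from this PR), a minimal Go sketch of how a ResourceClass selector like the one shown earlier could be evaluated against the device labels advertised under deviceCapacity, using the standard apimachinery label-selector helpers. The lowercase `type` key and the label values are taken from the templates in this thread; everything else is assumed.

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Selector mirroring the ResourceClass spec above (key and values assumed).
	sel := &metav1.LabelSelector{
		MatchExpressions: []metav1.LabelSelectorRequirement{
			{Key: "type", Operator: metav1.LabelSelectorOpIn, Values: []string{"nvidia-gpu"}},
		},
	}
	selector, err := metav1.LabelSelectorAsSelector(sel)
	if err != nil {
		panic(err)
	}
	// Labels advertised by the device in the node's deviceCapacity entry.
	deviceLabels := labels.Set{
		"type":   "nvidia-gpu",
		"family": "tesla",
		"model":  "k80",
	}
	fmt.Println(selector.Matches(deviceLabels)) // true: this device satisfies the class
}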
@vikaschoudhary16 I will review it today
In the first commit, the idea we had discussed was to read from one JSON file that has the list of devices. So let's discuss how to simplify the changes in pkg/util/node/node.go. I am going through the other commits.
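As a rough sketch of that single-file idea (the path, JSON shape, and field names below are assumptions for illustration, not what this PR implements), the device list could be read along these lines:

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Device is a hypothetical shape for one entry in the device list file.
type Device struct {
	Name     string            `json:"name"`
	Type     string            `json:"type"`
	Quantity int32             `json:"quantity"`
	Labels   map[string]string `json:"labels"`
}

func main() {
	// Hypothetical location; the PR would decide where the kubelet reads this from.
	data, err := os.ReadFile("/etc/kubernetes/device-list.json")
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	var devices []Device
	if err := json.Unmarshal(data, &devices); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Each entry would then be reflected into the node's device capacity.
	for _, d := range devices {
		fmt.Printf("device %s: type=%s quantity=%d labels=%v\n", d.Name, d.Type, d.Quantity, d.Labels)
	}
}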
Resource class creation and node status update basic testing done. Templates in this commit are used for testing resource class creation and device updates in node status. Adding fixes for the issues encountered during this testing.
Force-pushed from b491d69 to 1cf0ec5.
Very minimal testing has been done so far; testing of overlapping resource classes is in progress. A design doc will be shared soon. Early reviews on the design/approach are welcome. Still to be implemented: resource class delete and node delete handling.
Force-pushed from 1cf0ec5 to bcbc01f.
Hi Vikas! A few questions inline... but mostly from reading through this, I get the feeling there's just a bit too much coupling between both the device plugin subsystem and resource classes (as well as a concern there is similar coupling between node and resource class).
Best,
-jay
@@ -327,12 +328,15 @@ func ClusterRoles() []rbac.ClusterRole {
		rbac.NewRule(Read...).Groups(legacyGroup).Resources("nodes", "pods").RuleOrDie(),
		rbac.NewRule("create").Groups(legacyGroup).Resources("pods/binding", "bindings").RuleOrDie(),
		rbac.NewRule("update").Groups(legacyGroup).Resources("pods/status").RuleOrDie(),
		rbac.NewRule("patch").Groups(legacyGroup).Resources("pods").RuleOrDie(),
Is this related to your patch? Maybe copy/pasted by accident?
@@ -45,10 +45,11 @@ func (pfactory *PredicateMetadataFactory) GetMetadata(pod *v1.Pod, nodeNameToInf
	if err != nil {
		return nil
	}
	resReq, _ := GetResourceRequest(pod)
This and line 52 seem to be unrelated changes?
	fmt.Printf("\n%s c.List %p \n", file_line(), c.List)
	allResClasses, err := c.List(labels.Everything())
	if err != nil {
		return nil, fmt.Errorf("error retrieving rcList from cache: %v", err)
probably worth spelling out resourceclassList instead of rcList
// C2:
//   CPU: 1
//   Memory: 1G
//
// Result: CPU: 3, Memory: 3G
func GetResourceRequest(pod *v1.Pod) *schedulercache.Resource {
// Result: CPU: 3, Memory: 3G, ['nvidia-gpu': 2, 'solarflare-40gig': 1]
This seems specific to your environment. What's the plan for making this more generic?
// Result: CPU: 3, Memory: 3G
func GetResourceRequest(pod *v1.Pod) *schedulercache.Resource {
// Result: CPU: 3, Memory: 3G, ['nvidia-gpu': 2, 'solarflare-40gig': 1]
func GetResourceRequest(pod *v1.Pod) (*schedulercache.Resource, *map[string]int32) {
	result := schedulercache.Resource{}
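To make the intent of the new signature concrete, here is a self-contained sketch (simplified stand-in types, not the PR's schedulercache code) of summing per-container requests while accumulating resource-class counts into a separate map, reproducing the C1/C2 example from the comment above:

package main

import "fmt"

// containerRequest is a simplified stand-in for one container's requests; the
// real code walks pod.Spec.Containers and v1.ResourceList instead.
type containerRequest struct {
	MilliCPU        int64
	MemoryBytes     int64
	ResourceClasses map[string]int32 // e.g. "nvidia-gpu": 2
}

// sumRequests adds up ordinary resources as before and accumulates
// resource-class requests into a separate map, mirroring the second return
// value added to GetResourceRequest.
func sumRequests(containers []containerRequest) (milliCPU, memory int64, classes map[string]int32) {
	classes = map[string]int32{}
	for _, c := range containers {
		milliCPU += c.MilliCPU
		memory += c.MemoryBytes
		for name, count := range c.ResourceClasses {
			classes[name] += count
		}
	}
	return milliCPU, memory, classes
}

func main() {
	// C1: 2 CPU, 2G, 2x nvidia-gpu; C2: 1 CPU, 1G, 1x solarflare-40gig.
	containers := []containerRequest{
		{MilliCPU: 2000, MemoryBytes: 2 << 30, ResourceClasses: map[string]int32{"nvidia-gpu": 2}},
		{MilliCPU: 1000, MemoryBytes: 1 << 30, ResourceClasses: map[string]int32{"solarflare-40gig": 1}},
	}
	cpu, mem, classes := sumRequests(containers)
	fmt.Println(cpu, mem, classes) // 3000 3221225472 map[nvidia-gpu:2 solarflare-40gig:1]
}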
Just curious on this... did you consider modifying the schedulercache.Resource struct to include information about the new resource classes? Or is this just proof-of-concept stuff? I'm wondering because this kind of gets at my comment on the spec PR about the resource classes being special snowflakes compared to other resources.
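For comparison, a sketch of the alternative being asked about here, with the resource-class counts carried on the scheduler's Resource struct itself rather than returned as a separate *map[string]int32; the field and method names are hypothetical, not taken from the PR:

package schedulercache

// Resource sketches how the existing struct could grow a field for
// resource-class requests alongside the ordinary resources.
type Resource struct {
	MilliCPU        int64
	Memory          int64
	NvidiaGPU       int64
	ResourceClasses map[string]int32 // requested count per resource class
}

// AddResourceClass folds one resource-class request into the struct, the same
// way MilliCPU and Memory are accumulated today.
func (r *Resource) AddResourceClass(name string, count int32) {
	if r.ResourceClasses == nil {
		r.ResourceClasses = map[string]int32{}
	}
	r.ResourceClasses[name] += count
}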
	rc.resClass = rClass
	for _, info := range cache.nodes {
		fmt.Printf("\n%s cache.nodes %+v\n", file_line(), cache.nodes)
		rcPerNodeInfo, err := info.AddResourceClass(rClass, info.node)
This smells a little too much like hard coupling between nodes and resource classes. Or am I missing something non-obvious here?