Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add queue controller #128

Merged
merged 16 commits into from
May 10, 2019
Merged

Conversation

hzxuzhonghu
Copy link
Collaborator

fixes: #16

@@ -63,7 +63,7 @@ func ControlledBy(obj interface{}, gvk schema.GroupVersionKind) bool {
return false
}

func CreateConfigMapIfNotExist(job *vkv1.Job, kubeClients *kubernetes.Clientset, data map[string]string, cmName string) error {
func CreateConfigMapIfNotExist(job *vkv1.Job, kubeClients kubernetes.Interface, data map[string]string, cmName string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer to have seperate PR for such kind of cleanup.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense

glog.V(2).Infoln("queue %s has not been seen or deleted")
return nil
}
for pgKey := range c.podGroups[key] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer to handle this in Cache: Queue -> Job -> PodGroup.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I donot want to overlap with job controller. For queue controller, it just care about the podgroups, not care about Job or anything else.

podGroups is keyed by queue name and value is the podgroups that are using this queue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkg/cache is a common component in controller, and we need to clean up jobs when queue was deleted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IC, then we need to record jobs too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But have a doubt: why would user delete their queues before jobs? Is there a use case? If there is a malicious user who does this, we should not delete the running jobs by any reason.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this, I can think of a more simple way to do this: I remember there will be an admission controller to validate queue existence before create jobs. Then we can also set the job's owner reference. Then when the queue is deleted, k8s garbage collector can reclaim all children resources automatically.

So we have two choice:

  1. do it in controller, need to record all dependents relationship.

  2. setting ownerreference, do nothing else, leave it to GC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally prefer not adding this into cache even we need considering deleting jobs when queue deleted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setting ownerreference, do nothing else, leave it to GC.

owerReference should be reserved for JobGroup :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OwnerReferences is an array, can be more than one.

owerReference should be reserved for JobGroup :)

Sorry, there is no JobGroup now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure whether GC will check controlled owner reference.

there is no JobGroup now.

so "reserved" it.

Copy link
Contributor

@TommyLike TommyLike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E2E tests are required

@@ -72,10 +76,17 @@ func Run(opt *options.ServerOption) error {
return err
}

jobController := job.NewJobController(config)
// TODO: add user agent for different controllers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is user agent?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just for http server to know the client identity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A http header

glog.V(2).Infoln("queue %s has not been seen or deleted")
return nil
}
for pgKey := range c.podGroups[key] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally prefer not adding this into cache even we need considering deleting jobs when queue deleted.

@hzxuzhonghu
Copy link
Collaborator Author

I am adding e2e

@TommyLike
Copy link
Contributor

@hzxuzhonghu please update the cluster role to get appropriate rights to handle queue resource.

@hzxuzhonghu
Copy link
Collaborator Author

Thanks

@TommyLike
Copy link
Contributor

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label May 10, 2019
@TommyLike
Copy link
Contributor

/approve

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 10, 2019
@volcano-sh-bot volcano-sh-bot merged commit c0911cd into volcano-sh:master May 10, 2019
@hzxuzhonghu hzxuzhonghu deleted the queue-controller branch May 13, 2019 01:30
kevin-wangzefeng pushed a commit to kevin-wangzefeng/volcano that referenced this pull request Jun 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Queue controller and related cli
4 participants