Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import v1alpha2 logic code #463

Merged
merged 6 commits into from
Mar 16, 2018
Merged

Conversation

ScorpioCPH
Copy link
Member

@ScorpioCPH ScorpioCPH commented Mar 15, 2018

Hi, this PR implement the main logic of this design proposal kubeflow/community#30

  • Redefine TFJobController struct.
  • Use event-driven and reconcile to create new pods and services.
    • Copy some utils code from k8s.io/kubernetes/pkg/controller.
    • We should not import the huge package k8s.io/kubernetes/pkg/controller
  • Update examples for test
    • Use busybox instead of real TensorFlow model for test now.

TODOs:

  • Add error handling logic.
  • Update status gracefully.
  • Add UT.

@jlewi @gaocegege @lluunn @jimexist @DjangoPeng @ddysher PTAL.

Note: Please do not use squash merge as this is a huge PR, let's keep the commits history, thanks!


This change is Reviewable

Copy link
Member

@gaocegege gaocegege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM with some issues about name.

if len(os.Getenv(RecommendedConfigPathEnvVar)) > 0 {
// use the current context in kubeconfig
// This is very useful for running locally.
return clientcmd.BuildConfigFromFlags("", os.Getenv(RecommendedConfigPathEnvVar))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could keep the logic for dev locally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this in server.go, PTAL.

)

const (
controllerName = "kubeflow"
)
controllerName = "tfjob-controller"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could name it tf-operator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM.

config tfv1alpha1.ControllerConfig, tfJobInformerFactory informers.SharedInformerFactory) (*Controller, error) {
tfJobInformer := tfJobInformerFactory.Kubeflow().V1alpha1().TFJobs()
// NewTFJobController returns a new TFJob controller.
func NewTFJobController(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give me more info about why the name of the function is changed to NewTFJobController? I think New is fine in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the style as kubernetes controllers.

ReconcilerSyncLoopPeriod: metav1.Duration{Duration: 15 * time.Second},
}

type TFJobController struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Controller or Operator is enough, since this repo is only for TFJob, WDYT

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the style as kubernetes controllers.


// Start the informer factories to begin populating the informer caches
log.Info("Starting TFJob controller")

// Wait for the caches to be synced before starting workers
log.Info("Waiting for informer caches to sync")
if ok := cache.WaitForCacheSync(stopCh, c.TFJobSynced); !ok {
if ok := cache.WaitForCacheSync(stopCh, tc.tfJobListerSynced); !ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we wait pod and service to be synced?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks.

if controllerRef.Kind != controllerKind.Kind {
return nil
}
job, err := tc.tfJobLister.TFJobs(namespace).Get(controllerRef.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/job/tfjob

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return true, nil
}

type PodControllerRefManager struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we could use the ref manager in "k8s.io/kubernetes/pkg/controller".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add Note in the beginning of this file:

// Note(CPH): this file is copied form k8s.io/kubernetes/pkg/controller
// We should not import the huge package k8s.io/kubernetes/pkg

@@ -0,0 +1,207 @@
// Copyright 2018 The Kubeflow Authors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the file name has redundancy, could we name it pkg/controller/service.go

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this file is split from pkg/controller/controller.go.

// Only abstracted out for testing.
// Warning: if using KeyFunc it is not safe to use a single ControllerExpectationsInterface with different
// types of controllers, because the keys might conflict across types.
type ControllerExpectationsInterface interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we write our own ControllerExpectationsInterface? I think we could reuse the data structure in k8s code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is copied form k8s.io/kubernetes/pkg/controller.
k8s.io/kubernetes/pkg is used internal and we should not import the huge package k8s.io/kubernetes/pkg.

// RSControlInterface is an interface that knows how to add or delete
// ReplicaSets, as well as increment or decrement them. It is used
// by the deployment controller to ease testing of actions that it takes.
type RSControlInterface interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this interface? I think we have no plan for ReplicaSet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will remove this.

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: gaocegege

Assign the PR to them by writing /assign @gaocegege in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ScorpioCPH ScorpioCPH force-pushed the v1alpha2 branch 2 times, most recently from b03b2b8 to fee07ae Compare March 15, 2018 06:03
@gaocegege
Copy link
Member

Generally LGTM but please fix the test in travis. Then I will merge it manually.

@ScorpioCPH ScorpioCPH force-pushed the v1alpha2 branch 2 times, most recently from 5a23d8e to 9b8c398 Compare March 15, 2018 09:36
@gaocegege
Copy link
Member

I will merge it first since it blocks my work on test #455

@gaocegege gaocegege merged commit 1463dba into kubeflow:v1alpha2 Mar 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants