Sedna FederatedLearning controller enhancement #446

Merged

Contributor

@SherlockShemol SherlockShemol commented Sep 29, 2024

What type of PR is this?

What this PR does / why we need it:

FederatedLearningJob controller enhancement

  • Cascade deletion: improve the controller so that it supports cascade deletion.
  • Pod self-healing: recreate a pod when it is manually deleted.
  • Update pods when the FederatedLearningJob CRD is changed.
  • Add a test file to ensure functional stability and correctness.

Which issue(s) this PR fixes:

Fixes #

@kubeedge-bot kubeedge-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 29, 2024
@@ -40,6 +41,7 @@ type JointInferenceService struct {
type JointInferenceServiceSpec struct {
	EdgeWorker EdgeWorker `json:"edgeWorker"`
	CloudWorker CloudWorker `json:"cloudWorker"`
	appsv1.DeploymentSpec `json:",inline"`
Contributor

Why add a DeploymentSpec definition to the API?

Contributor Author

That was my oversight. The definition was not used in the code, and I have removed it.

Contributor

@MooreZheng MooreZheng left a comment

corelisters "k8s.io/client-go/listers/apps/v1"
corelistersv1 "k8s.io/client-go/listers/core/v1"
"k8s.io/client-go/tools/record"
"k8s.io/client-go/util/workqueue"
Contributor

Split the imports into three groups (standard library / external packages / project packages),
for example:
"context"
...

v1 "k8s.io/api/core/v1"
...

"github.com/kubeedge/sedna/pkg/globalmanager/config"
...
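
The three-group convention above can be sketched as a small classifier. This is an illustration only, not part of the PR: the function names and the rule "a path whose first element contains no dot is standard library" are assumptions of this sketch, and the project prefix is taken from the import path quoted in the comment.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Import group indices, in the order the groups appear in an import block.
const (
	groupStdlib = iota
	groupExternal
	groupProject
)

// classifyImport returns the group an import path belongs to.
// projectPrefix is the module path of the current project.
// Assumption of this sketch: a path whose first element has no dot
// (for example "context" or "net/http") is a standard library package.
func classifyImport(path, projectPrefix string) int {
	if path == projectPrefix || strings.HasPrefix(path, projectPrefix+"/") {
		return groupProject
	}
	first := path
	if i := strings.Index(path, "/"); i >= 0 {
		first = path[:i]
	}
	if !strings.Contains(first, ".") {
		return groupStdlib
	}
	return groupExternal
}

// groupImports splits paths into the three groups and sorts each group.
func groupImports(paths []string, projectPrefix string) [3][]string {
	var groups [3][]string
	for _, p := range paths {
		g := classifyImport(p, projectPrefix)
		groups[g] = append(groups[g], p)
	}
	for i := range groups {
		sort.Strings(groups[i])
	}
	return groups
}

func main() {
	paths := []string{
		"k8s.io/client-go/tools/record",
		"context",
		"github.com/kubeedge/sedna/pkg/globalmanager/config",
		"k8s.io/api/core/v1",
	}
	// Print the groups separated by blank lines, as they would appear in a file.
	for _, g := range groupImports(paths, "github.com/kubeedge/sedna") {
		for _, p := range g {
			fmt.Printf("\t%q\n", p)
		}
		fmt.Println()
	}
}
```

In practice a formatter is normally used for this rather than hand-written code; the sketch only makes the grouping rule explicit.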

deploymentInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
	AddFunc: jc.addDeployment,
	UpdateFunc: jc.updateDeployment,
	DeleteFunc: jc.deleteDeployment,
Contributor

If the user manually deletes the deployment, will it be rebuilt?

Comment on lines 87 to 89
deploymentStoreSynced cache.InformerSynced
// A store of deployment
deploymentsLister appslisters.DeploymentLister
Contributor

Now that a Deployment is used as the controller, can the pod-related monitoring logic be removed from the code?

	return
}
for _, deployment := range deployments {
	c.kubeClient.AppsV1().Deployments(curService.Namespace).Delete(context.TODO(), deployment.Name, metav1.DeleteOptions{})
Contributor

Build the new deployment first and then apply it with the Update function, rather than simply deleting the deployment:
newDeployment := ...
c.kubeClient.AppsV1().Deployments(newDeployment.Namespace).Update(context.TODO(), newDeployment, metav1.UpdateOptions{})
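
The difference between updating in place and deleting can be illustrated with a simplified model. Everything below is hypothetical and uses only the standard library: the `Deployment` struct, the in-memory `store`, and `syncImage` stand in for the real Kubernetes types and client calls, which this sketch does not use. It only shows the shape of the suggestion: copy the current object, change what differs, and call Update so the object keeps its identity.

```go
package main

import (
	"errors"
	"fmt"
)

// Deployment is a hypothetical, heavily simplified stand-in for a Kubernetes Deployment.
type Deployment struct {
	Namespace string
	Name      string
	UID       int    // assigned once at creation; an update keeps it
	Image     string // the part of the spec that changes in this example
}

// store is a hypothetical in-memory stand-in for the API server.
type store struct {
	objs    map[string]Deployment
	nextUID int
}

var errNotFound = errors.New("deployment not found")

func newStore() *store { return &store{objs: map[string]Deployment{}} }

func key(ns, name string) string { return ns + "/" + name }

// Create stores d under a fresh UID.
func (s *store) Create(d Deployment) Deployment {
	s.nextUID++
	d.UID = s.nextUID
	s.objs[key(d.Namespace, d.Name)] = d
	return d
}

// Update overwrites an existing object in place and keeps its UID.
func (s *store) Update(d Deployment) (Deployment, error) {
	cur, ok := s.objs[key(d.Namespace, d.Name)]
	if !ok {
		return Deployment{}, errNotFound
	}
	d.UID = cur.UID
	s.objs[key(d.Namespace, d.Name)] = d
	return d, nil
}

// syncImage builds the desired Deployment from the current one and updates it,
// instead of deleting the object and relying on it being recreated.
func syncImage(s *store, ns, name, image string) (Deployment, error) {
	cur, ok := s.objs[key(ns, name)]
	if !ok {
		return Deployment{}, errNotFound
	}
	newDeployment := cur // copy the current object, then change only what differs
	newDeployment.Image = image
	return s.Update(newDeployment)
}

func main() {
	s := newStore()
	created := s.Create(Deployment{Namespace: "default", Name: "edge-worker", Image: "worker:v1"})
	updated, err := syncImage(s, "default", "edge-worker", "worker:v2")
	if err != nil {
		fmt.Println("update failed:", err)
		return
	}
	fmt.Println(created.UID == updated.UID, updated.Image)
}
```

With an update, the object is never absent from the store, whereas a delete leaves a window in which it does not exist and any recreated object gets a new identity.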

Comment on lines 13 to 19
k8serrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/apimachinery/pkg/watch"
kubernetesfake "k8s.io/client-go/kubernetes/fake"
"k8s.io/client-go/kubernetes/scheme"
v1core "k8s.io/client-go/kubernetes/typed/core/v1"
corelisters "k8s.io/client-go/listers/core/v1"
"k8s.io/client-go/tools/record"
"k8s.io/client-go/util/workqueue"
Contributor

Import grouping

@SherlockShemol SherlockShemol force-pushed the fl-controller-enhancement branch 6 times, most recently from be8aa11 to 82add2d Compare October 21, 2024 07:20
…ion of federated learning controller. \n 2.enable federated learning pod to rebuild itself if it is manually or wrongly deleted. \n 3.enable self updating pod config when modifying CRD of federated learning. \n 4.add test file to ensure the correctness of the solution.

Signed-off-by: SherlockShemol <shemol@163.com>
@SherlockShemol SherlockShemol force-pushed the fl-controller-enhancement branch from c3953fa to b37522e Compare October 24, 2024 14:32
@SherlockShemol SherlockShemol changed the title Sedna JointInferenceService and FederatedLearning controller enhancement Sedna FederatedLearning controller enhancement Oct 25, 2024
@tangming1996
Contributor

/lgtm

@kubeedge-bot kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 28, 2024
@MooreZheng
Contributor

/lgtm

@jaypume
Member

jaypume commented Oct 30, 2024

/lgtm

@jaypume
Member

jaypume commented Oct 30, 2024

/approve

@kubeedge-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jaypume

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 30, 2024
@kubeedge-bot kubeedge-bot merged commit 712b62b into kubeedge:main Oct 30, 2024
11 checks passed
@MooreZheng
Contributor

In addition, this PR aims to fix #430.
