Adding SubdomainPolicy to support a service per replica #197

Merged
17 commits merged into kubernetes-sigs:main on Sep 12, 2024

Conversation

Edwinhr716
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it

Details can be found in KEP #188

Which issue(s) this PR fixes

Fixes #173

Special notes for your reviewer

Does this PR introduce a user-facing change?

Adds NetworkConfig and SubdomainPolicy to the API
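
For readers unfamiliar with the change, here is a minimal sketch of what an API addition of this shape could look like. Only the names NetworkConfig and SubdomainPolicy come from this PR; the constant values, the JSON tag, and the `effectivePolicy` helper are assumptions for illustration, so treat KEP #188 and `api/leaderworkerset/v1/leaderworkerset_types.go` as the source of truth.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SubdomainPolicy selects how pods are grouped under headless services.
// The two constant values below are assumptions for illustration.
type SubdomainPolicy string

const (
	// One headless service shared by every replica (assumed name).
	SubdomainShared SubdomainPolicy = "Shared"
	// One headless service per replica (assumed name).
	SubdomainUniquePerReplica SubdomainPolicy = "UniquePerReplica"
)

// NetworkConfig is a simplified stand-in for the new API field.
type NetworkConfig struct {
	SubdomainPolicy *SubdomainPolicy `json:"subdomainPolicy,omitempty"`
}

// effectivePolicy is a hypothetical helper: it treats a missing config or
// policy as the shared behavior, so existing objects keep working.
func effectivePolicy(cfg *NetworkConfig) SubdomainPolicy {
	if cfg == nil || cfg.SubdomainPolicy == nil {
		return SubdomainShared
	}
	return *cfg.SubdomainPolicy
}

func main() {
	p := SubdomainUniquePerReplica
	out, err := json.Marshal(NetworkConfig{SubdomainPolicy: &p})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Println(effectivePolicy(nil))
}
```

Using a pointer for the policy lets an unset field be distinguished from an explicit choice, which is one common way to keep a new field backward compatible.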

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 16, 2024
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 16, 2024
@liurupeng
Collaborator

liurupeng commented Aug 20, 2024

It would be great if you could:

  1. Add an integration test verifying that, for the TPU case, the container env vars are updated after the headless service mode is updated.
  2. Add an e2e test verifying the creation of each headless service mode, plus an update.
  3. Manually test the multi-host TPU solution, first without configuring the headless service mode and then after switching to the "LeaderSharedWorkerDedicated" case.
  4. We may need to update the implementation part of the KEP to describe how TPU env vars are populated for the LeadersSharedWorkersDedicated mode and how updates are handled.

@Edwinhr716
Contributor Author

  1. I think an e2e test is better for that scenario: the env variables change because the Pod.Spec.Subdomain value changes, and in an integration test for the pod webhook we would have to change that value manually.
  2. I can write one e2e test that covers both 1 and 2.
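
To illustrate why the env variables depend on Pod.Spec.Subdomain, here is a rough sketch. It assumes the hostname list is built as comma-separated `<pod>.<subdomain>` entries; the function name and the pod names are hypothetical, not the project's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// workerHostnames joins "<pod>.<subdomain>" entries with commas.
// Hypothetical helper: it only shows that the value is derived from the
// subdomain, so a different subdomain yields a different env var value.
func workerHostnames(podNames []string, subdomain string) string {
	hosts := make([]string, 0, len(podNames))
	for _, name := range podNames {
		hosts = append(hosts, name+"."+subdomain)
	}
	return strings.Join(hosts, ",")
}

func main() {
	pods := []string{"lws-0", "lws-0-1"} // hypothetical pod names

	// Shared subdomain: every group resolves under the same service.
	fmt.Println(workerHostnames(pods, "lws"))

	// Per-replica subdomain: the same pods resolve under a different service.
	fmt.Println(workerHostnames(pods, "lws-0"))
}
```

Because the value is computed from the subdomain when the pod is admitted, an integration test of the webhook alone has to set the subdomain by hand, which matches the concern raised in item 1.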

@Edwinhr716 Edwinhr716 changed the title Adding SubdomainPolicy to support a service for the leaders, and a service per replica for the workers [WIP] Adding SubdomainPolicy to support a service for the leaders, and a service per replica for the workers Aug 21, 2024
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 21, 2024
@Edwinhr716 Edwinhr716 changed the title [WIP] Adding SubdomainPolicy to support a service for the leaders, and a service per replica for the workers Adding SubdomainPolicy to support a service per replica Aug 23, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 23, 2024
@Edwinhr716 Edwinhr716 changed the title Adding SubdomainPolicy to support a service per replica [WIP] Adding SubdomainPolicy to support a service per replica Aug 28, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 28, 2024
@Edwinhr716 Edwinhr716 changed the title [WIP] Adding SubdomainPolicy to support a service per replica Adding SubdomainPolicy to support a service per replica Aug 30, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 30, 2024
pkg/controllers/leaderworkerset_controller.go (outdated, resolved)
api/leaderworkerset/v1/leaderworkerset_types.go (outdated, resolved)
@@ -122,27 +122,60 @@ func (r *LeaderWorkerSetReconciler) Reconcile(ctx context.Context, req ctrl.Requ
return ctrl.Result{}, nil
}

-func (r *LeaderWorkerSetReconciler) createHeadlessServiceIfNotExists(ctx context.Context, lws *leaderworkerset.LeaderWorkerSet) error {
+func (r *LeaderWorkerSetReconciler) createMultipleHeadlessServices(ctx context.Context, lws *leaderworkerset.LeaderWorkerSet, replicas int32) error {
Collaborator

ReconcileHeadlessServices
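
As a rough illustration of what reconciling one headless service per replica involves, the sketch below computes the desired service names and which of them still need to be created. The `<lws-name>-<index>` naming scheme and both helper names are assumptions, and the real controller talks to the Kubernetes API rather than a map.

```go
package main

import "fmt"

// desiredServiceNames returns one service name per replica, using an
// assumed "<lws-name>-<index>" naming scheme.
func desiredServiceNames(lwsName string, replicas int32) []string {
	var names []string
	for i := int32(0); i < replicas; i++ {
		names = append(names, fmt.Sprintf("%s-%d", lwsName, i))
	}
	return names
}

// missingServices returns the desired names that are not in existing,
// preserving the order of desired. Hypothetical helper.
func missingServices(desired []string, existing map[string]bool) []string {
	var missing []string
	for _, name := range desired {
		if !existing[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	desired := desiredServiceNames("lws", 3)
	existing := map[string]bool{"lws-0": true}

	// Only the services that do not exist yet would be created.
	for _, name := range missingServices(desired, existing) {
		fmt.Println("would create headless service:", name)
	}
}
```

Comparing desired state against what already exists keeps the operation idempotent, which fits a name like ReconcileHeadlessServices better than a create-only name.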

test/testutils/validators.go (resolved)
test/e2e/e2e_test.go (resolved)
@liurupeng
Collaborator

Overall LGTM. @ahg-g, could you help check as well? Thanks!

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 5, 2024
@Edwinhr716 Edwinhr716 force-pushed the headless-service-impl branch from e4d4b04 to 103a98a Compare September 5, 2024 18:26
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 6, 2024
Contributor

@ahg-g ahg-g left a comment

Didn't review the tests yet.

I think we can simplify the implementation, see comments.

pkg/controllers/leaderworkerset_controller.go (outdated, resolved)
pkg/webhooks/leaderworkerset_webhook.go (resolved)
api/leaderworkerset/v1/leaderworkerset_types.go (outdated, resolved)
pkg/webhooks/leaderworkerset_webhook.go (resolved)
pkg/controllers/leaderworkerset_controller.go (outdated, resolved)
pkg/controllers/leaderworkerset_controller.go (outdated, resolved)
pkg/controllers/leaderworkerset_controller.go (outdated, resolved)
pkg/controllers/leaderworkerset_controller.go (outdated, resolved)
pkg/controllers/pod_controller.go (outdated, resolved)
test/testutils/validators.go (resolved)
test/testutils/validators.go (outdated, resolved)
test/e2e/e2e_test.go (outdated, resolved)
@Edwinhr716 Edwinhr716 force-pushed the headless-service-impl branch from 7a3c05e to 4cefc55 Compare September 12, 2024 18:47
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 12, 2024
@ahg-g
Contributor

ahg-g commented Sep 12, 2024

/lgtm
/approve
/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Sep 12, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 12, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, Edwinhr716

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 12, 2024
@k8s-ci-robot k8s-ci-robot merged commit 39f4dd3 into kubernetes-sigs:main Sep 12, 2024
9 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Development

Successfully merging this pull request may close these issues.

Create one headless service for one replica/podgroup
5 participants