Add hyperNode controller framework and provider #4014

Open
wants to merge 1 commit into
base: network-topology

Monokaix
Member

What type of PR is this?

/kind feature
/area controllers

What this PR does / why we need it:

Add a hyperNode controller provider so that vendors can register plugins that add, update, and delete hyperNodes by automatically discovering the network topology of their own data centers.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Vendors should register their own plugins and integrate them with the vc controller to reconcile hyperNodes.

@volcano-sh-bot volcano-sh-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. labels Feb 17, 2025
@volcano-sh-bot volcano-sh-bot added area/controllers size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 17, 2025
@Monokaix Monokaix force-pushed the network-topo-controller branch 2 times, most recently from e3671f8 to 5c3c66d Compare February 18, 2025 10:48
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign monokaix
You can assign the PR to them by writing /assign @monokaix in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 18, 2025
@Monokaix Monokaix force-pushed the network-topo-controller branch 19 times, most recently from 77df7d9 to 2896479 Compare February 20, 2025 09:04
@Monokaix Monokaix force-pushed the network-topo-controller branch 3 times, most recently from bb42d94 to df2089d Compare February 20, 2025 09:33
@Monokaix Monokaix force-pushed the network-topo-controller branch 3 times, most recently from faa2068 to d2fd538 Compare February 21, 2025 06:15
@Monokaix Monokaix changed the title [WIP]Add hyperNode controller framework and provider Add hyperNode controller framework and provider Feb 21, 2025
@volcano-sh-bot volcano-sh-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 21, 2025
Signed-off-by: Monokaix <changxuzheng@huawei.com>
@Monokaix Monokaix force-pushed the network-topo-controller branch from d2fd538 to 59d50f6 Compare February 21, 2025 06:23
@Monokaix
Member Author

cc @lowang-bh @hwdef @JesseStutler

@hwdef
Member

hwdef commented Mar 3, 2025

please import copilot :)

@Monokaix Monokaix requested a review from Copilot March 3, 2025 07:43
@Copilot Copilot AI left a comment

PR Overview

This PR introduces the hyperNode controller framework and provider to support the auto-discovery of network topology for hyperNodes, enabling vendors to seamlessly integrate their plugins with the Volcano controller.

  • Added a new hyperNode controller implementation with basic lifecycle methods.
  • Introduced a plugin-based hyperNode provider with dynamic loading and event handling.
  • Updated related tests, installer YAMLs, and command-line options to integrate the new framework.
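The provider and event flow summarized above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `Provider` interface, the `Event` and `HyperNode` types, and the in-memory store are hypothetical stand-ins, not the definitions in `pkg/controllers/hypernode/provider/interface.go`.

```go
package main

import (
	"fmt"
	"sort"
)

// EventType, Event and HyperNode are hypothetical names chosen for this sketch.
type EventType string

const (
	EventAdd    EventType = "Add"
	EventUpdate EventType = "Update"
	EventDelete EventType = "Delete"
)

// HyperNode is a simplified stand-in for the real HyperNode resource.
type HyperNode struct {
	Name    string
	Members []string
}

// Event describes one topology change reported by a provider.
type Event struct {
	Type      EventType
	HyperNode HyperNode
}

// Provider is a hypothetical vendor plugin contract: it discovers the
// topology and reports changes as events on the channel.
type Provider interface {
	Name() string
	Start(events chan<- Event) error
}

// staticProvider replays a fixed list of events in place of real discovery.
type staticProvider struct{ events []Event }

func (s staticProvider) Name() string { return "static-example" }

func (s staticProvider) Start(events chan<- Event) error {
	for _, e := range s.events {
		events <- e
	}
	close(events) // signals that discovery has finished
	return nil
}

// reconcile applies events to an in-memory store. A real controller would
// create, update, or delete HyperNode objects through the API server instead.
func reconcile(store map[string]HyperNode, events <-chan Event) {
	for e := range events {
		switch e.Type {
		case EventAdd, EventUpdate:
			store[e.HyperNode.Name] = e.HyperNode
		case EventDelete:
			delete(store, e.HyperNode.Name)
		}
	}
}

// runExample wires a provider to the reconcile loop and returns the
// names of the hyperNodes that remain, sorted for stable output.
func runExample() []string {
	var p Provider = staticProvider{events: []Event{
		{Type: EventAdd, HyperNode: HyperNode{Name: "s0", Members: []string{"node-0"}}},
		{Type: EventAdd, HyperNode: HyperNode{Name: "s1", Members: []string{"node-1"}}},
		{Type: EventUpdate, HyperNode: HyperNode{Name: "s0", Members: []string{"node-0", "node-2"}}},
		{Type: EventDelete, HyperNode: HyperNode{Name: "s1"}},
	}}
	events := make(chan Event)
	store := map[string]HyperNode{}
	go func() { _ = p.Start(events) }()
	reconcile(store, events)

	names := make([]string, 0, len(store))
	for name := range store {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(runExample())
}
```

In this sketch the provider only reports changes and the controller side owns all writes, which keeps vendor code free of API-server details.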

Reviewed Changes

| File | Description |
| --- | --- |
| example/hypernode-provider/README.md | Added documentation for writing and building a hyperNode provider |
| pkg/controllers/hypernode/hypernode_controller.go | Introduced the core hyperNode controller functions |
| pkg/controllers/hypernode/hypernode_controller_test.go | Added tests for the hyperNode controller run logic |
| example/hypernode-provider/example_provider.go | Provided an example implementation of a hyperNode provider |
| pkg/controllers/hypernode/provider/provider_test.go | Added unit tests covering add, update, and delete event scenarios |
| pkg/controllers/hypernode/provider/interface.go | Defined the provider plugin interface |
| pkg/controllers/hypernode/provider/provider.go | Implemented the provider loading, event handling, and retry logic |
| installer/volcano-development.yaml and installer/helm/chart/volcano/templates/controllers.yaml | Updated RBAC roles to include permissions for hyperNodes |
| pkg/controllers/framework/interface.go, cmd/controller-manager/app/options/options.go, cmd/controller-manager/main.go, cmd/controller-manager/app/server.go | Updated server options and integration to support the new provider |

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

pkg/controllers/hypernode/provider/provider.go:220

  • [nitpick] Function name 'handleNodeDeleted' is inconsistent with the naming of 'handleHyperNodeAdd' and 'handleHyperNodeUpdate'. Consider renaming it to 'handleHyperNodeDelete' for consistency.
func (p *provider) handleNodeDeleted(event Event) {

err := retry.OnError(
backoff,
func(err error) bool {
return true

Copilot AI Mar 3, 2025

In handleHyperNodeUpdate, the unconditional retry condition may lead to infinite retries on non-transient errors. Consider refining the retry condition to check for specific error types.

Suggested change
- return true
+ return !apierrors.IsConflict(err) && !apierrors.IsInvalid(err)
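The effect of the retry predicate can be seen in a small self-contained sketch. The `retryOnError` helper below is a simplified stand-in written for this note. It only mirrors the general shape of `retry.OnError` (a predicate deciding whether an error is worth retrying), omits the backoff sleep, and assumes the number of attempts is capped. The two error values are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values used only in this sketch.
var (
	errTransient = errors.New("transient: connection reset")
	errInvalid   = errors.New("permanent: object is invalid")
)

// retryOnError calls fn up to maxAttempts times and keeps retrying only
// while retriable(err) reports true. It returns the number of calls made.
// This is a stand-in for the real helper and does not sleep between calls.
func retryOnError(maxAttempts int, retriable func(error) bool, fn func() error) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return attempt, nil
		}
		if !retriable(err) {
			return attempt, err // give up immediately on a permanent error
		}
	}
	return maxAttempts, err
}

// isTransient is the refined predicate: retry only errors known to be transient.
func isTransient(err error) bool { return errors.Is(err, errTransient) }

// attemptsFor reports how many calls are made when fn always fails with failWith.
func attemptsFor(failWith error, retriable func(error) bool) int {
	n, _ := retryOnError(5, retriable, func() error { return failWith })
	return n
}

func main() {
	always := func(error) bool { return true }
	fmt.Println(attemptsFor(errInvalid, always))        // 5: every attempt is wasted
	fmt.Println(attemptsFor(errInvalid, isTransient))   // 1: permanent error stops at once
	fmt.Println(attemptsFor(errTransient, isTransient)) // 5: transient error uses the full budget
}
```

Which errors count as transient is a design decision for the controller. The sketch shows only that an unconditional `return true` spends the whole retry budget on errors that cannot succeed.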

}

func (p *provider) loadProvider(dir string) error {
pluginPaths, _ := filepath.Glob(fmt.Sprintf("%s/*.so", dir))

Copilot AI Mar 3, 2025

The error returned by filepath.Glob is ignored. Handling this error could prevent issues during plugin loading if globbing fails.

Suggested change
- pluginPaths, _ := filepath.Glob(fmt.Sprintf("%s/*.so", dir))
+ pluginPaths, err := filepath.Glob(fmt.Sprintf("%s/*.so", dir))
+ if err != nil {
+ return fmt.Errorf("failed to glob plugins: %v", err)
+ }
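For context on when this error can occur: according to the Go standard library documentation, `filepath.Glob` ignores file system errors and only returns `filepath.ErrBadPattern`, which happens when the pattern is malformed. Because the plugin directory is interpolated into the pattern, a directory name containing a character such as `[` could trigger it. The sketch below illustrates this. `findPlugins` is a hypothetical helper, not code from this PR, and it uses `filepath.Join` and `%w` as a matter of preference.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// findPlugins is a hypothetical helper that lists *.so files in dir and
// surfaces the glob error instead of discarding it.
func findPlugins(dir string) ([]string, error) {
	paths, err := filepath.Glob(filepath.Join(dir, "*.so"))
	if err != nil {
		// %w keeps the original error available to errors.Is.
		return nil, fmt.Errorf("failed to glob plugins in %q: %w", dir, err)
	}
	return paths, nil
}

// isBadPattern reports whether err wraps filepath.ErrBadPattern.
func isBadPattern(err error) bool {
	return errors.Is(err, filepath.ErrBadPattern)
}

func main() {
	// An unterminated '[' in the directory name makes the pattern malformed.
	_, err := findPlugins("plugins[")
	fmt.Println("bad pattern reported:", isBadPattern(err))

	// A missing directory is not an error for Glob; it just yields no matches.
	paths, err := findPlugins("no-such-plugin-dir")
	fmt.Println("matches:", len(paths), "error:", err)
}
```

One consequence is that an empty result with a nil error does not distinguish between a missing directory and a directory without plugins, so a caller that cares may want a separate existence check.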

Member

@hwdef hwdef left a comment

Code quality is high as always, with only a few minor issues.


# Install musl
RUN apt-get update && \
apt-get install -y sudo
Member

Is sudo necessary? Is the default user in the container root?

RUN apt-get update && \
apt-get install -y sudo

RUN wget http://musl.libc.org/releases/musl-1.2.1.tar.gz && \
Member

Suggested change
- RUN wget http://musl.libc.org/releases/musl-1.2.1.tar.gz && \
+ RUN wget http://musl.libc.org/releases/musl-latest.tar.gz && \

}

func (p *provider) loadProvider(dir string) error {
pluginPaths, _ := filepath.Glob(fmt.Sprintf("%s/*.so", dir))
Member

Why ignore the error?

func (p *provider) loadProvider(dir string) error {
pluginPaths, _ := filepath.Glob(fmt.Sprintf("%s/*.so", dir))
for _, pluginPath := range pluginPaths {
klog.InfoS("Loading provider plugin...", "", "path", pluginPath)
Member

InfoS is used in several places. Will this produce too many logs by default?
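One common way to address this concern is to gate detailed messages behind a verbosity level, which klog supports through `klog.V(level).InfoS(...)`. The sketch below imitates that pattern with a tiny stand-in logger so that it runs without dependencies. The `leveledLogger` type, the chosen levels, and the messages are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// leveledLogger is a minimal stand-in for verbosity-gated logging in the
// style of klog.V(n).InfoS. It is not klog and exists only for this sketch.
type leveledLogger struct {
	verbosity int
	out       *strings.Builder
}

// verbose carries the decision made by V so InfoS can be a no-op when disabled.
type verbose struct {
	enabled bool
	out     *strings.Builder
}

// V enables the returned logger only when level is within the configured verbosity.
func (l leveledLogger) V(level int) verbose {
	return verbose{enabled: level <= l.verbosity, out: l.out}
}

// InfoS writes a message followed by key/value pairs, one line per call.
func (v verbose) InfoS(msg string, keysAndValues ...interface{}) {
	if !v.enabled {
		return
	}
	fmt.Fprintf(v.out, "%q", msg)
	for i := 0; i+1 < len(keysAndValues); i += 2 {
		fmt.Fprintf(v.out, " %v=%q", keysAndValues[i], fmt.Sprint(keysAndValues[i+1]))
	}
	v.out.WriteString("\n")
}

// logLines counts how many lines are emitted at the given verbosity when
// two plugins are loaded: one summary line at level 0, one detail line per
// plugin at level 4.
func logLines(verbosity int) int {
	var sb strings.Builder
	log := leveledLogger{verbosity: verbosity, out: &sb}
	plugins := []string{"a.so", "b.so"}
	for _, p := range plugins {
		log.V(4).InfoS("Loading provider plugin", "path", p)
	}
	log.V(0).InfoS("Provider plugins loaded", "count", len(plugins))
	return strings.Count(sb.String(), "\n")
}

func main() {
	fmt.Println(logLines(2), logLines(4)) // 1 3
}
```

With this arrangement the default output stays at one summary line, and the per-plugin detail appears only when an operator raises the verbosity.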

return err
}
plugin := pb()
//pluginName := getPluginName(pluginPath)
Member

Delete this if it is not used.

Labels
area/controllers kind/feature Categorizes issue or PR as related to a new feature. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
4 participants