
Implement Cluster for zedkube related functions #4408

Closed

Conversation

naiming-zededa
Contributor

  • check kubernetes pod status and publish ENClusterAppStatus
    for multi-node cluster
  • subscribe the EdgeNodeClusterConfig and DeviceNetworkStatus for
    publishing the EdgeNodeClusterStatus for cluster-init.sh to use
  • implement cluster server status HTTP service for others to join
    the cluster
  • collect stats from kubernetes cluster for node/pod/vmi and
    publish clusterInfo
  • implement cluster leader election for cluster reporter
  • documented the functionalities of zedkube in pillar/doc/zedkube.md
  • some of the changes are not final, and other PRs should follow.

Signed-off-by: Naiming Shen <naiming@zededa.com>
return
}
mux := http.NewServeMux()
mux.HandleFunc("/status", func(w http.ResponseWriter, r *http.Request) {
Contributor

Are you intending to only support the GET operation? You should check the operation(s) in the handler.

Contributor Author

Yes, updated with the check for GET

// Start the server in a goroutine
go func() {
defer ctx.statusServerWG.Done()
if err := ctx.statusServer.ListenAndServe(); err != nil && err != http.ErrServerClosed {
Contributor

This listens on all IP addresses. What's the security approach for this open port? (And which port number does it run on? Some documentation seems to be missing ;-)

Contributor Author
@naiming-zededa naiming-zededa Nov 2, 2024

This actually only listens on the clusterprefix IP address with the cluster status port.
netstat showing this service:

tcp        0      0 10.244.244.2:12346      0.0.0.0:*               LISTEN      4193/zedkube

this is documented in pkg/pillar/doc/zedkube.md in this PR. Updated the above with comments to that.

Contributor

Is this clusterprefix IP address reachable from the outside of the cluster?

Contributor Author

It should not be reachable from outside of the local network, the router will not know this private ip prefix.

Contributor
@milan-zededa milan-zededa Nov 14, 2024

@eriknordmark Would you be OK with this approach? When this is merged I can open a followup PR with networking changes and the ACL rule allowing access to cluster-ip:12346 added.


### App Ethernet Passthrough

When an application uses passthrough on Ethernet ports, zedkube creates a special NAD (Network Attachment Definition) for the direct connection, named for example 'host-eth1'. It applies the NAD to the Kubernetes cluster, and domainmgr references this NAD when it submits the application configuration to Kubernetes.
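For illustration, a minimal NAD of this shape using the standard `host-device` CNI plugin might look as follows; the exact plugin and fields EVE generates may differ:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: host-eth1
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "host-device",
      "device": "eth1"
    }'
```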
Contributor

Is this passthrough all the way to the application VM or just to the launcher Pod and then bridged by kubevirt?

Contributor Author

This is actually only for native containers, not for VMIs. I'll add this specific point to the doc here.

Contributor

For VM and container-inside-VM we use this for passthrough: https://kubevirt.io/user-guide/compute/host-devices/
Correct?
Should we document it here?

Contributor

(can be done also in a future PR)

Contributor
@eriknordmark eriknordmark left a comment

Need to understand the security approach.


### Cluster Status Server

In the Kubernetes cluster mode with multiple HA servers, a starting node needs to join the cluster by specifying the 'bootstrap' server IP address. Even if that IP address is reachable, the 'bootstrap' node may still be in single-node mode, or it may not yet have been converted into a cluster server. This creates problems for the joining server and later leads to conflicts in status, certificates, or tokens. To handle server joining, zedkube is responsible for reporting its cluster status through an HTTP service. Each cluster server runs an HTTP service on the cluster interface, on port '12346', at the URL /status. It reports a status of 'cluster' if the node has both the 'master' and 'etcd' properties. A newly joining server or agent node will not proceed with starting the Kubernetes node until the HTTP query returns the 'cluster' status over the cluster network. The 'ClusterStatus' port for this HTTP service is explicitly allowed by the EVE firewall.
Contributor

Are we restricting this to use the cluster network interface and IP addresses using some ACLs? Can we use https using some certificates which k3s might have?

I don't think we can have TCP port number 12346 accessible from the Internet.

Contributor Author
@naiming-zededa naiming-zededa Nov 4, 2024

Currently this is just HTTP; using certs would only verify the server side, which does not help here. Yes, this only listens on the cluster IP, which is a private prefix set up locally.

Contributor Author

The firewall check is specific to the cluster interface and cluster IP with this port:

		clusterStatusRule := iptables.Rule{
			RuleLabel: "Allow access to Cluster Status",
			MatchOpts: []string{"-p", "tcp", "-i", clusterPort.IfName,
				"-d", clusterStatus.ClusterIPPrefix.IP.String(), "--dport", "12346"},
			Target:      "ACCEPT",
			Description: "Allow access to Cluster Status via cluster interface",
		}

 - Add vendor files for Implement Cluster for zedkube related functions

Signed-off-by: Naiming Shen <naiming@zededa.com>

codecov bot commented Nov 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 20.93%. Comparing base (dd27fe1) to head (54ba466).
Report is 23 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4408   +/-   ##
=======================================
  Coverage   20.93%   20.93%           
=======================================
  Files          13       13           
  Lines        2895     2895           
=======================================
  Hits          606      606           
  Misses       2163     2163           
  Partials      126      126           


@milan-zededa
Contributor

@eriknordmark If you are OK with the port 12346 being open for (only) cluster-IP, I can fix the conflict in zedkube.md, merge the PR and we could proceed with the followup PR containing network changes.

@eriknordmark
Contributor

@milan-zededa I wonder where we should document this somewhere. But we can figure that out later. Is there a device running this PR where we can try a port scan just to make sure it isn't exposed outside of the cluster IP?

@milan-zededa
Contributor

> @milan-zededa I wonder where we should document this somewhere. But we can figure that out later. Is there a device running this PR where we can try a port scan just to make sure it isn't exposed outside of the cluster IP?

@zedi-pramodh @andrewd-zededa Do you have some cluster deployed running EVE with all your latest changes?
The nodes in canary from Naiming that I have been using are currently in the suspect state.

@andrewd-zededa
Contributor

@milan-zededa check either alpha or local please. Both have a cluster running recent code.

@milan-zededa
Contributor

I took the liberty of resolving the zedkube.md merge conflicts to help move this forward so that I can submit my last outstanding commit for kubevirt-EVE.
However, DCO and Lint checks do not like it, so I'm thinking of opening a new PR with your commits + my commit...

@milan-zededa
Contributor

milan-zededa commented Dec 2, 2024

Closing since this is now superseded by #4454.
I have fixed the merge conflicts with master there and added my last outstanding networking changes (which depend on this PR and are needed for kubevirt-EVE clustering).
