Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubevirt zedbox service zedkube #3827

Merged
merged 1 commit into from
Apr 25, 2024

Conversation

naiming-zededa
Copy link
Contributor

  • a zedbox service for edge-node kubevirt
  • currently handles pod's functions
    • container direct-connected ethernet
    • remote console assess with VNC for VMI
    • App log from the pods
    • may add more functionalities in Edge-node clustering

Copy link

codecov bot commented Mar 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 17.51%. Comparing base (e4f2710) to head (9826f96).
Report is 66 commits behind head on master.

❗ Current head 9826f96 differs from pull request most recent head 84c7fb3. Consider uploading reports for the commit 84c7fb3 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3827   +/-   ##
=======================================
  Coverage   17.51%   17.51%           
=======================================
  Files           3        3           
  Lines         805      805           
=======================================
  Hits          141      141           
  Misses        629      629           
  Partials       35       35           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@naiming-zededa naiming-zededa force-pushed the naiming-zedkube branch 2 times, most recently from 7547502 to 0012210 Compare March 20, 2024 05:10
@@ -19,6 +19,9 @@ DPCDIR=$ZTMPDIR/DevicePortConfig
FIRSTBOOTFILE=$ZTMPDIR/first-boot
FIRSTBOOT=
AGENTS="diag zedagent ledmanager nim nodeagent domainmgr loguploader tpmmgr vaultmgr zedmanager zedrouter downloader verifier baseosmgr wstunnelclient volumemgr watcher zfsmanager usbmanager"
if grep -q kubevirt /run/eve-hv-type; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just add zedkube, and the zedkube service itself would recognize, "oh no, I do not run under kubevirt, so I will do the touch and exit"?

I am concerned with polluting this with more specific logic. Everything else in device-steps.sh is generic. We have so many if/then and properties all over the place, keeping this clean would be nice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we could do that in pkg/pillar/cmd/zedkube/nokube.go. But note that we would have to update touch file periodically and could not just exit (otherwise watchdog will get triggered). Or we do this, but then we just have another if/then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think about that. It isn't just the wait_for_touch, but the ongoing watchdog, isn't it? What happens with diag and watchdog?

Or we do this, but then we just have another if/then

Better in one place than scattered about. The best solution would be to update this generically, so that we have some way of knowing without if/then scattered everywhere

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, below wait_for_touch there is touch "$WATCHDOG_FILE/$AGENT.touch", which enables periodic watchdog monitoring.
I actually do not know why we do not enable watchdog for diag. Probably because it is not a critical microservice so we do not care if it will get stuck.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that makes it different than zedkube, which we do care about in watchdog.

I know it is "just one more if/then", but all of our areas across eve where we have too many of them and too much complexity started that way. It would be good to have a cleaner solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//go:build !kubevirt

This is the file that is built when not kubevirt variant? So this will turn the service into a perpetual loop?

Does this mean that all services always start and run. However, their behavior may vary:

  • kubevirt: normal service running
  • non-kubevirt: just an infinite loop, so that the service runs, but nothing happens

So from device-steps, it just looks like, "I have a list of services, I start them all, I run them all, I watch them all."

And since these all are threads, there is no overhead from launching an additional process with go runtime overhead, right?

That's quite nice and elegant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question though is if we need to introduce a new microservice in the first place.
We are already enhancing the existing microservices, like zedrouter, volumemgr, zedmanager and domainmgr, to handle kubevirt-specific workflow. So why to put these few things into a separate new microservice? Will there be more applications for zedkube in the cluster scenario?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a much better question than I asked. Do we need an additional µservice? I worked under the assumption that we do. If we do, then this approach is good. But best to answer Milan's question first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And since these all are threads, there is no overhead from launching an additional process with go runtime overhead, right?

The only overhead is the thread running and touching the file about every 15 seconds.

Copy link
Contributor Author

@naiming-zededa naiming-zededa Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question though is if we need to introduce a new microservice in the first place. We are already enhancing the existing microservices, like zedrouter, volumemgr, zedmanager and domainmgr, to handle kubevirt-specific workflow. So why to put these few things into a separate new microservice? Will there be more applications for zedkube in the cluster scenario?

There should be more things to come for the new service in cluster, we'll have it the publish like EdgeNodeClusterStatus, etc. Even the provision of cluster prefix can be done in that. Among the nodes in cluster we'll elect a leader to be the reporter to report k3s cluster stats and info to the controller, etc. and there may be a lot more interesting things it needs to do with kubernetes

Copy link
Contributor

@deitch deitch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one comment about device-steps.sh

Beyond that, mostly looks good.

I don't understand the vnc part. Why do we need remote vnc access to enable zedkube?

nadname := "host-" + io.Name
_, ok := ctx.networkInstanceStatusMap.Load(nadname)
if !ok {
bringupInterface(io.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we fail to prepare interface for assignment, shouldn't we propagate the error to AppInstanceStatus (and potentially stop the application from being deployed)? Seems we only log errors here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

@naiming-zededa naiming-zededa Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is how to bubble up this error into AppInstanceStatus, when we launch the pod in domainmgr, it will get an error there, since the net-attach-def of this does not exist. We'll get the error to the user this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the error message will be something else and not useful for troubleshooting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can let zedkube in the error condition to publish the IOAdapter status error. and zedmanager listening on this, and it can include this into the AppInatnceStatus error portion. Does this sound reasonable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I don't need to do that. I can do this in the domainmgr, during the pod configure setup stage, query for the net-attach-def for this ether port, if kubernetes side does not have it, it can generate this specific error condition for the UI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the domainmgr side the of kubevirt patch has not been made into the mainline, we'll change to add that error condition for checking the io-ethernet from k3s and express error over there.

@naiming-zededa
Copy link
Contributor Author

I left one comment about device-steps.sh

Beyond that, mostly looks good.

I don't understand the vnc part. Why do we need remote vnc access to enable zedkube?

@deitch since we use the kubevirt to launch the VMI, the way to access the VNC console is to use the 'virtctl vnc' tool to let the VMI to pass over the VNC port number over to us, thus the regular 'remote-console' from the controller will work as usual. since this 'extra' thing is only related to kubevirt type of operation, and 'zedkube' is for that currently. That is the only reason this 'vnc.go' is in zedkube side.

@deitch
Copy link
Contributor

deitch commented Mar 26, 2024

So this is for enabling remote console in kubevirt eve? Got it.

@naiming-zededa
Copy link
Contributor Author

So this is for enabling remote console in kubevirt eve? Got it.

yes, allow remote-console into the VMI

@@ -0,0 +1,187 @@
// Copyright (c) 2024 Zededa, Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the file name aitoapiserver.go mean (in particular the aito part)?
What about creating two separate go files, such as ethpassthrough.go and applogs.go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@naiming-zededa
Copy link
Contributor Author

naiming-zededa commented Apr 1, 2024

updated the PR:

  • have the 'still-running' touch in nokube.go for the dummy Run(), remove condition in device-steps.sh
  • split the aitoapiserver.go into two files as suggested
  • will have check in domainmgr side for verifying ether-passthrough NAD exist before launch app( not in this PR)
    code such as:
			// Check if the NAD is created in the cluster, return error if not
			err := kubeapi.CheckEtherPassThroughNAD(nadName)
			if err != nil {
				logrus.Errorf("CreatePodConfig: check ether NAD failed, %v", err)
				return err
			}
  • modified the nokube.go to remove 'select' due to yetus warning.

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An old comment from me which was stuck in a pending review :-(

esac
done < "$VMICONFIG_FILENAME"

# Check if vminame and vncport were found and assign default values if not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't match the code which returns an error

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except the comment nit.

bringupInterface(io.Name)
err := ioEtherCreate(ctx, &io)
if err != nil {
log.Errorf("checkIoAdapterEthernet: create io adapter error %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.Errorf("checkIoAdapterEthernet: create io adapter error %v", err)
log.Errorf("checkIoAdapterEthernet: create io adapter error %v", err)
continue

if it is an error, why store the nadname?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in here, there is a case the NAD already exist on the kubernetes side, we want to count this as created but log an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in another outstanding PR #3841, we have added the function 'func CheckEtherPassThroughNAD(nadName string)', we can later to add a check to see if it already exists condition if not, then do the 'continue' as suggested.

Comment on lines 94 to 95
if scanner.Err() != nil {
if scanner.Err() == io.EOF {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if scanner.Err() != nil {
if scanner.Err() == io.EOF {
if scanner.Err() == io.EOF {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, shouldn't it also log error and break when there is an error? (It currently doesn't do that.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me add an error here.

Signed-off-by: Naiming Shen <naiming@Admins-MacBook-Pro-3.local>
Signed-off-by: Naiming Shen <naiming@admins-mbp-3.lan>
Signed-off-by: Naiming Shen <naiming@Admins-MacBook-Pro-3.local>
Signed-off-by: Naiming Shen <naiming@admins-mbp-3.lan>
Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eriknordmark eriknordmark merged commit 74e3d47 into lf-edge:master Apr 25, 2024
29 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants