kubevirt zedbox service zedkube #3827
naiming-zededa commented Mar 20, 2024
- a zedbox service for edge-node kubevirt
- currently handles pod-related functions:
  - container direct-connected ethernet
  - remote console access with VNC for VMI
  - app logs from the pods
- may add more functionality for edge-node clustering
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3827   +/-   ##
=======================================
  Coverage   17.51%   17.51%
=======================================
  Files           3        3
  Lines         805      805
=======================================
  Hits          141      141
  Misses        629      629
  Partials       35       35
7547502 to 0012210
0012210 to 0226a39
pkg/pillar/scripts/device-steps.sh
Outdated
@@ -19,6 +19,9 @@ DPCDIR=$ZTMPDIR/DevicePortConfig
FIRSTBOOTFILE=$ZTMPDIR/first-boot
FIRSTBOOT=
AGENTS="diag zedagent ledmanager nim nodeagent domainmgr loguploader tpmmgr vaultmgr zedmanager zedrouter downloader verifier baseosmgr wstunnelclient volumemgr watcher zfsmanager usbmanager"
if grep -q kubevirt /run/eve-hv-type; then
Could we just add `zedkube`, and the zedkube service itself would recognize, "oh no, I do not run under kubevirt, so I will do the `touch` and exit"?
I am concerned with polluting this with more specific logic. Everything else in `device-steps.sh` is generic. We have so many if/then and properties all over the place, keeping this clean would be nice.
Yes, we could do that in `pkg/pillar/cmd/zedkube/nokube.go`. But note that we would have to update the touch file periodically and could not just exit (otherwise the watchdog would be triggered). Or we do this, but then we just have another if/then.
I didn't think about that. It isn't just the `wait_for_touch`, but the ongoing watchdog, isn't it? What happens with `diag` and watchdog?
> Or we do this, but then we just have another if/then
Better in one place than scattered about. The best solution would be to update this generically, so that we have some way of knowing without if/then scattered everywhere.
Yes, below `wait_for_touch` there is `touch "$WATCHDOG_FILE/$AGENT.touch"`, which enables periodic watchdog monitoring.
I actually do not know why we do not enable the watchdog for `diag`. Probably because it is not a critical microservice, so we do not care if it gets stuck.
But that makes it different from `zedkube`, which we do care about in watchdog.
I know it is "just one more if/then", but all of the areas across eve where we have too many of them and too much complexity started that way. It would be good to have a cleaner solution.
//go:build !kubevirt
This is the file that is built for the non-kubevirt variant? So this will turn the service into a perpetual loop?
Does this mean that all services always start and run, but their behavior may vary?
- kubevirt: normal service running
- non-kubevirt: just an infinite loop, so that the service runs, but nothing happens
So from device-steps, it just looks like, "I have a list of services, I start them all, I run them all, I watch them all."
And since these all are threads, there is no overhead from launching an additional process with go runtime overhead, right?
That's quite nice and elegant.
The question though is if we need to introduce a new microservice in the first place.
We are already enhancing the existing microservices, like zedrouter, volumemgr, zedmanager and domainmgr, to handle kubevirt-specific workflow. So why to put these few things into a separate new microservice? Will there be more applications for zedkube in the cluster scenario?
That is a much better question than I asked. Do we need an additional µservice? I worked under the assumption that we do. If we do, then this approach is good. But best to answer Milan's question first.
> And since these all are threads, there is no overhead from launching an additional process with go runtime overhead, right?
The only overhead is the thread running and touching the file about every 15 seconds.
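The stub variant discussed in this thread can be sketched roughly as follows. This is only an illustration, not the code from the PR: `runStub`, its `stop` channel, and the `touch` callback are hypothetical names, and the 15-second constant simply echoes the figure quoted above. The idea is that the non-kubevirt build does no real work and only keeps the watchdog satisfied.

```go
//go:build !kubevirt

package main

import (
	"fmt"
	"time"
)

// watchdogTouchInterval echoes the "about every 15 seconds" mentioned in the
// thread; the exact value used by EVE is an assumption here.
const watchdogTouchInterval = 15 * time.Second

// runStub is a hypothetical stand-in for the non-kubevirt build of the
// service. It calls touch once right away and then once per interval until
// stop is closed, and returns how many times touch was called.
func runStub(stop <-chan struct{}, interval time.Duration, touch func()) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	touches := 0
	for {
		touch()
		touches++
		select {
		case <-stop:
			return touches
		case <-ticker.C:
		}
	}
}

func main() {
	stop := make(chan struct{})
	// Demo only: a short interval and a short lifetime so the program exits.
	go func() {
		time.Sleep(55 * time.Millisecond)
		close(stop)
	}()
	n := runStub(stop, 20*time.Millisecond, func() {
		// A real agent would refresh its watchdog touch file here.
	})
	fmt.Printf("stub touched the watchdog %d times (production interval would be %v)\n",
		n, watchdogTouchInterval)
}
```

With this shape, `device-steps.sh` can treat the service like any other agent, and the only variant-specific decision is which file the build constraint selects.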
> The question though is if we need to introduce a new microservice in the first place. We are already enhancing the existing microservices, like zedrouter, volumemgr, zedmanager and domainmgr, to handle kubevirt-specific workflow. So why to put these few things into a separate new microservice? Will there be more applications for zedkube in the cluster scenario?
More things should come to the new service for clustering: we'll have it publish things like EdgeNodeClusterStatus, etc. Even the provisioning of the cluster prefix can be done in it. Among the nodes in the cluster we'll elect a leader to report k3s cluster stats and info to the controller, etc., and there may be a lot more interesting things it needs to do with kubernetes.
I left one comment about `device-steps.sh`. Beyond that, mostly looks good.
I don't understand the vnc part. Why do we need remote vnc access to enable zedkube?
nadname := "host-" + io.Name
_, ok := ctx.networkInstanceStatusMap.Load(nadname)
if !ok {
	bringupInterface(io.Name)
If we fail to prepare interface for assignment, shouldn't we propagate the error to AppInstanceStatus (and potentially stop the application from being deployed)? Seems we only log errors here.
+1
The problem is how to bubble this error up into AppInstanceStatus. When we launch the pod in domainmgr, it will get an error there, since the net-attach-def for this interface does not exist. The error will reach the user that way.
Yes, but the error message will be something else and not useful for troubleshooting.
OK, I can have zedkube publish an IOAdapter status error in the error condition, with zedmanager listening for it and including it in the error portion of AppInstanceStatus. Does this sound reasonable?
Actually, I don't need to do that. I can do this in domainmgr: during the pod configuration setup stage, query for the net-attach-def for this ethernet port, and if the kubernetes side does not have it, generate this specific error condition for the UI.
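A rough sketch of that pre-launch check is shown below. It is only an illustration under stated assumptions: `checkEtherNAD` and the `nadExistsFunc` lookup are hypothetical names, the lookup stands in for whatever Kubernetes query domainmgr would actually make, and only the `"host-" + name` convention is taken from the snippet under review. The point is that the returned error names the port and the missing net-attach-def, which addresses the troubleshooting concern raised earlier.

```go
package main

import "fmt"

// nadExistsFunc is a hypothetical lookup: it reports whether a
// net-attach-def with the given name exists on the Kubernetes side.
type nadExistsFunc func(nadName string) (bool, error)

// checkEtherNAD sketches the check: before the pod is configured, look up the
// net-attach-def for a passthrough ethernet port and return an error that
// names the port, so the user sees the real cause.
func checkEtherNAD(ifName string, exists nadExistsFunc) error {
	nadName := "host-" + ifName // naming taken from the snippet under review
	found, err := exists(nadName)
	if err != nil {
		return fmt.Errorf("ethernet port %s: cannot query net-attach-def %q: %w", ifName, nadName, err)
	}
	if !found {
		return fmt.Errorf("ethernet port %s: net-attach-def %q does not exist in kubernetes", ifName, nadName)
	}
	return nil
}

func main() {
	// Stand-in for the cluster state; a real lookup would query the API server.
	known := map[string]bool{"host-eth1": true}
	lookup := func(name string) (bool, error) { return known[name], nil }

	for _, port := range []string{"eth1", "eth2"} {
		if err := checkEtherNAD(port, lookup); err != nil {
			fmt.Println("would report to the user:", err)
			continue
		}
		fmt.Println(port, "is ready for passthrough")
	}
}
```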
Since the domainmgr side of the kubevirt patch has not made it into mainline yet, we'll change to adding that error condition there: check the io-ethernet from k3s and report the error over there.
@deitch since we use kubevirt to launch the VMI, the way to access the VNC console is to use the 'virtctl vnc' tool to have the VMI pass the VNC port number over to us, so that the regular 'remote-console' from the controller works as usual. This 'extra' thing is only related to the kubevirt type of operation, and 'zedkube' is currently for that. That is the only reason 'vnc.go' is on the zedkube side.
So this is for enabling remote console in kubevirt eve? Got it.
yes, allow remote-console into the VMI
@@ -0,0 +1,187 @@
// Copyright (c) 2024 Zededa, Inc.
What does the file name `aitoapiserver.go` mean (in particular the `aito` part)?
What about creating two separate go files, such as `ethpassthrough.go` and `applogs.go`?
Sure.
0226a39 to f6ec861
updated the PR:
f6ec861 to 9826f96
An old comment from me which was stuck in a pending review :-(
pkg/kube/cluster-init.sh
Outdated
esac
done < "$VMICONFIG_FILENAME"

# Check if vminame and vncport were found and assign default values if not
This comment doesn't match the code which returns an error
LGTM except the comment nit.
bringupInterface(io.Name)
err := ioEtherCreate(ctx, &io)
if err != nil {
	log.Errorf("checkIoAdapterEthernet: create io adapter error %v", err)
- log.Errorf("checkIoAdapterEthernet: create io adapter error %v", err)
+ log.Errorf("checkIoAdapterEthernet: create io adapter error %v", err)
+ continue
if it is an error, why store the `nadname`?
I think there is a case here where the NAD already exists on the kubernetes side; we want to count this as created but log an error.
In another outstanding PR, #3841, we have added the function 'func CheckEtherPassThroughNAD(nadName string)'. We can later add a check for the already-exists condition and, if the NAD does not exist, do the 'continue' as suggested.
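The behavior the two comments converge on could look roughly like the sketch below. It is an assumption-laden illustration, not the PR's code: `ensureNADs`, `errAlreadyExists`, and `errCreateFailed` are hypothetical, and the `create` callback stands in for the real NAD creation call. An "already exists" failure is logged but still recorded, while any other failure skips the entry.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for what a real NAD create call
// might return.
var (
	errAlreadyExists = errors.New("net-attach-def already exists")
	errCreateFailed  = errors.New("net-attach-def create failed")
)

// ensureNADs tries to create a NAD for every ethernet adapter name and
// returns the set of NAD names that can be considered present.
func ensureNADs(ifNames []string, create func(nadName string) error) map[string]bool {
	present := make(map[string]bool)
	for _, ifName := range ifNames {
		nadName := "host-" + ifName
		err := create(nadName)
		switch {
		case err == nil:
			// Created just now.
		case errors.Is(err, errAlreadyExists):
			fmt.Printf("ensureNADs: %s already exists, counting it as created\n", nadName)
		default:
			fmt.Printf("ensureNADs: create %s failed: %v\n", nadName, err)
			continue // do not record a NAD that was never created
		}
		present[nadName] = true
	}
	return present
}

func main() {
	create := func(nadName string) error {
		switch nadName {
		case "host-eth1":
			return errAlreadyExists
		case "host-eth2":
			return errCreateFailed
		}
		return nil
	}
	fmt.Println(ensureNADs([]string{"eth0", "eth1", "eth2"}, create))
}
```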
pkg/pillar/cmd/zedkube/applogs.go
Outdated
if scanner.Err() != nil {
	if scanner.Err() == io.EOF {
- if scanner.Err() != nil {
- if scanner.Err() == io.EOF {
+ if scanner.Err() == io.EOF {
Hmm, shouldn't it also log error and break when there is an error? (It currently doesn't do that.)
Sure, let me add an error here.
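For reference, assuming `scanner` here is a `bufio.Scanner`: its `Err` method returns nil when the input simply ended with `io.EOF`, so a comparison against `io.EOF` never matches and any non-nil value is a genuine read failure. A minimal sketch of the log-and-stop handling follows; `readLogLines` and the `chunkReader` test double are hypothetical names, not code from the PR.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// chunkReader is a test double: it returns its data once, then either io.EOF
// or a non-EOF error, mimicking a log stream that ends cleanly or breaks.
// It assumes data fits into the caller's buffer.
type chunkReader struct {
	data string
	fail bool
	sent bool
}

func (c *chunkReader) Read(p []byte) (int, error) {
	if !c.sent {
		c.sent = true
		return copy(p, c.data), nil
	}
	if c.fail {
		return 0, errors.New("log stream reset")
	}
	return 0, io.EOF
}

// readLogLines collects lines until the stream ends. A clean end of stream
// yields a nil error; anything else is returned so the caller can log it and
// stop reading.
func readLogLines(r io.Reader) ([]string, error) {
	var lines []string
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	// Err is nil after a plain io.EOF, so non-nil means a real failure.
	if err := scanner.Err(); err != nil {
		return lines, fmt.Errorf("reading app log: %w", err)
	}
	return lines, nil
}

func main() {
	lines, err := readLogLines(strings.NewReader("line one\nline two\n"))
	fmt.Println(len(lines), err)

	lines, err = readLogLines(&chunkReader{data: "partial\n", fail: true})
	fmt.Println(len(lines), err)
}
```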
Signed-off-by: Naiming Shen <naiming@Admins-MacBook-Pro-3.local>
Signed-off-by: Naiming Shen <naiming@admins-mbp-3.lan>
9826f96 to 84c7fb3
LGTM