-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: refactor cni telemetry #3149
base: master
Are you sure you want to change the base?
Conversation
23e8b82
to
0613803
Compare
|
||
func (c *TelemetryClient) SendEvent(msg string) { | ||
c.sendLog(msg) | ||
c.sendTelemetry(msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the difference between log and telemetry here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logs are sent to azure-vnet.log
. Telemetry is sent to the telemetry service on the node which then sends it to application insights.
} | ||
} | ||
|
||
func (c *TelemetryClient) ConnectTelemetry(logger *zap.Logger) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't this be part of the constructor? wouldn't that remove the need for nil checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we ConnectTelemetry
in NewTelemetryClient
I believe we would try to connect to the telemetry service during tests, which I do not want to occur. Connect / StartAndConnect
are also called by different main methods (stateless vs. non-stateless cni) whereas NewTelemetryClient
is called for both. Adding attempts to only connect to NewTelemetryClient
(via ConnectTelemetry
) would be incorrect behavior in non-stateless cni because we want to start the telemetry service before any connection is attempted.
/azp run Azure Container Networking PR |
Azure Pipelines successfully started running 1 pipeline(s). |
0ea0ad4
to
b956ec4
Compare
we will split this part of the pr into its own pr a telemetry event was added back which was previously removed undo this pr to add those telemetry statements back
b956ec4
to
dd9ca83
Compare
remove reflect remove duplicated telemetry and telemetry buffer remove unused fields in report manager force access to telemetry client fields through methods move telemetry start/connect code closer to start of plugin execution
we use SendError where we would have previously called reportPluginError (no log emitted) we don't set error message in cni report because the error message and event message fields both end up in the Message field in the cni telemetry service
LGTM on @ramiro-gamarra 's approval |
cni/network/network.go
Outdated
@@ -479,18 +436,12 @@ func (plugin *NetPlugin) Add(args *cniSkel.CmdArgs) error { | |||
zap.String("pod", k8sPodName), | |||
zap.Any("IPs", cniResult.IPs), | |||
zap.Error(log.NewErrorWithoutStackTrace(err))) | |||
|
|||
telemetry.SendEvent(fmt.Sprintf("ADD command completed with [ipamAddResult]: %s [epInfos]: %s [error]: %v ", ipamAddResult.PrettyString(), network.FormatStructPointers(epInfos), err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per one of your previous comments, telemetry.SendEvent
will both write to azure-vnet.log
as well as send to telemetry service. Aren't we then writing to the log file twice here if logger
also writes to the file? Can we keep this PR to only a pure refactor without introducing new side effects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you are correct, I've removed the duplicate logger statement in the telemetry client
cni/network/network.go
Outdated
|
||
defer func() { | ||
logger.Info("DEL command completed", | ||
zap.String("pod", k8sPodName), | ||
zap.Error(log.NewErrorWithoutStackTrace(err))) | ||
telemetry.SendEvent(fmt.Sprintf("DEL command completed: [released ip]: %+v [podname]: %s [namespace]: %s [error]: %v", nwCfg.IPAM.Address, k8sPodName, k8sNamespace, err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above. if logger
already writes to the file, then we are just duplicating data, as well as going away from structured logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed above-- I was trying to replicate what I was seeing in the cnslogger but since we log before anyway this isn't helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may still be missing some details about the purpose of this refactor, but seems to me that logs are getting duplicated and the abstractions introduced are not cleaning up the code much yet.
telemetryWaitTimeInMilliseconds = 200 | ||
) | ||
|
||
type TelemetryClient struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of stutter in the naming. Can't there just be a telemetry
package with a Client
type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved the telemetry client to the existing telemetry package
lock sync.Mutex | ||
} | ||
|
||
type TelemetryInterface interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's only a single implementation of this interface. This is not needed.
cniReportSettings *telemetry.CNIReport | ||
tb *telemetry.TelemetryBuffer | ||
logger *zap.Logger | ||
lock sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is CNI spinning goroutines? What is this lock serializing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently we don't send telemetry from multiple goroutines as far as I can tell, but if we want to send telemetry from anywhere in the codebase in the future, I would prefer to serialize this and not worry about races etc. later, but I can remove the lock for now if you would like.
Reason for Change:
Currently the telemetry CNI is sending is insufficient to debug CNI issues. This PR refactors the cni telemetry to send more and better quality logs.
Examples of Logged information (Will be added in a separate PR-- this PR is focused on refactoring)
Potential additions:
Issue Fixed:
Requirements:
Notes:
Pipeline run to prove logs sent to kusto: https://msazure.visualstudio.com/One/_build/results?buildId=108208651&view=results
Passing run: https://msazure.visualstudio.com/One/_build/results?buildId=108563465&view=results