-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support pod exec terminal logging #9385
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9385 +/- ##
==========================================
- Coverage 45.78% 45.76% -0.02%
==========================================
Files 220 220
Lines 26165 26186 +21
==========================================
+ Hits 11979 11985 +6
- Misses 12529 12544 +15
Partials 1657 1657
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a really good idea! Added a couple thoughts.
@crenshaw-dev - The security scan flagged the new log lines with:
Since we are sending the url parameters verbatim to logs and the kubernetes API it is unsafe to write them to logs. The |
Signed-off-by: smcavallo <smcavallo@hotmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great PR. Tks for improving the logs. Added one suggestion.
break | ||
} | ||
} | ||
if !findContainer { | ||
if foundContainerName == "" { | ||
http.Error(w, "Cannot find container", http.StatusBadRequest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For auditing it would be interesting to also have this logged (as a warning) with all info: cluster, namespace, pod name and container name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CodeQL might complain about logging the un-sanitized container
input. But since we're validating the container name above, I think it would be safe to log (and to override the CodeQL warning).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leoluz - yes this is the issue we were trying to avoid - Log entries created from user input High This log write receives unsanitized user input from here.
It is unsafe to log verbatim whatever was posted to URL params which is why it is not logged.
In theory it should rarely happen as most requests should come directly from the argocd application itself and only post namespace + pod + container as already found in argocd. I agree it would be useful to be able to debug though. If a user really wants to know why it can't be found wondering if there is wireshark/tcpdump some level of capturing the http request instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unsafe to log verbatim whatever was posted to URL params which is why it is not logged.
@smcavallo but you are previously sanitizing isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sanitizing, but validating. Which I think should be enough.
If we're reaching this point of the code, then we've shown the user is authenticated and authorized to get
the application and create
on the exec
resource. I'm not too worried about this user filling up the disk with this log line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leoluz and @crenshaw-dev - totally make sense - we've already validated so it's OK to log these. I have added the additional info to these logs.
Signed-off-by: smcavallo <smcavallo@hotmail.com>
Signed-off-by: smcavallo <smcavallo@hotmail.com>
server/application/terminal.go
Outdated
@@ -150,7 +153,8 @@ func (s *terminalHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { | |||
|
|||
pod, err := kubeClientset.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{}) | |||
if err != nil { | |||
http.Error(w, "Cannot find pod: "+podName, http.StatusBadRequest) | |||
fieldLog.Warn("Terminal Pod Not Found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an error and should be logged like so:
fieldLog.Errorf("error retrieving pod %s: %s", podName, err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leoluz - thank you for checking again - we are already logging the podName. when fieldLog
is called it will output ALL the fields (user, namespace, pod, container) to the log line. since it is using WithFields
-
Note that it doesn't log until you call Debug, Print, Info, Warn, Fatal or Panic on the Entry it returns.
I left as"Terminal Pod Not Found"
to improve the searchability and to create standardized alerts based on that key - we can assume the format of that structured log will stay the same and build alerts based on that key.
I defer to all the argocd folks but the distinction here which is important is user error vs application error.
It makes sense that this throws an http error to the user.
However it is not actually an application error - the application is performing normally.
From my perspective it would be better to be Warn
instead of Error
for application log level.
As an operator I would want to know that this is happening (Warn
) but I would not consider this an indicator that argocd is unhealthy, having an issue, or doing something broken or unexpected.
Let me know if I should change it or if the above doesn't make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with the fact that we don't need the pod name as it is already part of registered fields. However I believe this still needs to be logged as an error. It is ArgoCD internal code that does the call to kube-api to retrieve the pod by its name. If this request fails for some reason it is an internal error and should be logged like so. Retrieving the pod call can fail by several reasons and logging it as terminal pod not found
is misleading.
I suggest to change this log to:
fieldLog.Errorf("error retrieving pod: %s", err)
By the way, log messages should be all lowercase sentences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leoluz what if a Pod is in the resource tree but not synced? In that case an error from the k8s API would be a user error rather than an application error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leoluz - good point - I have updated it and also changed the log messages to all lowercase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leoluz what if a Pod is in the resource tree but not synced? In that case an error from the k8s API would be a user error rather than an application error.
@crenshaw-dev My understanding is that (contrary to HTTP return codes) from the log perspective, it doesn't matter if the error was caused by the user or by something internal. Our code invoked a method that returned an error and in this case an error should be logged. However there is another "subtle" problem in this case: client-go returns an error if for example the resource isn't found. This isn't an exceptional scenario and if we want to be precise to whether or not log an error (and decide the proper http return code) we need to do an extra check:
if err != nil {
if apimachineryerrors.IsNotFound(err) {
// don't log error
// http 404
} else {
// log error
// http 5xx
}
}
Signed-off-by: smcavallo <smcavallo@hotmail.com>
Signed-off-by: smcavallo <smcavallo@hotmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* feat: support pod exec terminal logging Signed-off-by: smcavallo <smcavallo@hotmail.com> * enhanced validation and logging when resource not found Signed-off-by: smcavallo <smcavallo@hotmail.com> * fix lint Signed-off-by: smcavallo <smcavallo@hotmail.com> * log warning when pod or container not found Signed-off-by: smcavallo <smcavallo@hotmail.com> * go/log-injection fixes Signed-off-by: smcavallo <smcavallo@hotmail.com> * log levels and lowercase message Signed-off-by: smcavallo <smcavallo@hotmail.com>
Cherry-picked to 2.4. |
Signed-off-by: smcavallo smcavallo@hotmail.com
See - #8905
The exec feature is extremely powerful but lacks auditability.
Many orgs will require some auditing and history tracking of who is exec'ing into pods and containers.
This "feature" logs access to the Terminal/Exec feature.
When a terminal session is opened it will generate a log line.
This seemed to be the best place to add this logging and should have enough info for auditing purposes.
Note on DCO:
If the DCO action in the integration test fails, one or more of your commits are not signed off. Please click on the Details link next to the DCO action for instructions on how to resolve this.
Checklist: