Structured (JSON) logging for TF processes #32
Comments
* See https://github.com/jlewi/mlkube.io/issues/32
* Add sitecustomize.py with a custom logger for formatting the log entries as JSON.
So I did some prototyping (see #33) of a custom Python logger which outputs JSON-formatted log entries. This approach turns out to have a problem: log messages coming from TensorFlow C++ code don't go through the Python logger. TensorFlow's LogMessage::GenerateLogMessage just writes to stderr, and TensorFlow doesn't appear to offer any mechanism to intercept or configure these log messages other than processing stderr. So I think Option 2 might be the way to go. An even better option might be to create a custom Docker Logging Plugin. Unfortunately, I don't think K8s currently supports customizing the Docker logging driver (see issue 17183). The advantage of a Docker plugin is that we could intercept stdout/stderr without using a sidecar. |
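For reference, here is a rough, minimal sketch of the sitecustomize.py-style logger that was prototyped (not the actual code from #33). The environment variable names (TF_JOB_NAME, TF_REPLICA_TYPE, TF_REPLICA_INDEX) are illustrative assumptions; the key point is that this only affects Python logging, which is why the C++ output described above escapes it.

```python
# Sketch only: format Python log records as JSON and attach labels taken from
# environment variables the TfJob operator could inject (names are assumptions).
import json
import logging
import os


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record):
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "timestamp": self.formatTime(record),
            "labels": {
                "tf_job": os.environ.get("TF_JOB_NAME", ""),
                "replica_type": os.environ.get("TF_REPLICA_TYPE", ""),
                "replica_index": os.environ.get("TF_REPLICA_INDEX", ""),
            },
        }
        return json.dumps(entry)


# Installing this on the root logger only captures Python logging; anything the
# TensorFlow C++ runtime writes straight to stderr bypasses it entirely, which
# is the limitation described in the comment above.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger().addHandler(handler)
```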
IIUC, given the requirements, I think option 2 is fine, as it's not uncommon to have a streaming sidecar container to intercept and annotate logs. OTOH, in the longer term, maybe an issue should be filed against TF to allow all of its logging to be controlled centrally, or to have specific controls for the C++ logging. |
Thanks @foxish. I think it would be reasonable to modify TF logging, but what about every other C++ extension/library that code might invoke? So IMO relying on C++ extensions to allow logging to be controlled via environment variables or from Python seems destined to always be an incomplete solution. Arguably, stdout/stderr already define a well-established convention for exporting logs, so that seems like a better starting point if the goal is a complete solution. My main gripe with the sidecar solution is that it seems like it could be better supported by K8s. For example, if I want to make stdout and stderr available to the sidecar, the best I could come up with so far is to redirect the streams by appending " > /volume/stdout 2> /volume/stderr" to the command. But this seems like it could lead to weird edge cases with the various ways of specifying the command to run in the container. |
@wbuchwalter Do you want to weigh in? I'd appreciate hearing your opinion. There's now some prototype code in #33 of a logger running in a sidecar. |
Could we manage this using the K8s REST API instead? We would still need to work around the multi-line issue, but this would avoid needing users to append the stdout/stderr redirection to their command. What do you think? |
That's a great idea! So the logger can use the ReadLog API to fetch the logs. We can also fetch the labels from the pod rather than passing them in via the command line. |
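A minimal sketch of what this could look like, assuming the official Kubernetes Python client and a POD_NAME environment variable injected via the downward API (both assumptions). Note that the following comments argue against routing log traffic through the apiserver, so treat this as an illustration of the idea rather than the chosen design.

```python
# Illustrative sidecar that pulls logs and labels through the Kubernetes API.
import json
import os

from kubernetes import client, config


def stream_labeled_logs(pod_name, namespace="default", container="tensorflow"):
    config.load_incluster_config()  # running inside the pod as a sidecar
    v1 = client.CoreV1Api()

    # Labels come from the pod object instead of command-line flags.
    pod = v1.read_namespaced_pod(name=pod_name, namespace=namespace)
    labels = pod.metadata.labels or {}

    # Equivalent of `kubectl logs -f`; each line is re-emitted as JSON.
    stream = v1.read_namespaced_pod_log(
        name=pod_name,
        namespace=namespace,
        container=container,
        follow=True,
        _preload_content=False,
    )
    for raw_line in stream:
        print(json.dumps({"labels": labels, "log": raw_line.decode().rstrip()}))


if __name__ == "__main__":
    stream_labeled_logs(os.environ["POD_NAME"])
```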
I'd recommend not requiring access to k8s APIs, for security and scalability reasons. This is a common problem because our current logging pipeline doesn't really support structured logging. Let's see if k8s offers any generic solutions for this problem. cc @crassirostris to see if he has any ideas here. In general, we recommend sidecars in the k8s world, since they decouple the main app from the actual monitoring/logging solutions and can be injected via policies. |
I agree with @vishh; relying on sending all logging traffic through the apiserver is not a great idea. If you're not able to use JSON logs, I would suggest rolling with the sidecar approach. Logging is complex: the default Stackdriver Logging integration will handle metadata, rotation, buffering on the node side, and so on for you. It's possible that in the future the integration with Stackdriver Logging will allow configuring the format for each container, but that's far from today. |
@vishh @crassirostris thanks for the pointers, that makes sense. I think the plan involves using a sidecar either way. The main question is how to redirect logs to the sidecar. We want to avoid the ugliness that @wbuchwalter mentioned of forcing users to redirect stdout/stderr. Can we just volume mount the container logs produced by Docker and process those log files directly? I assume this also raises security concerns, since I'm guessing the sidecar would end up with access to logs from all containers. It looks like the docs all use a pod volume and explicitly redirect stdout/stderr to the pod volume. Would it make sense to file a K8s feature request to provide a mechanism for directing logs to a sidecar? |
Using EmptyDir volumes to share logs between the TF trainer and sidecars is probably a good starting point.
|
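A minimal sketch of the emptyDir starting point, combining it with the stdout/stderr redirect mentioned earlier in the thread (all names, images, and paths are illustrative assumptions). The trainer's command has to be wrapped so that its output lands in the shared volume, which is exactly the awkwardness discussed above; the sidecar then reads those files and re-emits annotated entries on its own stdout.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tf-worker-0
  labels:
    tf_job: my-job        # labels the sidecar would attach to each entry
    replica_type: worker
spec:
  volumes:
  - name: shared-logs
    emptyDir: {}
  containers:
  - name: tensorflow
    image: tensorflow/tensorflow:1.3.0
    command:
    - sh
    - -c
    - python /workdir/trainer.py > /volume/stdout 2> /volume/stderr
    volumeMounts:
    - name: shared-logs
      mountPath: /volume
  - name: log-sidecar
    image: busybox
    # A real sidecar would parse and JSON-format the lines; tail stands in here.
    command: ["sh", "-c", "tail -F /volume/stdout /volume/stderr"]
    volumeMounts:
    - name: shared-logs
      mountPath: /volume
```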
Yes, you can set up your own logging agent, e.g. fluentd, then get container metadata using e.g. https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter, then apply a special multiline format for the containers you're interested in and drop the rest. Or implement your own plugin to get the metadata from the apiserver and only process logs which belong to the containers you're interested in (though I'd still highly recommend reformatting the streams and sending them to another container's stdout/stderr).
That might be a useful piece of input. There's no clear understanding of what logging is going to look like in the future as of now. /cc @piosz @fgrzadkowski |
Thanks. I found another relevant PR to support letting users specify the logging plugin, which I think is ultimately what we'd want to do. For now we'll do the following, based on the previous suggestions.
I think this approach is in line with previous suggestions
I think the main disadvantages are
|
This is not a PR, and the node team is reluctant to let that happen, because this is runtime-specific.
Note that /var/log/containers is deprecated and will probably be removed in one of the future releases. The correct way of accessing logs is /var/log/pods, but it requires knowing the pod id. I suggest implementing it a little bit differently: have an init container that creates a pipe in a shared volume, then redirect the original process's stdio to the pipe (or two separate files), and add a sidecar that reads from this pipe and writes to its own stdio. Finally, make all of this added automatically by an external admission controller (available in 1.8 AFAIR) or a PodPreset (available already, but not sure if it's possible to implement all this using PodPresets), so that the user doesn't have to do anything. |
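A rough sketch of the pipe-based layout described above (all names and images are illustrative assumptions). The init container creates a FIFO in a shared volume, the trainer writes its stdio into it, and the sidecar reads from it and writes to its own stdout so the normal node-level logging pipeline picks it up. Note that writes to a FIFO block until a reader is attached, so the sidecar needs to stay running.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tf-worker-0
spec:
  volumes:
  - name: log-pipe
    emptyDir: {}
  initContainers:
  - name: create-pipe
    image: busybox
    command: ["sh", "-c", "mkfifo /pipe/stdout"]
    volumeMounts:
    - name: log-pipe
      mountPath: /pipe
  containers:
  - name: tensorflow
    image: tensorflow/tensorflow:1.3.0
    command: ["sh", "-c", "python /workdir/trainer.py > /pipe/stdout 2>&1"]
    volumeMounts:
    - name: log-pipe
      mountPath: /pipe
  - name: log-forwarder
    image: busybox
    # A real forwarder would reformat/annotate the entries before re-emitting.
    command: ["sh", "-c", "cat /pipe/stdout"]
    volumeMounts:
    - name: log-pipe
      mountPath: /pipe
```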
@crassirostris thank you for explaining the node team's view. Attaching labels to logs (and also to metrics like CPU traces) seems like it would be a common request. Are there any issues worth paying attention to? Any suggestion about how to get the pod id? It doesn't look like I can set an environment variable for it the way I can for the pod name. I tried
but this gave me an error
So the best I could come up with was to use the deprecated /var/log/containers. |
That's already available via kubernetes_metadata_filter. Making labels available out of the box in the GKE setup is in progress, but I'm not sure there's an open-source issue for it.
It should be available: https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information#capabilities-of-the-downward-api |
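For illustration, a pod spec fragment based on the linked downward API docs: the pod's labels (and UID) can be exposed as files in a volume, so the sidecar doesn't need apiserver access. Whether metadata.uid works as a fieldRef may depend on the cluster version, which could explain the error mentioned above; names and paths here are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tf-worker-0
  labels:
    tf_job: my-job
spec:
  containers:
  - name: log-sidecar
    image: busybox
    # Each file contains the corresponding pod metadata, e.g. /etc/podinfo/labels.
    command: ["sh", "-c", "cat /etc/podinfo/labels && sleep 3600"]
    volumeMounts:
    - name: podinfo
      mountPath: /etc/podinfo
  volumes:
  - name: podinfo
    downwardAPI:
      items:
      - path: labels
        fieldRef:
          fieldPath: metadata.labels
      - path: uid            # uid support via fieldRef may depend on the cluster version
        fieldRef:
          fieldPath: metadata.uid
```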
How does kubernetes_metadata_filter help here? It doesn't seem like that plugin attaches pod labels, presumably because it has no way to get them. I could run fluentd in the sidecar and modify the fluentd config on container start to fetch the pod labels and attach them to each entry, but that has the significant downside that the sidecar is now determining what sink to use for the logs. I don't think I want to do that; I'd prefer to let the sink be configured at the cluster level. |
It does; look at the example. It goes to the K8s apiserver to get the data, so you should have an RBAC policy for fluentd allowing it to do that. Edit:
And this plugin is needed to avoid exactly that :) |
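Since kubernetes_metadata_filter queries the apiserver, the sidecar's service account needs read access to pod metadata. A sketch of what that RBAC could look like (service account and resource names are assumptions; use rbac.authorization.k8s.io/v1beta1 on clusters older than 1.8):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: log-sidecar-metadata-reader
rules:
- apiGroups: [""]
  resources: ["pods", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: log-sidecar-metadata-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: log-sidecar-metadata-reader
subjects:
- kind: ServiceAccount
  name: tf-log-sidecar    # hypothetical service account used by the logging sidecar
  namespace: default
```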
Thanks. This is great; this should be exactly what we need. |
Closing this issue per the discussion above. The expectation is that K8s will support attaching pod labels to container logs. |
We'd like to support outputting structured (JSON) log entries from TensorFlow processes. These JSON log entries should contain metadata in the form of key-value pairs which provide information such as the job and replica a log entry came from.
The goal is to allow K8s clusters to be configured to use a logging backend (e.g. Stackdriver Logging on GKE) that can index these labels to facilitate searching for the logs from a particular job and replica.
Option 1: Python Logger
We use a Python logger that can emit JSON-formatted logs and attach the appropriate labels.
The TfJob operator passes the necessary metadata to the Python logger (e.g. via environment variables)
We make it easy for users to automatically use this custom JSON logger.
Option 2: Sidecar
We run a sidecar container that intercepts the TensorFlow process's stdout/stderr, attaches the metadata, and outputs JSON log entries.
Plan
I think option 1 is better. Using a sidecar, we'd have to parse the log entries, which requires knowing how they are formatted, and that leads to various edge cases when dealing with things like multi-line log entries. I think using Python's logging infrastructure to format log entries as needed is cleaner.
Option 2 would also change how a user uses kubectl logs to get the logs, since they would now have to get the logs of the sidecar and not the TensorFlow container.