Long-term story for /var/log support #818

Closed
yanweiguo opened this issue May 7, 2018 · 21 comments · Fixed by #4156
Labels: area/monitoring, kind/feature, P0

@yanweiguo (Contributor)

Expected Behavior

Collect logs from /var/log using the same DaemonSet which collects stderr/stdout logs.

Actual Behavior

/var/log is collected by a sidecar container per pod, which is resource-heavy.

Additional Info

From @dprotaso: given that container volumes have a deterministic layout on the host
/var/lib/kubelet/pods/<pod uid>/volumes/<volume type>/<volume name>

We could get rid of the fluentd sidecar by modifying/forking the fluentd k8s metadata filter plugin. Then we could have fluentd source logs from /var/lib/kubelet/pods/*/volumes/emptyDir/ela-varlog-volume/**/*.* (volume name being created/controlled by the revision controller).
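
A DaemonSet-side fluentd source along those lines might look roughly like the sketch below (fluentd v1 syntax; the pos_file path and tag are illustrative, and the glob mirrors the one suggested above):

    <source>
      @type tail
      # Assumes the host's /var/lib/kubelet/pods is mounted into the fluentd DaemonSet pod.
      path /var/lib/kubelet/pods/*/volumes/emptyDir/ela-varlog-volume/**/*.*
      pos_file /var/log/fluentd-knative-varlog.pos
      tag knative.varlog.*
      read_from_head true
      <parse>
        @type none
      </parse>
    </source>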

@google-prow-robot added the area/monitoring and kind/feature labels on May 7, 2018
@mdemirhan added this to the M5 milestone on May 8, 2018
@yanweiguo (Contributor, Author)

The host path based on the current configuration is /var/lib/kubelet/pods/<pod-id>/volumes/kubernetes.io~empty-dir/varlog. Verified that the existing fluentd DaemonSet has permission to read it.

@dprotaso (Member)

re-posting from slack for context

Right now pod information cannot be obtained using the pod uid (see kubernetes/kubernetes#20572)

The kubernetes API requires the namespace and pod name.

One workaround is to include the namespace in the volume name. You still need the pod name if you don't want to fetch all the pods. We can get the container name by explicitly putting it in the volume path (not the subpath).

Two approaches:

  1. Add another downward API volume to the pod that contains the namespace & pod name in addition to other labels/annotations, then modify the plugin to read this specific volume when there's a cache miss on the pod id (a sketch appears at the end of this comment).
  2. Wait for this PR, which adds downward API support to subpaths. It should land as alpha in 1.11 but will still require us to put the namespace in the volume name.

The first solution would have the side effect of not needing to hit the k8s API to get the pod metadata (need to verify). Neither solution provides the container's id, since it's not supported in the downward API. @mdemirhan @yanweiguo @bsnchan I'm curious if the container id is used for any dashboards at the moment.

We could open a k8s issue to get the container id as an exposed attribute that's available via the downward API.
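
For reference, the extra downward API volume from approach 1 could be shaped like this (a minimal sketch; the volume name and file names are illustrative):

    volumes:
      - name: pod-info
        downwardAPI:
          items:
            - path: namespace
              fieldRef:
                fieldPath: metadata.namespace
            - path: pod_name
              fieldRef:
                fieldPath: metadata.name
            # Labels are also available to downward API volumes.
            - path: labels
              fieldRef:
                fieldPath: metadata.labels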

@yanweiguo (Contributor, Author)

We don't use the container id in any dashboards currently. The container id in the plugin is used for caching.

For the second solution, I think we can generate the container id from the pod id, or from the pod name + namespace in the fluentd tag, to make the cache work.

IMO, the second solution would be less complicated.

@yanweiguo (Contributor, Author) commented Oct 24, 2018

An alternative is to use a sidecar that tails files under /var/log/ to stdout. This is much cheaper than the fluentd sidecar in terms of resources.

The difficulty is that the file paths are arbitrary. We need a way to detect when a new file is created, start tailing it, and stop tailing when it is deleted. There may be a library that supports this.
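
A very rough sketch of what such a tailing sidecar could do, assuming inotify-tools is available in the image (stopping tails on deletion, log rotation, and pre-existing files are left out):

    # Watch /var/log for newly created files and tail each one to the sidecar's stdout.
    inotifywait --monitor --recursive --event create --format '%w%f' /var/log |
    while read -r file; do
      tail -F "$file" &
    done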

@liu-cong (Contributor)

Now that VolumeSubpathEnvExpansion is available as an alpha feature, I think adding the namespace and pod name to the subpath is the preferred option.
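
For context, with VolumeSubpathEnvExpansion the user-container mount could look roughly like this (a sketch; the env var names and the separator are illustrative, and the feature gate has to be enabled):

    env:
      - name: POD_NAMESPACE
        valueFrom:
          fieldRef:
            fieldPath: metadata.namespace
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
    volumeMounts:
      - name: varlog
        mountPath: /var/log
        # subPathExpr expands $(VAR) references to the container's env vars.
        subPathExpr: $(POD_NAMESPACE)_$(POD_NAME)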

Please comment on this and I can create a PR.

@dprotaso (Member) commented Feb 14, 2019 via email

@sebgoa (Contributor) commented Apr 24, 2019

FWIW, I agree with the original comment that a fluentd sidecar is resource-heavy. Sidecars are great, but I am really concerned they will affect scalability and performance.

So +1 on using a daemonset.

@mdemirhan (Contributor)

@dprotaso @yanweiguo @bsnchan Should I remove your names from the assignees field and let someone else pick this work up?

@mattmoor (Member)

What version of Kubernetes introduces the feature we need as Beta?

@yanweiguo (Contributor, Author)

What version of Kubernetes introduces the feature we need as Beta?

We need the VolumeSubpathEnvExpansion feature. It is still alpha in 1.14.

@dprotaso (Member) commented Apr 26, 2019 via email

@mattmoor (Member) commented May 2, 2019

1.14 alpha is way too far out.

@mdemirhan suggested that we just volume mount /var/log into a sidecar that we ship (lightweight) and stream the logs to the sidecar's STDOUT (replacing the huge sidecar we have today).

We could also potentially mount /dev/stdout or /dev/stderr into the sidecar (from the user-container) and write the stuff we see hit /var/log to user-container's STDOUT.

@mattmoor changed the title from "Use fluentd DaemonSet to collect /var/log instead of fluentd sidecar" to "Long-term story for /var/log support" on May 4, 2019
@mattmoor (Member) commented May 4, 2019

/assign @JRBANCEL

Talked to @mdemirhan about JR picking this up in 0.7.

@mdemirhan (Contributor)

An advantage of the lightweight sidecar doing tail and writing to stdout is that these logs will also be available in kubectl logs.

@JRBANCEL (Contributor) commented May 17, 2019

We could also potentially mount /dev/stdout or /dev/stderr into the sidecar (from the user-container) and write the stuff we see hit /var/log to user-container's STDOUT.

As discussed offline, this is not possible.
/dev/stdout is a symlink to /proc/self/fd/1. /proc/self is virtual and is different per process.

The same goal can be achieved by mounting the host /proc (to e.g. /hostproc) and the user-container /var/log into the sidecar.
The sidecar can figure out the PID of the user-container process through /hostproc (not trivial, though) and write to /hostproc/<PID_USER_CONTAINER_PROCESS>/fd/1, which is effectively the user-container's stdout.

I don't think it is a great approach anyway:

  • it relies on implementation details (Linux, procfs), not on k8s concepts
  • concurrent appends to a file work but they can be fragmented -> corrupted logs
  • merging stdout and /var/log logs can be confusing

To recap:

Solution 1 - Tailing Sidecar

Mounting the user-container's /var/log into the sidecar and having it tail and watch for new files seems cleaner.
This can (should?) be done outside of Knative; it is a pretty generic feature. It might even already exist.

  • (+) no work on fluentd side & independent of log collection mechanism
  • (+) kubectl logs works
  • (-) need to implement watching/tailing logic if it doesn't already exist
  • (-) two copies (/var/log -> tail -> stdout -> docker -> /var/log/containers)

Solution 2 - Fluentd reads from volume directly

subPathExpr support is not required: fluentd relies on Ruby's Dir.glob for path expansion, so /var/lib/kubelet/pods/**/volumes/kubernetes.io~empty-dir/varlog/**/* is enough; I verified it.

To avoid collisions (i.e. non-Knative pods with a volume named varlog) and to differentiate between Knative services that do or don't have log collection enabled, we could name the volume knative-varlog-collect or knative-varlog-ignore accordingly.

@JRBANCEL (Contributor) commented May 17, 2019

@mattmoor, @yanweiguo, @mdemirhan please provide feedback.

Going with solution 2, here is the plan:

  • mount the host's /var/lib/kubelet/pods into the fluentd-ds DaemonSet
  • remove the fluentd sidecar injection logic from the reconciler
  • add logic to dynamically name the emptyDir volume mounted at /var/log in the user-container: knative-varlog-ignore if EnableVarLogCollection is false, knative-varlog-collect if it is true (sketched after this list)
  • configure fluentd-ds to capture /var/lib/kubelet/pods/**/volumes/kubernetes.io~empty-dir/knative-varlog-collect/**/*
  • add an e2e test that deploys a Knative service with collection enabled/disabled and validates that the logs are/aren't collected (might be out of scope for this change and could be done as part of End to end testing for logging, metrics and traces #647)
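
For the volume-naming step, the revision pod spec would end up shaped roughly like this (a sketch; only the volume name changes with EnableVarLogCollection):

    volumes:
      # Named knative-varlog-ignore instead when EnableVarLogCollection is false.
      - name: knative-varlog-collect
        emptyDir: {}
    containers:
      - name: user-container
        volumeMounts:
          - name: knative-varlog-collect
            mountPath: /var/log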

@dprotaso (Member)

configure fluentd-ds to capture /var/lib/kubelet/pods/**/volumes/kubernetes.io~empty-dir/knative-varlog-collect/**/*

From what I recall, the fluentd DaemonSet needs the pod name and namespace in this path regex in order to tag these logs with the appropriate pod metadata (annotations/labels, etc.).

Is that still the case?

@JRBANCEL (Contributor)

Great question.
Right now, the k8s fluentd plugin relies on this regex to extract pod_name, namespace, container_name and docker_id. It is configurable.
namespace and container_name could be injected in the volume name and parsed from it, but there is no way to do that with pod_name and docker_id.

/var/lib/kubelet/pods/** only contains pod_id. We could fork the fluentd plugin to query k8s for the remaining metadata using the pod_id instead of namespace and pod_name as it does today. If writing Ruby is not attractive, we could instead have a DaemonSet (a sidecar of fluentd-ds or not) that watches /var/lib/kubelet/pods/**; when it finds a pod with knative-varlog-collect, it would query k8s and create a link from /some/path/fluentd/knows/about/<namespace>/<pod_name>/<container_name> -> /var/lib/kubelet/pods/<pod_id>/volumes/kubernetes.io~empty-dir/knative-varlog-collect/. Then fluentd would have all the information it needs to add tags (except docker_id, but is that really required?).

This seems pretty convoluted though.

@yanweiguo (Contributor, Author) commented May 20, 2019

(Quoting @JRBANCEL's previous comment in full.)

docker_id is used for caching in the kubernetes_metadata plugin. We can use the pod id, or the pod name + namespace, as the cache identifier.

The k8s feature we're waiting for is the one that provides the ability to inject the pod name into the volume path.

@JRBANCEL (Contributor) commented May 20, 2019

Here is a solution to the tagging problem.

We define 3 volumes:

  • knative-var-log: emptyDir (named varlog today)
  • knative-internal: emptyDir
  • pod-info: downwardAPI

An init container (or sidecar job to minimize startup time, ordering doesn't matter here) mounts knative-internal to X and pod-info to Y. All it does is run ln -s ../knative-var-log "X/$(cat Y/namespace)_$(cat Y/pod_name)_user-container" and terminates.

user-container doesn't change. It still mounts knative-var-log to /var/log.

The host now has the following link: /var/lib/kubelet/pods/<POD_ID>/volumes/kubernetes.io~empty-dir/knative-internal/<NAMESPACE>_<POD_NAME>_<CONTAINER_NAME> -> /var/lib/kubelet/pods/<POD_ID>/volumes/kubernetes.io~empty-dir/knative-var-log.
In other words, there is a path containing container_name and pod_name that points to the folder mounted as /var/log in the user-container. That's all fluentd needs.
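
A sketch of the pod spec changes being proposed (the init container name/image and the /knative-internal and /pod-info mount paths stand in for X and Y above; they are placeholders, not part of the proposal):

    volumes:
      - name: knative-var-log
        emptyDir: {}
      - name: knative-internal
        emptyDir: {}
      - name: pod-info
        downwardAPI:
          items:
            - path: namespace
              fieldRef:
                fieldPath: metadata.namespace
            - path: pod_name
              fieldRef:
                fieldPath: metadata.name
    initContainers:
      - name: varlog-init       # placeholder name
        image: busybox          # placeholder image
        command:
          - sh
          - -c
          # $(...) below is shell command substitution, run by sh -c.
          - ln -s ../knative-var-log "/knative-internal/$(cat /pod-info/namespace)_$(cat /pod-info/pod_name)_user-container"
        volumeMounts:
          - name: knative-internal
            mountPath: /knative-internal
          - name: pod-info
            mountPath: /pod-info
    containers:
      - name: user-container
        volumeMounts:
          - name: knative-var-log
            mountPath: /var/log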

We add /var/lib/kubelet/pods/*/volumes/kubernetes.io~empty-dir/knative-internal/**/*/**/* as a source in the fluentd config (**/*/** is a trick to have Ruby's Dir.glob follow the symlink during path expansion) and set tag_to_kubernetes_name_regexp to (?<docker_id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})\.volumes.kubernetes\.io~empty-dir\.knative-internal\.(?<namespace>[^_]+)_(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<container_name>user-container)\..*?$ so that the k8s filter plugin populates the Kubernetes tags properly.
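
Wired into the fluentd config, that could look roughly as follows (the pos_file path and tag prefix are illustrative; the regex is the one given above):

    <source>
      @type tail
      path /var/lib/kubelet/pods/*/volumes/kubernetes.io~empty-dir/knative-internal/**/*/**/*
      pos_file /var/log/fluentd-knative-internal.pos
      # The * in the tag is replaced with the tailed file's path, which the filter below parses.
      tag kubernetes.*
      read_from_head true
      <parse>
        @type none
      </parse>
    </source>

    <filter kubernetes.**>
      @type kubernetes_metadata
      tag_to_kubernetes_name_regexp (?<docker_id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})\.volumes.kubernetes\.io~empty-dir\.knative-internal\.(?<namespace>[^_]+)_(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<container_name>user-container)\..*?$
    </filter>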

There is no copy, no sidecar, no fluentd plugin fork.
The only caveat is that container_id is set to pod_id and not the real container ID from Docker. Is this a blocker?

Thoughts?

@yanweiguo (Contributor, Author)

(Quoting @JRBANCEL's proposal above in full.)

SGTM
