Long-term story for /var/log support #818

Closed
yanweiguo opened this issue May 7, 2018 · 21 comments · Fixed by #4156
Labels: area/monitoring, kind/feature, P0

@yanweiguo (Contributor)

Expected Behavior

Collect logs from /var/log using the same DaemonSet which collects stderr/stdout logs.

Actual Behavior

/var/log is collected by a sidecar container per pod, which is resource-heavy.

Additional Info

From @dprotaso: given that container volumes have a deterministic layout on the host
/var/lib/kubelet/pods/<pod uid>/volumes/<volume type>/<volume name>

We could get rid of the fluentd sidecar by modifying/forking the fluentd k8s metadata filter plugin. Then we could have fluentd source logs from /var/lib/kubelet/pods/*/volumes/emptyDir/ela-varlog-volume/**/*.* (volume name being created/controlled by the revision controller).
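
A DaemonSet-side fluentd source along those lines might look roughly like the sketch below (fluentd v1 syntax; the pos_file path and tag are illustrative, and the glob mirrors the one suggested above):

    <source>
      @type tail
      # Assumes the host's /var/lib/kubelet/pods is mounted into the fluentd DaemonSet pod.
      path /var/lib/kubelet/pods/*/volumes/emptyDir/ela-varlog-volume/**/*.*
      pos_file /var/log/fluentd-knative-varlog.pos
      tag knative.varlog.*
      read_from_head true
      <parse>
        @type none
      </parse>
    </source>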

@google-prow-robot added the area/monitoring and kind/feature labels on May 7, 2018
@mdemirhan added this to the M5 milestone on May 8, 2018
@yanweiguo (Contributor, Author)

The host path based on the current configuration is /var/lib/kubelet/pods/<pod-id>/volumes/kubernetes.io~empty-dir/varlog. Verified that the existing fluentd DaemonSet has permission to read it.

@dprotaso (Member)

re-posting from slack for context

Right now pod information cannot be obtained using the pod uid (see kubernetes/kubernetes#20572)

The kubernetes API requires the namespace and pod name.

One workaround is to include the namespace in the volume name. You still need the pod name if you don't want to fetch all the pods. We can get the container name by explicitly putting it in the volume path (not the subpath).

Two approaches:

  1. Add another downward API volume to the pod that contains the namespace & pod name in addition to other labels/annotations, then modify the plugin to read this specific volume when there's a cache miss on the pod id (a sketch appears at the end of this comment).
  2. Wait for this PR, which adds downward API support to subpaths. It should land as alpha in 1.11 but will still require us to put the namespace in the volume name.

The first solution would have the side effect of not needing to hit the k8s API to get the pod metadata (need to verify). Neither solution provides the container's id, since it's not supported in the downward API. @mdemirhan @yanweiguo @bsnchan I'm curious if the container id is used for any dashboards at the moment.

We could open a k8s issue to get the container id as an exposed attribute that's available via the downward API.
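
For reference, the extra downward API volume from approach 1 could be shaped like this (a minimal sketch; the volume name and file names are illustrative):

    volumes:
      - name: pod-info
        downwardAPI:
          items:
            - path: namespace
              fieldRef:
                fieldPath: metadata.namespace
            - path: pod_name
              fieldRef:
                fieldPath: metadata.name
            # Labels are also available to downward API volumes.
            - path: labels
              fieldRef:
                fieldPath: metadata.labels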

@yanweiguo (Contributor, Author)

We don't use the container id in any dashboards currently. The container id in the plugin is used for caching.

For the second solution, I think we can generate the container id from the pod id, or from the pod name + namespace in the fluentd tag, to make the cache work.

IMO, the second solution would be less complicated.

@yanweiguo (Contributor, Author) commented Oct 24, 2018

An alternative is to use a sidecar that tails files under /var/log/ to stdout. This is much cheaper than the fluentd sidecar in terms of resources.

The difficulty is that the file paths are arbitrary. We need a way to detect when a new file is created, start tailing it, and stop tailing when it is deleted. There may be a library that supports this.
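
A very rough sketch of what such a tailing sidecar could do, assuming inotify-tools is available in the image (stopping tails on deletion, log rotation, and pre-existing files are left out):

    # Watch /var/log for newly created files and tail each one to the sidecar's stdout.
    inotifywait --monitor --recursive --event create --format '%w%f' /var/log |
    while read -r file; do
      tail -F "$file" &
    done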

@liu-cong (Contributor)

Now that VolumeSubpathEnvExpansion is available as an alpha feature, I think adding the namespace and pod name to the subpath is the preferred option.
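
For context, with VolumeSubpathEnvExpansion the user-container mount could look roughly like this (a sketch; the env var names and the separator are illustrative, and the feature gate has to be enabled):

    env:
      - name: POD_NAMESPACE
        valueFrom:
          fieldRef:
            fieldPath: metadata.namespace
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
    volumeMounts:
      - name: varlog
        mountPath: /var/log
        # subPathExpr expands $(VAR) references to the container's env vars.
        subPathExpr: $(POD_NAMESPACE)_$(POD_NAME)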

Please comment on this and I can create a PR.

@dprotaso (Member) commented Feb 14, 2019 via email

@sebgoa (Contributor) commented Apr 24, 2019

FWIW, I agree with the original comment that a fluentd sidecar is resource-heavy. Sidecars are great, but I am really concerned they will affect scalability and performance.

So +1 on using a daemonset.

@mdemirhan (Contributor)

@dprotaso @yanweiguo @bsnchan Should I remove your names from the assignees field and let someone else pick this work up?

@mattmoor (Member)

What version of Kubernetes introduces the feature we need as Beta?

@yanweiguo (Contributor, Author)

What version of Kubernetes introduces the feature we need as Beta?

We need the VolumeSubpathEnvExpansion feature. It is still alpha in 1.14.

@dprotaso (Member) commented Apr 26, 2019 via email

@mattmoor (Member) commented May 2, 2019

1.14 alpha is way too far out.

@mdemirhan suggested that we just volume mount /var/log into a sidecar that we ship (lightweight) and stream the logs to the sidecar's STDOUT (replacing the huge sidecar we have today).

We could also potentially mount /dev/stdout or /dev/stderr into the sidecar (from the user-container) and write the stuff we see hit /var/log to user-container's STDOUT.

@mattmoor changed the title from "Use fluentd DaemonSet to collect /var/log instead of fluentd sidecar" to "Long-term story for /var/log support" on May 4, 2019
@mattmoor (Member) commented May 4, 2019

/assign @JRBANCEL

Talked to @mdemirhan about JR picking this up in 0.7.

@mdemirhan (Contributor)

An advantage of the lightweight sidecar doing tail and writing to stdout is that these logs will also be available in kubectl logs.

@JRBANCEL (Contributor) commented May 17, 2019

We could also potentially mount /dev/stdout or /dev/stderr into the sidecar (from the user-container) and write the stuff we see hit /var/log to user-container's STDOUT.

As discussed offline, this is not possible.
/dev/stdout is a symlink to /proc/self/fd/1. /proc/self is virtual and is different per process.

The same goal can be achieved by mounting the host /proc (to e.g. /hostproc) and the user-container /var/log into the sidecar.
The sidecar can figure out the PID of the user-container process through /hostproc (not trivial, though) and write to /hostproc/<PID_USER_CONTAINER_PROCESS>/fd/1, which is effectively the user-container's stdout.

I don't think it is a great approach anyway:

  • it relies on implementation details (Linux, procfs), not on k8s concepts
  • concurrent appends to a file work but they can be fragmented -> corrupted logs
  • merging stdout and /var/log logs can be confusing

To recap:

Solution 1 - Tailing Sidecar

Mounting the user-container's /var/log into the sidecar and having it tail and watch for new files seems cleaner.
This can (should?) be done outside of Knative; it is a pretty generic feature. It might even already exist.

  • (+) no work on fluentd side & independent of log collection mechanism
  • (+) kubectl logs works
  • (-) need to implement watching/tailing logic if it doesn't already exist
  • (-) two copies (/var/log -> tail -> stdout -> docker -> /var/log/containers)

Solution 2 - Fluentd reads from volume directly

subPathExpr support is not required: fluentd relies on Ruby's Dir.glob for path expansion, so /var/lib/kubelet/pods/**/volumes/kubernetes.io~empty-dir/varlog/**/* is enough; I verified it.

To avoid collisions (i.e. non-Knative pods with a volume named varlog) and to differentiate between Knative services that do or don't have log collection enabled, we could name the volume knative-varlog-collect or knative-varlog-ignore accordingly.

@JRBANCEL (Contributor) commented May 17, 2019

@mattmoor, @yanweiguo, @mdemirhan please provide feedback.

Going with solution 2, here is the plan:

  • mount the host's /var/lib/kubelet/pods into the fluentd-ds DaemonSet
  • remove the fluentd sidecar injection logic from the reconciler
  • add logic to dynamically name the emptyDir volume mounted at /var/log in the user-container: knative-varlog-ignore if EnableVarLogCollection is false, knative-varlog-collect if it is true (sketched after this list)
  • configure fluentd-ds to capture /var/lib/kubelet/pods/**/volumes/kubernetes.io~empty-dir/knative-varlog-collect/**/*
  • add an e2e test that deploys a Knative service with collection enabled/disabled and validates that the logs are/aren't collected (might be out of scope for this change and could be done as part of End to end testing for logging, metrics and traces #647)
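
For the volume-naming step, the revision pod spec would end up shaped roughly like this (a sketch; only the volume name changes with EnableVarLogCollection):

    volumes:
      # Named knative-varlog-ignore instead when EnableVarLogCollection is false.
      - name: knative-varlog-collect
        emptyDir: {}
    containers:
      - name: user-container
        volumeMounts:
          - name: knative-varlog-collect
            mountPath: /var/log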

@dprotaso (Member)

configure fluentd-ds to capture /var/lib/kubelet/pods/**/volumes/kubernetes.io~empty-dir/knative-varlog-collect/**/*

From what I recall, the fluentd DaemonSet needs the pod name and namespace in this path regex in order to tag these logs with the appropriate pod metadata (annotations/labels, etc.).

Is that still the case?

@JRBANCEL (Contributor)

Great question.
Right now, the k8s fluentd plugin relies on this regex to extract pod_name, namespace, container_name and docker_id. It is configurable.
namespace and container_name could be injected in the volume name and parsed from it, but there is no way to do that with pod_name and docker_id.

/var/lib/kubelet/pods/** only contains pod_id. We could fork the fluentd plugin to query k8s for the remaining metadata using the pod_id instead of namespace and pod_name as it does today. If writing Ruby is not attractive, we could instead have a DaemonSet (a sidecar of fluentd-ds or not) that watches /var/lib/kubelet/pods/**; when it finds a pod with knative-varlog-collect, it would query k8s and create a link from /some/path/fluentd/knows/about/<namespace>/<pod_name>/<container_name> -> /var/lib/kubelet/pods/<pod_id>/volumes/kubernetes.io~empty-dir/knative-varlog-collect/. Then fluentd would have all the information it needs to add tags (except docker_id, but is that really required?).

This seems pretty convoluted though.

@yanweiguo (Contributor, Author) commented May 20, 2019

(Quoting @JRBANCEL's previous comment in full.)

docker_id is used for caching in the kubernetes_metadata plugin. We can use the pod id, or the pod name + namespace, as the cache identifier.

The k8s feature we're waiting for is the one that provides the ability to inject the pod name into the volume path.

@JRBANCEL (Contributor) commented May 20, 2019

Here is a solution to the tagging problem.

We define 3 volumes:

  • knative-var-log: emptyDir (named varlog today)
  • knative-internal: emptyDir
  • pod-info: downwardAPI

An init container (or sidecar job to minimize startup time, ordering doesn't matter here) mounts knative-internal to X and pod-info to Y. All it does is run ln -s ../knative-var-log "X/$(cat Y/namespace)_$(cat Y/pod_name)_user-container" and terminates.

user-container doesn't change. It still mounts knative-var-log to /var/log.

The host now has the following link: /var/lib/kubelet/pods/<POD_ID>/volumes/kubernetes.io~empty-dir/knative-internal/<NAMESPACE>_<POD_NAME>_<CONTAINER_NAME> -> /var/lib/kubelet/pods/<POD_ID>/volumes/kubernetes.io~empty-dir/knative-var-log.
In other words, there is a path containing container_name and pod_name that points to the folder mounted as /var/log in the user-container. That's all fluentd needs.
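
A sketch of the pod spec changes being proposed (the init container name/image and the /knative-internal and /pod-info mount paths stand in for X and Y above; they are placeholders, not part of the proposal):

    volumes:
      - name: knative-var-log
        emptyDir: {}
      - name: knative-internal
        emptyDir: {}
      - name: pod-info
        downwardAPI:
          items:
            - path: namespace
              fieldRef:
                fieldPath: metadata.namespace
            - path: pod_name
              fieldRef:
                fieldPath: metadata.name
    initContainers:
      - name: varlog-init       # placeholder name
        image: busybox          # placeholder image
        command:
          - sh
          - -c
          # $(...) below is shell command substitution, run by sh -c.
          - ln -s ../knative-var-log "/knative-internal/$(cat /pod-info/namespace)_$(cat /pod-info/pod_name)_user-container"
        volumeMounts:
          - name: knative-internal
            mountPath: /knative-internal
          - name: pod-info
            mountPath: /pod-info
    containers:
      - name: user-container
        volumeMounts:
          - name: knative-var-log
            mountPath: /var/log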

We add /var/lib/kubelet/pods/*/volumes/kubernetes.io~empty-dir/knative-internal/**/*/**/* as a source in the fluentd config (**/*/** is a trick to have Ruby's Dir.glob follow the symlink during path expansion) and set tag_to_kubernetes_name_regexp to (?<docker_id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})\.volumes.kubernetes\.io~empty-dir\.knative-internal\.(?<namespace>[^_]+)_(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<container_name>user-container)\..*?$ so that the k8s filter plugin populates the Kubernetes tags properly.
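
Wired into the fluentd config, that could look roughly as follows (the pos_file path and tag prefix are illustrative; the regex is the one given above):

    <source>
      @type tail
      path /var/lib/kubelet/pods/*/volumes/kubernetes.io~empty-dir/knative-internal/**/*/**/*
      pos_file /var/log/fluentd-knative-internal.pos
      # The * in the tag is replaced with the tailed file's path, which the filter below parses.
      tag kubernetes.*
      read_from_head true
      <parse>
        @type none
      </parse>
    </source>

    <filter kubernetes.**>
      @type kubernetes_metadata
      tag_to_kubernetes_name_regexp (?<docker_id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})\.volumes.kubernetes\.io~empty-dir\.knative-internal\.(?<namespace>[^_]+)_(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<container_name>user-container)\..*?$
    </filter>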

There is no copy, no sidecar, no fluentd plugin fork.
The only caveat is that container_id is set to pod_id and not the real container ID from Docker. Is this a blocker?

Thoughts?

@yanweiguo (Contributor, Author)

(Quoting @JRBANCEL's proposal above in full.)

SGTM
