Promtail Journald Log Forwarding Not Functioning with systemd v246 (affects FedoraCoreOS / Fedora 33 systems) #2792
Comments
Non-working and working views follow. Note +ZSTD in the non-working host view.
Non-Working Host View:
Non-Working Container View:
Working Host View:
Working Container View:
|
Are you sure
|
Yes, /var/log/journal is most definitely populated, and it has a current timestamp on both the old working nodes and the new non-working nodes. There is a /var/run/log/journal directory, but according to its timestamp it is stale on both working and non-working configurations. |
I think any Fedora 33 based system will no longer be able to use promtail for journald until this is addressed. @slim-bean any recommendations on how this issue should be handled, or recommendations for Fedora/promtail users? Since Fedora is upstream to Red Hat and OpenShift (via OKD & FedoraCoreOS), I imagine the impact is going to broaden. |
I was hoping that rebuilding promtail from source for buster with a backported libsystemd-dev would solve things, since that puts it on the right systemd version (246) with the correct flags for FedoraCoreOS (especially +ZSTD). Indeed, the Loki debug trace reveals correct reading of Journald log lines... but it seems changes would also be required to the Go program, as it gives a stack trace when attempting the send to Loki. |
This issue continues to manifest itself on Promtail 2.0.0 with Loki 2.0.0. Original description updated to reflect this. |
Good news. Rebuilding promtail from source for buster with a backported libsystemd-dev does in fact solve things, provided one sets the BUILD_IN_CONTAINER switch to force a rebuild of the promtail code itself against the backported C modules (missing this switch was my earlier 'muppet' mistake). It is not clear to me exactly how this should be fixed in the project itself. It would seem that the right end goal is an additional tag available on Docker Hub, like promtail:2.0.0-systemd-246? For the record, the diff of the changes made on the v2.0.0 tag to the promtail/dockerfile is:
-FROM golang:1.14.2 as build
+FROM golang:1.14.2-buster as build
# TOUCH_PROTOS signifies if we should touch the compiled proto files and thus not regenerate them.
# This is helpful when file system timestamps can't be trusted with make
ARG TOUCH_PROTOS
COPY . /src/loki
WORKDIR /src/loki
-RUN apt-get update && apt-get install -qy libsystemd-dev
-RUN make clean && (if [ "${TOUCH_PROTOS}" ]; then make touch-protos; fi) && make BUILD_IN_CONTAINER=false promtail
+RUN echo "deb http://deb.debian.org/debian buster-backports main" >> /etc/apt/sources.list
+RUN apt-get update && apt-get install -t buster-backports -qy libsystemd-dev
+RUN make clean && (if [ "${TOUCH_PROTOS}" ]; then make touch-protos; fi) && make BUILD_IN_CONTAINER=true promtail
# Promtail requires debian as the base image to support systemd journal reading
-FROM debian:stretch-slim
+FROM debian:buster-slim
# tzdata required for the timestamp stage to work
+RUN echo "deb http://deb.debian.org/debian buster-backports main" >> /etc/apt/sources.list
RUN apt-get update && \
apt-get install -qy \
- tzdata ca-certificates libsystemd-dev && \
+ tzdata ca-certificates
+RUN apt-get install -t buster-backports -qy libsystemd-dev && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
COPY --from=build /src/loki/cmd/promtail/promtail /usr/bin/promtail
COPY cmd/promtail/promtail-docker-config.yaml /etc/promtail/config.yml |
I'm a little confused following this, why is the backport necessary? Is it enough to just update the build image and container image to buster? |
My understanding is that buster ships with systemd v241, which shows the feature flags listed below. These do not include +ZSTD, a v246 feature flag that is present in Fedora 33 and apparently references a compression algorithm (https://facebook.github.io/zstd/). Backporting the newer v246 to buster therefore seems necessary to ensure that the loki go project, which uses the systemd go project, which in turn uses the native systemd C APIs to access journald, does so correctly. I did experiment with switching journald compression off altogether, but this did not fix things, and in any case it is non-ideal. Disclaimer, however: I have never played with journald configuration before, so it is possible I made a mistake with this.
% systemctl --version
systemd 241 (241)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid
Fedora of course is upstream to Red Hat, but also to FedoraCoreOS, which is upstream to the Open Kubernetes Distribution (OKD), which is upstream to OpenShift. So I guess this is of significance to a fairly wide base. FedoraCoreOS is already running its 'next' stream with Fedora 33 to shake out issues like these. |
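For anyone wanting to compare hosts and containers quickly, the feature-flag check described above can be scripted. The following is only a minimal sketch: the function name `parse_systemd_version` is hypothetical, and it assumes the output keeps the shape shown in this thread (a `systemd <version>` first line followed by `+FLAG` / `-FLAG` tokens).

```python
import re


def parse_systemd_version(output: str):
    """Parse `systemctl --version` text into (version, enabled, disabled).

    Hypothetical helper: assumes the first line starts with
    "systemd <number>" and later lines hold +FLAG / -FLAG tokens.
    """
    lines = output.strip().splitlines()
    if not lines:
        raise ValueError("empty systemctl output")
    match = re.match(r"systemd (\d+)", lines[0])
    if match is None:
        raise ValueError("unexpected first line: %r" % lines[0])
    version = int(match.group(1))
    enabled, disabled = set(), set()
    for token in " ".join(lines[1:]).split():
        if token.startswith("+"):
            enabled.add(token[1:])
        elif token.startswith("-"):
            disabled.add(token[1:])
        # Tokens such as default-hierarchy=hybrid are not feature flags.
    return version, enabled, disabled


# The buster output quoted in this thread.
BUSTER_OUTPUT = """systemd 241 (241)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid"""

version, enabled, disabled = parse_systemd_version(BUSTER_OUTPUT)
print(version, "ZSTD" in enabled)  # prints: 241 False
```

Running the same parse on the output captured inside the promtail container and on the host makes a mismatch such as a missing ZSTD flag easy to spot.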
Same problem for ArchLinux:
|
Systemd journals created with 246+ are deliberately incompatible with earlier versions. The backport is required to ship those logs as the C linkage needs to be up to date here. |
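One way to see this incompatibility on disk is to look at the "incompatible flags" word in a journal file's header. The sketch below is based on my reading of systemd's journal file format (an 8-byte `LPKSHHRH` signature followed by two little-endian 32-bit flag words); the bit names and values are taken from that reading and should be treated as assumptions, and the helper name `incompatible_flags` is hypothetical.

```python
import struct

SIGNATURE = b"LPKSHHRH"

# Assumed bit assignments, per my reading of systemd's journal-def.h.
INCOMPATIBLE_FLAGS = {
    1 << 0: "COMPRESSED_XZ",
    1 << 1: "COMPRESSED_LZ4",
    1 << 2: "KEYED_HASH",
    1 << 3: "COMPRESSED_ZSTD",
}


def incompatible_flags(header: bytes):
    """Return the names of incompatible flags set in a journal header.

    `header` must hold at least the first 16 bytes of a .journal file.
    """
    if len(header) < 16 or header[:8] != SIGNATURE:
        raise ValueError("not a journal file header")
    # Bytes 8..15: compatible_flags, then incompatible_flags (both le32).
    _compatible, incompatible = struct.unpack_from("<II", header, 8)
    names = [name for bit, name in sorted(INCOMPATIBLE_FLAGS.items())
             if incompatible & bit]
    unknown = incompatible & ~sum(INCOMPATIBLE_FLAGS)
    if unknown:
        names.append("UNKNOWN(0x%x)" % unknown)
    return names


# Synthetic header with the KEYED_HASH and COMPRESSED_ZSTD bits set.
sample = SIGNATURE + struct.pack("<II", 0, (1 << 2) | (1 << 3))
print(incompatible_flags(sample))  # prints: ['KEYED_HASH', 'COMPRESSED_ZSTD']
```

To inspect a real file, read its first 16 bytes in binary mode and pass them to the function. A reader linked against an older libsystemd is expected to refuse files carrying flags it does not know, which matches the behaviour reported in this thread.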
@cyriltovena do you mind looking at this, please? |
Will take a look at this tomorrow and get this merged, thanks everyone, apologies for delay |
This issue is now manifesting itself on the 'testing' FedoraCoreOS stream, which has an 11/30 timetable to go to the 'stable' stream, at which point the impact of this issue will become more widespread. |
working on this now, should be in a release early next week |
@fifofonix can you (or anyone else) test out this image to see if it works:
I verified it worked on our existing systems, but I don't have a newer system handy to make sure it fixes the original issue. |
oh hrm, were you testing with the docker image or a binary? |
Looks good right now. I am running the container you built on both old (FCOS Fedora 32 w/systemd 245) and new (FCOS Fedora 33 w/systemd 246 with +ZSTD compression) servers and journald logs are aggregating from both. I'm going to leave those running and continue to monitor. |
I'm testing image |
|
|
Thanks everyone for testing this! |
Hey @slim-bean I'm seeing what may be a similar issue just running on docker--if the file is created after promtail starts, it says it can't be found. If the file is created before, it ingests the existing logs but doesn't tail. |
I appear to have run into this issue on Balena OS, which recently updated its systemd
Is there an official ARM docker image that addresses this issue? |
Upon further inspection, it looks like the related pull request #2957 only touched |
the |
Promtail 1.6.1 and 2.0.0 docker images have ceased forwarding Journald logs when running on the 'next' version of FedoraCoreOS, which is based on upstream Fedora 33. The issue was reported via FCOS issues, and the suggestion is that there may now be an incompatibility in journald compression (FedoraCoreOS systemd is now built with +ZSTD compression).
Steps to reproduce the behavior:
Expect to see verbose logging in the promtail logs associated with relay of journald (like we see on pre Fedora33 FedoraCoreOS servers). Instead we only see occasional debug messages regarding the local log file config (expected).
Environment:
Promtail Config:
Sample Promtail Log