
Docker service discovery - logging interrupted when a container is restarted #5259

Closed
lucasra1 opened this issue Jan 27, 2022 · 3 comments · Fixed by #5280

Comments

@lucasra1

lucasra1 commented Jan 27, 2022

Describe the bug
I was just testing the new docker_sd_config, and everything worked like a charm until I noticed something weird. When I restart a container whose logs are scraped via docker_sd_config, Promtail stops sending that container's logs to Loki after the restart. (All other containers still work fine.) When I then restart Promtail, all the missing logs get delivered at once, so nothing is lost; Promtail simply stops shipping logs for that specific container until it is restarted itself.

This happens when restarting any container (the container ID stays the same). When the container is recreated instead (so its ID changes), it does not seem to happen.

To Reproduce
Steps to reproduce the behavior:

  1. Loki running in docker container (main-87cbe50 (branch: main, revision: 87cbe50)) (see config in attachment)
  2. Promtail running in docker container (main-87cbe50 (branch: main, revision: 87cbe50)) (see config in attachment)
  3. Using Grafana to explore logs from loki
  4. Restart container (e.g. Traefik Proxy container)
  5. Try to look at the logs again
  6. No logs visible after the restart of the container
  7. docker logs traefik shows the logs normally
  8. Restart promtail container
  9. All logs visible now (none missing)

Expected behavior
After restarting a container (e.g. Traefik), logs should continue to be sent to Loki normally.

Environment:

  • Infrastructure: docker in a vm
  • Deployment tool: ansible

Screenshots, Promtail config, or terminal output

Loki config yml
auth_enabled: false


server:
  http_listen_port: 3100
  log_level: "debug"
  grpc_server_max_send_msg_size: 104857600 # 100MB
  grpc_server_max_recv_msg_size: 104857600 # 100MB

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  wal:
    enabled: true
    dir: /loki/wal
schema_config:
  configs:
  - from: 2020-05-15
    store: boltdb
    object_store: filesystem
    schema: v11
    index:
      prefix: index_
      period: 168h

storage_config:
  boltdb:
    directory: /loki/index

  filesystem:
    directory: /loki/chunks
promtail config yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
  log_level: "debug"

positions:
  filename: /promtail/positions.yaml

client:
  url: http://loki:3100/loki/api/v1/push
  batchsize: 104857600 # 100MB


scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'

target_config:
  sync_period: 15s
Logs of promtail when container gets restarted

5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d is the ID of the container that was restarted.

2022-01-27 21:26:28 | level=info ts=2022-01-27T20:26:28.765328356Z caller=target.go:122 target=docker/5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d msg="finished transferring logs" written=326624 container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:28 | level=debug ts=2022-01-27T20:26:28.76540484Z caller=target.go:135 target=docker/5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d msg="done processing Docker logs" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:28 | level=debug ts=2022-01-27T20:26:28.765423466Z caller=target.go:204 target=docker/5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d msg="stopped Docker target" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:30 | level=debug ts=2022-01-27T20:26:30.822335406Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:30 | level=debug ts=2022-01-27T20:26:30.822338973Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:30 | level=debug ts=2022-01-27T20:26:30.822342109Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:30 | level=debug ts=2022-01-27T20:26:30.822344964Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:30 | level=debug ts=2022-01-27T20:26:30.82234809Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:30 | level=debug ts=2022-01-27T20:26:30.822350835Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
Promtail logs after restarting promtail container
2022-01-27 23:02:57 | level=debug ts=2022-01-27T22:02:57.634640568Z caller=target.go:204 target=docker/5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d msg="stopped Docker target" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 23:03:03 | level=error ts=2022-01-27T22:03:03.11376339Z caller=target_group.go:94 msg="added Docker target" containerID=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 23:03:03 | level=debug ts=2022-01-27T22:03:03.113766746Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 23:03:03 | level=debug ts=2022-01-27T22:03:03.113769762Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 23:03:03 | level=debug ts=2022-01-27T22:03:03.113772507Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 23:03:03 | level=debug ts=2022-01-27T22:03:03.113775212Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 23:03:03 | level=debug ts=2022-01-27T22:03:03.113777717Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d

Let me know if I can help you further.

@jeschkies
Contributor

Thanks a ton for the extensive description 👏 The culprit is

_, ok := tg.targets[id]
if ok {
	level.Debug(tg.logger).Log("msg", "ignoring container that is already being scraped", "container", id)
	return nil
}

We are not restarting a finished container. I wonder whether it's enough to check if the target is stopped and restart it. Did you just restart the container with docker restart? I'd like to know so I can test it.
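
For context, here is a minimal, self-contained sketch of that direction: restart a known target instead of ignoring it. Only the `startIfNotRunning` name comes from the fix referenced below; the types, fields, and log output are simplified assumptions, not Loki's actual code.

package main

import "fmt"

// target stands in for Loki's Docker target: it scrapes one container's logs
// and records whether its scrape loop is still running.
type target struct {
	containerID string
	running     bool
}

// startIfNotRunning resumes scraping when the previous scrape loop has exited,
// e.g. because the container was stopped and started again by `docker restart`.
func (t *target) startIfNotRunning() {
	if t.running {
		return
	}
	t.running = true
	fmt.Println("restarted scraping for container", t.containerID)
}

// targetGroup holds one target per container ID found by docker_sd_configs.
type targetGroup struct {
	targets map[string]*target
}

// addTarget is where the bug lived: a known container ID was ignored even if
// its target had already stopped; restarting it instead closes the gap.
func (tg *targetGroup) addTarget(id string) {
	if t, ok := tg.targets[id]; ok {
		t.startIfNotRunning() // was: "ignoring container that is already being scraped"
		return
	}
	tg.targets[id] = &target{containerID: id, running: true}
	fmt.Println("added Docker target", id)
}

func main() {
	tg := &targetGroup{targets: map[string]*target{}}
	tg.addTarget("5c66fcb97ccd")               // first discovery: new target started
	tg.targets["5c66fcb97ccd"].running = false // container restarted; scrape loop exited
	tg.addTarget("5c66fcb97ccd")               // rediscovered: scraping resumed, not ignored
}

The actual fix restarts the container's log stream rather than just flipping a flag, but the change in control flow is the same.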

@lucasra1
Author

Yes, I just used docker restart, e.g. docker restart traefik, and then the logs stopped coming for the traefik container. After restarting Promtail (docker restart promtail), all the logs appear, nothing missing.

jeschkies added a commit to jeschkies/loki that referenced this issue Jan 31, 2022
Summary:
The Docker target would not reconnect to a container if it restarted and
kept the same ID. This change introduces a `startIfNotRunning` method
that will be called to restart scraping the container's logs.

Fixes grafana#5259
jeschkies added a commit that referenced this issue Feb 1, 2022
Summary:
The Docker target would not reconnect to a container if it restarted and
kept the same ID. This change introduces a `startIfNotRunning` method
that will be called to restart scraping the container's logs.

Fixes #5259
@lucasra1
Author

lucasra1 commented Feb 1, 2022

Just tested the newest version with this change. Seems to work fine, thanks a lot!
