
Docker service discovery - logging interrupted when a container is restarted #5259

Closed
lucasra1 opened this issue Jan 27, 2022 · 3 comments · Fixed by #5280

Comments

@lucasra1

lucasra1 commented Jan 27, 2022

Describe the bug
I was just testing the new docker_sd_config, and everything worked like a charm until I noticed something weird. When I restart a container whose logs are scraped via docker_sd_config, Promtail stops sending that container's logs to Loki after the restart. (All other containers still work fine.) When I then restart Promtail, all the missing logs get delivered at once, so nothing is lost; Promtail simply stops shipping logs for that specific container until it is restarted itself.

This happens when restarting any container (the container ID stays the same). When the container is recreated instead (so its ID changes), it does not seem to happen.

To Reproduce
Steps to reproduce the behavior:

  1. Loki running in docker container (main-87cbe50 (branch: main, revision: 87cbe50)) (see config in attachment)
  2. Promtail running in docker container (main-87cbe50 (branch: main, revision: 87cbe50)) (see config in attachment)
  3. Using Grafana to explore logs from loki
  4. Restart container (e.g. Traefik Proxy container)
  5. Try to look at the logs again
  6. No logs visible after the restart of the container
  7. docker logs traefik shows the logs normally
  8. Restart promtail container
  9. All logs visible now (none missing)

Expected behavior
After restarting a container (e.g. Traefik), logs should continue to be sent to Loki normally.

Environment:

  • Infrastructure: docker in a vm
  • Deployment tool: ansible

Screenshots, Promtail config, or terminal output

Loki config yml
auth_enabled: false


server:
  http_listen_port: 3100
  log_level: "debug"
  grpc_server_max_send_msg_size: 104857600 # 100MB
  grpc_server_max_recv_msg_size: 104857600 # 100MB

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  wal:
    enabled: true
    dir: /loki/wal
schema_config:
  configs:
  - from: 2020-05-15
    store: boltdb
    object_store: filesystem
    schema: v11
    index:
      prefix: index_
      period: 168h

storage_config:
  boltdb:
    directory: /loki/index

  filesystem:
    directory: /loki/chunks
promtail config yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
  log_level: "debug"

positions:
  filename: /promtail/positions.yaml

client:
  url: http://loki:3100/loki/api/v1/push
  batchsize: 104857600 # 100MB


scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'

target_config:
  sync_period: 15s
Logs of promtail when container gets restarted

5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d is the ID of the container that was restarted.

2022-01-27 21:26:28 | level=info ts=2022-01-27T20:26:28.765328356Z caller=target.go:122 target=docker/5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d msg="finished transferring logs" written=326624 container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:28 | level=debug ts=2022-01-27T20:26:28.76540484Z caller=target.go:135 target=docker/5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d msg="done processing Docker logs" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:28 | level=debug ts=2022-01-27T20:26:28.765423466Z caller=target.go:204 target=docker/5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d msg="stopped Docker target" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:30 | level=debug ts=2022-01-27T20:26:30.822335406Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:30 | level=debug ts=2022-01-27T20:26:30.822338973Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:30 | level=debug ts=2022-01-27T20:26:30.822342109Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:30 | level=debug ts=2022-01-27T20:26:30.822344964Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:30 | level=debug ts=2022-01-27T20:26:30.82234809Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 21:26:30 | level=debug ts=2022-01-27T20:26:30.822350835Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
Promtail logs after restarting promtail container
2022-01-27 23:02:57 | level=debug ts=2022-01-27T22:02:57.634640568Z caller=target.go:204 target=docker/5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d msg="stopped Docker target" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 23:03:03 | level=error ts=2022-01-27T22:03:03.11376339Z caller=target_group.go:94 msg="added Docker target" containerID=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 23:03:03 | level=debug ts=2022-01-27T22:03:03.113766746Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 23:03:03 | level=debug ts=2022-01-27T22:03:03.113769762Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 23:03:03 | level=debug ts=2022-01-27T22:03:03.113772507Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 23:03:03 | level=debug ts=2022-01-27T22:03:03.113775212Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d
2022-01-27 23:03:03 | level=debug ts=2022-01-27T22:03:03.113777717Z caller=target_group.go:76 msg="ignoring container that is already being scraped" container=5c66fcb97ccd197d9cd2395a15d4016bff76bc86545cd0dbf0bf266c43f6c21d

Let me know if I can help you further.

@jeschkies
Contributor

Thanks a ton for the extensive description 👏 The culprit is

_, ok := tg.targets[id]
if ok {
	level.Debug(tg.logger).Log("msg", "ignoring container that is already being scraped", "container", id)
	return nil
}

We are not restarting a finished container. I wonder whether it's enough to check if the target is stopped and restart it. Did you just restart the container with docker restart? I'd like to know so I can test it.
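
For context, here is a minimal, self-contained sketch of that direction: restart a known target instead of ignoring it. Only the `startIfNotRunning` name comes from the fix referenced below; the types, fields, and log output are simplified assumptions, not Loki's actual code.

package main

import "fmt"

// target stands in for Loki's Docker target: it scrapes one container's logs
// and records whether its scrape loop is still running.
type target struct {
	containerID string
	running     bool
}

// startIfNotRunning resumes scraping when the previous scrape loop has exited,
// e.g. because the container was stopped and started again by `docker restart`.
func (t *target) startIfNotRunning() {
	if t.running {
		return
	}
	t.running = true
	fmt.Println("restarted scraping for container", t.containerID)
}

// targetGroup holds one target per container ID found by docker_sd_configs.
type targetGroup struct {
	targets map[string]*target
}

// addTarget is where the bug lived: a known container ID was ignored even if
// its target had already stopped; restarting it instead closes the gap.
func (tg *targetGroup) addTarget(id string) {
	if t, ok := tg.targets[id]; ok {
		t.startIfNotRunning() // was: "ignoring container that is already being scraped"
		return
	}
	tg.targets[id] = &target{containerID: id, running: true}
	fmt.Println("added Docker target", id)
}

func main() {
	tg := &targetGroup{targets: map[string]*target{}}
	tg.addTarget("5c66fcb97ccd")               // first discovery: new target started
	tg.targets["5c66fcb97ccd"].running = false // container restarted; scrape loop exited
	tg.addTarget("5c66fcb97ccd")               // rediscovered: scraping resumed, not ignored
}

The actual fix restarts the container's log stream rather than just flipping a flag, but the change in control flow is the same.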

@lucasra1
Author

Yes, I just used docker restart, e.g. docker restart traefik, and then the logs stopped coming for the traefik container. After restarting Promtail (docker restart promtail), all the logs appear, nothing missing.

jeschkies added a commit to jeschkies/loki that referenced this issue Jan 31, 2022
Summary:
The Docker target would not reconnect to a container if it restarted and
kept the same ID. This change introduces a `startIfNotRunning` method
that will be called to restart scraping the container's logs.

Fixes grafana#5259
jeschkies added a commit that referenced this issue Feb 1, 2022
Summary:
The Docker target would not reconnect to a container if it restarted and
kept the same ID. This change introduces a `startIfNotRunning` method
that will be called to restart scraping the container's logs.

Fixes #5259
@lucasra1
Author

lucasra1 commented Feb 1, 2022

Just tested the newest version with this change. Seems to work fine, thanks a lot!
