docker is practically unusable when the loki endpoint is not accessible #2017

Closed
rafipiccolo opened this issue Apr 30, 2020 · 5 comments · Fixed by #2116

@rafipiccolo

Describe the bug
Docker is practically unusable when the Loki endpoint is not accessible.

To Reproduce

  1. Create this docker-compose file:
version: "3.3"
services:
  loki:
    image: grafana/loki:latest
    restart: always
    container_name: loki
    ports:
      - "127.0.0.1:3100:3100"
    volumes:
      - ./loki:/etc/loki
    command: -config.file=/etc/loki/local-config.yaml

  datelogger:
    image: busybox
    container_name: datelogger
    command: sh -c "while true; do $$(echo date); sleep 1; done"
    restart: always
  2. Create this file, /etc/docker/daemon.json:
{
    "debug" : true,
    "log-driver": "loki",
    "log-opts": {
        "loki-url": "http://localhost:3100/loki/api/v1/push",
        "loki-batch-size": "400"
    }
}
  3. Restart Docker.
  4. Start the datelogger service:
    docker-compose up -d datelogger
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

The service does not come up.

If, on the contrary, loki and datelogger are both up and then loki dies, what happens to datelogger? It's not even possible to kill it with docker-compose down.

I thought about adding "mode": "non-blocking" to "log-opts",
but the Loki driver says it doesn't recognize it.

Do you have a solution?

Expected behavior
The service should be up, with logs discarded until the connection is available.
If the Loki endpoint becomes slow, it shouldn't slow down the main server as well.

Environment:
Ubuntu 18.04 + Docker + Docker Compose

Screenshots, Promtail config, or terminal output

@cyriltovena
Contributor

Have you tried tweaking the retries and backoff? I'm not sure this error comes from Loki.

Try with 0 retries via the documented log option.
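
For readers following along, the suggestion above could be tried with a daemon.json along these lines. This is only a sketch: it assumes the driver's retry and backoff options are named `loki-retries`, `loki-min-backoff` and `loki-max-backoff`, and the backoff values are purely illustrative. Confirm in the documentation for your driver version how a value of 0 for retries is interpreted before relying on it.

```json
{
    "debug": true,
    "log-driver": "loki",
    "log-opts": {
        "loki-url": "http://localhost:3100/loki/api/v1/push",
        "loki-batch-size": "400",
        "loki-retries": "0",
        "loki-min-backoff": "100ms",
        "loki-max-backoff": "1s"
    }
}
```

Docker has to be restarted after editing daemon.json, and the default log options only apply to containers created after the change.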

@nawabb

nawabb commented May 17, 2020

Is there a roadmap to support "non-blocking" mode, as outlined in the Docker documentation for other logging drivers (https://docs.docker.com/config/containers/logging/configure/)?

--log-opt mode=non-blocking --log-opt max-buffer-size=4m 

At peak, our application can write up to 1 million lines of logs per minute. When the application runs without the Loki logging driver, its throughput is 2.5x what it is with the Loki logging driver.

@cyriltovena
Contributor

I will take a look. Do you have more details? What throughput? Logs? How do you measure it?

cyriltovena self-assigned this May 21, 2020
@nawabb

nawabb commented May 21, 2020

We are running Loki 1.4.2 from the Docker image, in single-node mode, with the local filesystem for both index and chunks. The server running Loki has 16TB of SSD-based storage, 40 CPU cores, and 384GB of memory. This is our test instance, and in the future we will likely migrate to a multi-node Loki cluster. All Loki server config is left at the defaults.

Our application is Java-based, runs inside a Docker container, and each application instance can produce between 100K and 2.0M lines of logs per minute. I used the Docker Loki logging driver plugin.

Our single-node Loki can ingest at most 1.5 million lines of logs per minute, while using only 20% of its CPU resources. If I run one application process, it saturates Loki and produces 1.5M lines/min. If I run 3 application processes, each on a different server, each process is throttled, produces only 0.5M lines/min of logs, and does less work. The Loki driver is using default config values with max-retries=0. This means the application processes slow down (and do less work) when Loki cannot keep up with ingesting logs.
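
The figures above are consistent with a blocking driver splitting a fixed ingest ceiling evenly between producers. A minimal sketch of that arithmetic, assuming even sharing (an assumption that happens to match the reported numbers, not a measured property of Loki); the function name is hypothetical:

```python
# Ingest ceiling reported in the comment above (lines per minute).
LOKI_MAX_LINES_PER_MIN = 1_500_000


def per_process_rate(n_processes: int, capacity: int = LOKI_MAX_LINES_PER_MIN) -> float:
    """Lines per minute each process can emit if a blocking driver
    shares the ingest capacity evenly (illustrative assumption)."""
    return capacity / n_processes


one_process = per_process_rate(1)      # a single process saturates Loki
three_processes = per_process_rate(3)  # each of three processes is throttled
lines_per_second = LOKI_MAX_LINES_PER_MIN / 60

print(one_process, three_processes, lines_per_second)
```

With three processes, each one is held to a third of the ceiling, which is the 0.5M lines/min reported.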

If the Loki Docker driver provided a non-blocking mode, the application process could continue independently of Loki. If Loki were under load, the application process could send logs to a buffer, and the buffer could be discarded when Loki is busy, without affecting the performance of the application.
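
The buffering behaviour described above can be sketched in a few lines of Python. This is an illustration of the idea only, not Docker's or the Loki driver's implementation: the class name, the line-count limit (Docker's `max-buffer-size` is a byte size) and the policy of dropping the newest line when the buffer is full are all assumptions made for the example.

```python
import queue


class NonBlockingLogBuffer:
    """Bounded buffer between a log producer and a slow sink (hypothetical)."""

    def __init__(self, max_lines: int) -> None:
        self._queue: "queue.Queue[str]" = queue.Queue(maxsize=max_lines)
        self.dropped = 0  # lines discarded because the buffer was full

    def write(self, line: str) -> bool:
        """Enqueue without ever blocking the caller; drop the line if full."""
        try:
            self._queue.put_nowait(line)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, send) -> int:
        """Forward everything currently buffered to `send`; return the count."""
        sent = 0
        while True:
            try:
                line = self._queue.get_nowait()
            except queue.Empty:
                return sent
            send(line)
            sent += 1


# The sink is "down" while the application writes five lines into a
# three-line buffer: the writes return immediately and two lines are dropped.
buf = NonBlockingLogBuffer(max_lines=3)
results = [buf.write(f"line {i}") for i in range(5)]

# Once the sink is reachable again, the buffered lines are delivered.
delivered = []
buf.drain(delivered.append)
print(results, buf.dropped, delivered)
```

The point of the design is that `write` never waits on the sink, so a slow or unreachable endpoint costs log lines rather than application throughput.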

@cyriltovena
Contributor

Yes, I can allow this quickly; will do it. Ping me if you don't hear back from me.

cyriltovena added a commit to cyriltovena/loki that referenced this issue May 22, 2020
Before, those configs were not allowed, but Docker can use them to have a different log delivery behaviour.

see https://docs.docker.com/config/containers/logging/configure/#configure-the-delivery-mode-of-log-messages-from-container-to-log-driver

Potentially fixes grafana#2017

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
cyriltovena added a commit that referenced this issue Jun 4, 2020
Before, those configs were not allowed, but Docker can use them to have a different log delivery behaviour.

see https://docs.docker.com/config/containers/logging/configure/#configure-the-delivery-mode-of-log-messages-from-container-to-log-driver

Potentially fixes #2017

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
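
With a driver build that includes this change, the configuration attempted in the original report might look like the sketch below. It assumes the driver now accepts Docker's generic `mode` and `max-buffer-size` options alongside its own. The 4m buffer size is taken from the earlier comment in this thread and is not a recommendation.

```json
{
    "debug": true,
    "log-driver": "loki",
    "log-opts": {
        "loki-url": "http://localhost:3100/loki/api/v1/push",
        "loki-batch-size": "400",
        "mode": "non-blocking",
        "max-buffer-size": "4m"
    }
}
```

Docker has to be restarted after editing daemon.json, and the new defaults only apply to containers created afterwards, so existing containers need to be recreated to pick them up.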

3 participants