memberlist advertise_addr not used when advertising node address #9887

Closed
fredrikcarlbom opened this issue Jul 7, 2023 · 3 comments

fredrikcarlbom commented Jul 7, 2023

Describe the bug
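Joining two Loki nodes, each running on a separate Docker network, into one memberlist cluster fails. This seems to be because the nodes do not use advertise_addr when advertising their address: the logs below show connection attempts to container-internal 172.x.x.x addresses instead.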

To Reproduce
Steps to reproduce the behavior:

  1. Start Loki (2.8.2) on two nodes using the files below
  2. View the logs

Expected behavior
The two nodes should form a working memberlist cluster and reach each other at their advertised addresses.

Environment:

  • Infrastructure: Ubuntu 22.02 running in OpenStack
  • Deployment tool: docker-compose

Screenshots, Promtail config, or terminal output
loki.yml

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

ingester:
  wal:
    enabled: true
    dir: /tmp/wal
  lifecycler:
    address: ${ADVERTISE_ADDR}
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
    final_sleep: 0s

memberlist:
  abort_if_cluster_join_fails: false
  join_members:
    - ${MEMBERLIST_MEMBER}
  advertise_addr: ${ADVERTISE_ADDR}

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    shared_store: s3
  aws:
    s3: http://${LOKI_USER}:${LOKI_PASSWORD}@${S3_URL}/${LOKI_BUCKET}
    s3forcepathstyle: true

compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: aws

docker-compose.yml (MEMBERLIST_MEMBER and ADVERTISE_ADDR differ between the two nodes)

version: '3.7'

services:
  loki:
    image: grafana/loki:2.8.2
    logging:
      driver: json-file
      options:
        max-file: '3'
        max-size: 10m
    restart: unless-stopped
    command: -config.file=/etc/loki/loki.yml -config.expand-env=true
    ports:
      - "3100:3100"
      - "7946:7946"
      - "9096:9096"
    volumes:
      - ./loki.yml:/etc/loki/loki.yml:rw
    env_file: .docker-loki-env
    environment:
      - S3_URL=minio-a.:9000
      - MEMBERLIST_MEMBER=192.168.131.5
      - ADVERTISE_ADDR=192.168.145.5
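
To rule out a substitution problem, the placeholders in loki.yml can be rendered with the values from the compose file above. This is only a rough sketch: Python's `string.Template` stands in for Loki's `-config.expand-env=true` handling and is not the same implementation, and the `expand` helper and `CONFIG_FRAGMENT` name are made up for illustration.

```python
from string import Template

# Fragment of loki.yml from this report; only the lines with placeholders.
CONFIG_FRAGMENT = """\
ingester:
  lifecycler:
    address: ${ADVERTISE_ADDR}
memberlist:
  join_members:
    - ${MEMBERLIST_MEMBER}
  advertise_addr: ${ADVERTISE_ADDR}
"""

# Values from the environment section of docker-compose.yml on this node.
ENV = {
    "MEMBERLIST_MEMBER": "192.168.131.5",
    "ADVERTISE_ADDR": "192.168.145.5",
}


def expand(fragment, env):
    """Substitute ${NAME} placeholders; raises KeyError if a variable is unset.

    Assumption: this approximates, but is not, Loki's own env expansion.
    """
    return Template(fragment).substitute(env)


rendered = expand(CONFIG_FRAGMENT, ENV)
print(rendered)
```

If the rendered fragment shows 192.168.145.5 for both `address` and `advertise_addr`, the environment values reach the config as intended and the 172.x.x.x addresses in the logs must come from somewhere else.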

Relevant logs from one node

...
loki_1  | level=info ts=2023-07-07T13:16:38.152572235Z caller=memberlist_client.go:576 msg="joining memberlist cluster" join_members=192.168.131.5
...
loki_1  | level=info ts=2023-07-07T13:16:42.304726756Z caller=worker.go:209 msg="adding connection" addr=172.25.0.4:9096
loki_1  | level=info ts=2023-07-07T13:16:42.304978899Z caller=worker.go:209 msg="adding connection" addr=172.24.0.2:9096
loki_1  | level=info ts=2023-07-07T13:16:42.305154213Z caller=scheduler.go:681 msg="this scheduler is in the ReplicationSet, will now accept requests."
loki_1  | level=warn ts=2023-07-07T13:16:42.30537579Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Canceled desc = context canceled" addr=172.25.0.4:9096
...
loki_1  | level=warn ts=2023-07-07T13:16:42.309931757Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
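
To make the symptom easier to see, the `addr=` fields in the excerpt above can be tallied and compared against the network the nodes are supposed to advertise on. The following is a minimal sketch, not part of Loki: the helper names are hypothetical, the sample lines are shortened copies of the log lines above (timestamps dropped), and the `192.168.0.0/16` range is an assumption derived from the two 192.168.x.x addresses in the compose file.

```python
import ipaddress
import re
from collections import Counter

# Matches the trailing addr=IP:PORT field of the logfmt lines shown above.
ADDR_RE = re.compile(r'\baddr=(\d{1,3}(?:\.\d{1,3}){3}):(\d+)')


def dialed_addresses(log_lines):
    """Count how often each IP appears in an addr= field."""
    counts = Counter()
    for line in log_lines:
        match = ADDR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def outside_network(counts, network):
    """Return the IPs that are not inside the given network, sorted."""
    net = ipaddress.ip_network(network)
    return sorted(ip for ip in counts if ipaddress.ip_address(ip) not in net)


# Shortened copies of log lines from this report.
sample = [
    'level=info caller=worker.go:209 msg="adding connection" addr=172.25.0.4:9096',
    'level=info caller=worker.go:209 msg="adding connection" addr=172.24.0.2:9096',
    'level=warn caller=scheduler_processor.go:98 msg="error contacting scheduler" '
    'err="rpc error: code = Canceled desc = context canceled" addr=172.25.0.4:9096',
]

counts = dialed_addresses(sample)
# Assumed advertise range; adjust to the real subnet of the hosts.
print(outside_network(counts, "192.168.0.0/16"))  # prints ['172.24.0.2', '172.25.0.4']
```

Every address the querier worker tries to reach is a 172.x.x.x address, none of which matches the configured ADVERTISE_ADDR of 192.168.145.5.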
Full log from the same node
loki_1  | level=warn ts=2023-07-07T13:16:38.138874052Z caller=loki.go:286 msg="per-tenant timeout not configured, using default engine timeout (\"5m0s\"). This behavior will change in the next major to always use the default per-tenant timeout (\"5m\")."
loki_1  | level=info ts=2023-07-07T13:16:38.141759219Z caller=main.go:108 msg="Starting Loki" version="(version=2.8.2, branch=HEAD, revision=9f809eda7)"
loki_1  | level=info ts=2023-07-07T13:16:38.14250551Z caller=server.go:323 http=[::]:3100 grpc=[::]:9096 msg="server listening on addresses"
loki_1  | level=info ts=2023-07-07T13:16:38.142850782Z caller=modules.go:894 msg="Ruler storage is not configured; ruler will not be started."
loki_1  | level=warn ts=2023-07-07T13:16:38.143943527Z caller=cache.go:114 msg="fifocache config is deprecated. use embedded-cache instead"
loki_1  | level=warn ts=2023-07-07T13:16:38.143965149Z caller=experimental.go:20 msg="experimental feature in use" feature="In-memory (FIFO) cache - chunksembedded-cache"
loki_1  | level=info ts=2023-07-07T13:16:38.144579944Z caller=table_manager.go:262 msg="query readiness setup completed" duration=1.8µs distinct_users_len=0
loki_1  | level=info ts=2023-07-07T13:16:38.14461846Z caller=shipper.go:131 msg="starting index shipper in RW mode"
loki_1  | level=info ts=2023-07-07T13:16:38.144805415Z caller=shipper_index_client.go:78 msg="starting boltdb shipper in RW mode"
loki_1  | level=info ts=2023-07-07T13:16:38.145439223Z caller=memberlist_client.go:437 msg="Using memberlist cluster label and node name" cluster_label= node=f070fca0d234-6435696d
loki_1  | level=info ts=2023-07-07T13:16:38.145741434Z caller=memberlist_client.go:543 msg="memberlist fast-join starting" nodes_found=1 to_join=4
loki_1  | level=info ts=2023-07-07T13:16:38.148482257Z caller=worker.go:112 msg="Starting querier worker using query-scheduler and scheduler ring for addresses"
loki_1  | level=info ts=2023-07-07T13:16:38.150352522Z caller=table_manager.go:134 msg="uploading tables"
loki_1  | level=info ts=2023-07-07T13:16:38.150379946Z caller=table_manager.go:166 msg="handing over indexes to shipper"
loki_1  | level=info ts=2023-07-07T13:16:38.151560086Z caller=modules.go:913 msg="RulerStorage is nil.  Not starting the ruler."
loki_1  | level=info ts=2023-07-07T13:16:38.152550632Z caller=memberlist_client.go:563 msg="memberlist fast-join finished" joined_nodes=1 elapsed_time=6.804589ms
loki_1  | level=info ts=2023-07-07T13:16:38.152572235Z caller=memberlist_client.go:576 msg="joining memberlist cluster" join_members=192.168.131.5
loki_1  | level=info ts=2023-07-07T13:16:38.155661845Z caller=memberlist_client.go:595 msg="joining memberlist cluster succeeded" reached_nodes=1 elapsed_time=3.088971ms
loki_1  | level=info ts=2023-07-07T13:16:38.156445183Z caller=module_service.go:82 msg=initialising module=cache-generation-loader
loki_1  | level=info ts=2023-07-07T13:16:38.156649435Z caller=module_service.go:82 msg=initialising module=server
loki_1  | level=info ts=2023-07-07T13:16:38.1569543Z caller=module_service.go:82 msg=initialising module=memberlist-kv
loki_1  | level=info ts=2023-07-07T13:16:38.156971047Z caller=module_service.go:82 msg=initialising module=query-frontend-tripperware
loki_1  | level=info ts=2023-07-07T13:16:38.157000156Z caller=module_service.go:82 msg=initialising module=store
loki_1  | level=info ts=2023-07-07T13:16:38.157015081Z caller=module_service.go:82 msg=initialising module=ring
loki_1  | level=info ts=2023-07-07T13:16:38.157278894Z caller=module_service.go:82 msg=initialising module=usage-report
loki_1  | level=info ts=2023-07-07T13:16:38.157594761Z caller=module_service.go:82 msg=initialising module=ingester-querier
loki_1  | level=info ts=2023-07-07T13:16:38.157631503Z caller=module_service.go:82 msg=initialising module=query-scheduler
loki_1  | level=info ts=2023-07-07T13:16:38.157748355Z caller=module_service.go:82 msg=initialising module=ingester
loki_1  | level=info ts=2023-07-07T13:16:38.157779885Z caller=ingester.go:416 msg="recovering from checkpoint"
loki_1  | level=info ts=2023-07-07T13:16:38.157859348Z caller=recovery.go:40 msg="no checkpoint found, treating as no-op"
loki_1  | level=info ts=2023-07-07T13:16:38.157887928Z caller=module_service.go:82 msg=initialising module=compactor
loki_1  | level=info ts=2023-07-07T13:16:38.158004658Z caller=module_service.go:82 msg=initialising module=distributor
loki_1  | level=info ts=2023-07-07T13:16:38.158150298Z caller=basic_lifecycler.go:261 msg="instance not found in the ring" instance=f070fca0d234 ring=scheduler
loki_1  | level=info ts=2023-07-07T13:16:38.158166179Z caller=basic_lifecycler_delegates.go:63 msg="not loading tokens from file, tokens file path is empty"
loki_1  | level=info ts=2023-07-07T13:16:38.158296238Z caller=basic_lifecycler.go:261 msg="instance not found in the ring" instance=f070fca0d234 ring=compactor
loki_1  | level=info ts=2023-07-07T13:16:38.158306588Z caller=basic_lifecycler_delegates.go:63 msg="not loading tokens from file, tokens file path is empty"
loki_1  | level=info ts=2023-07-07T13:16:38.158404337Z caller=lifecycler.go:547 msg="not loading tokens from file, tokens file path is empty"
loki_1  | level=info ts=2023-07-07T13:16:38.158439884Z caller=lifecycler.go:576 msg="instance not found in ring, adding with no tokens" ring=distributor
loki_1  | level=info ts=2023-07-07T13:16:38.158516059Z caller=lifecycler.go:416 msg="auto-joining cluster after timeout" ring=distributor
loki_1  | level=info ts=2023-07-07T13:16:38.158599535Z caller=scheduler.go:616 msg="waiting until scheduler is JOINING in the ring"
loki_1  | level=info ts=2023-07-07T13:16:38.158608805Z caller=scheduler.go:620 msg="scheduler is JOINING in the ring"
loki_1  | level=info ts=2023-07-07T13:16:38.158634808Z caller=ingester.go:432 msg="recovered WAL checkpoint recovery finished" elapsed=870.404µs errors=false
loki_1  | level=info ts=2023-07-07T13:16:38.158646893Z caller=ingester.go:438 msg="recovering from WAL"
loki_1  | level=info ts=2023-07-07T13:16:38.158838854Z caller=compactor.go:332 msg="waiting until compactor is JOINING in the ring"
loki_1  | level=info ts=2023-07-07T13:16:38.158852383Z caller=compactor.go:336 msg="compactor is JOINING in the ring"
loki_1  | level=info ts=2023-07-07T13:16:38.158947777Z caller=ingester.go:454 msg="WAL segment recovery finished" elapsed=1.182505ms errors=false
loki_1  | level=info ts=2023-07-07T13:16:38.158975506Z caller=ingester.go:402 msg="closing recoverer"
loki_1  | level=info ts=2023-07-07T13:16:38.158983551Z caller=ingester.go:410 msg="WAL recovery finished" time=1.218303ms
loki_1  | level=info ts=2023-07-07T13:16:38.1590862Z caller=lifecycler.go:547 msg="not loading tokens from file, tokens file path is empty"
loki_1  | level=info ts=2023-07-07T13:16:38.159119686Z caller=lifecycler.go:576 msg="instance not found in ring, adding with no tokens" ring=ingester
loki_1  | level=info ts=2023-07-07T13:16:38.159180721Z caller=lifecycler.go:416 msg="auto-joining cluster after timeout" ring=ingester
loki_1  | level=info ts=2023-07-07T13:16:38.159284046Z caller=wal.go:156 msg=started component=wal
loki_1  | level=info ts=2023-07-07T13:16:39.159485515Z caller=scheduler.go:630 msg="waiting until scheduler is ACTIVE in the ring"
loki_1  | level=info ts=2023-07-07T13:16:39.159561918Z caller=compactor.go:346 msg="waiting until compactor is ACTIVE in the ring"
loki_1  | level=info ts=2023-07-07T13:16:39.304400982Z caller=scheduler.go:634 msg="scheduler is ACTIVE in the ring"
loki_1  | level=info ts=2023-07-07T13:16:39.304559593Z caller=module_service.go:82 msg=initialising module=querier
loki_1  | level=info ts=2023-07-07T13:16:39.304681494Z caller=module_service.go:82 msg=initialising module=query-frontend
loki_1  | level=info ts=2023-07-07T13:16:39.350012943Z caller=compactor.go:350 msg="compactor is ACTIVE in the ring"
loki_1  | level=info ts=2023-07-07T13:16:39.350114689Z caller=loki.go:499 msg="Loki started"
loki_1  | level=info ts=2023-07-07T13:16:42.304726756Z caller=worker.go:209 msg="adding connection" addr=172.25.0.4:9096
loki_1  | level=info ts=2023-07-07T13:16:42.304978899Z caller=worker.go:209 msg="adding connection" addr=172.24.0.2:9096
loki_1  | level=info ts=2023-07-07T13:16:42.305154213Z caller=scheduler.go:681 msg="this scheduler is in the ReplicationSet, will now accept requests."
loki_1  | level=warn ts=2023-07-07T13:16:42.30537579Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Canceled desc = context canceled" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:42.305448064Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Canceled desc = context canceled" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:42.305474724Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Canceled desc = context canceled" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:42.305497485Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Canceled desc = context canceled" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:42.305521656Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Canceled desc = context canceled" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:42.309931757Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:42.309951449Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:42.309982695Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:42.310019222Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:42.31002135Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:42.854974353Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:43.071076051Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:43.194584818Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:43.197763075Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:43.230980038Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=info ts=2023-07-07T13:16:44.351032414Z caller=compactor.go:411 msg="this instance has been chosen to run the compactor, starting compactor"
loki_1  | level=info ts=2023-07-07T13:16:44.351102122Z caller=compactor.go:440 msg="waiting 10m0s for ring to stay stable and previous compactions to finish before starting compactor"
loki_1  | level=warn ts=2023-07-07T13:16:44.407404152Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:44.496864173Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:44.589106732Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:44.739826828Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:44.956727298Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:46.97260903Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:47.017977569Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:47.129266801Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:47.65010412Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:47.862109851Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=info ts=2023-07-07T13:16:49.30512854Z caller=frontend_scheduler_worker.go:107 msg="adding connection to scheduler" addr=172.25.0.4:9096
loki_1  | level=info ts=2023-07-07T13:16:49.305662137Z caller=frontend_scheduler_worker.go:107 msg="adding connection to scheduler" addr=172.24.0.2:9096
loki_1  | level=error ts=2023-07-07T13:16:49.310310014Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:49.310325035Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:49.310366738Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:49.310381268Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:49.310399913Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:49.905660357Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:49.965643471Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:50.045886496Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:50.090359663Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:50.173398674Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:50.998725757Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:51.105953226Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:51.252293736Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=info ts=2023-07-07T13:16:51.305408361Z caller=worker.go:222 msg="removing connection" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:51.305531634Z caller=scheduler_processor.go:78 msg="failed to notify querier shutdown to query-scheduler" address=172.25.0.4:9096 err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\""
loki_1  | level=error ts=2023-07-07T13:16:51.546049396Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:51.681070385Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:52.016912738Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:52.195309782Z caller=tcp_transport.go:252 component="memberlist TCPTransport" msg="failed to read message type" err=EOF remote=192.168.131.5:39922
loki_1  | level=error ts=2023-07-07T13:16:54.082327911Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:54.094452616Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:54.134778336Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:54.490750889Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:54.965656018Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=info ts=2023-07-07T13:16:57.305079571Z caller=worker.go:209 msg="adding connection" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:16:57.305551355Z caller=scheduler_processor.go:106 msg="error processing requests from scheduler" err="rpc error: code = Canceled desc = context canceled" addr=172.24.0.2:9096
loki_1  | level=error ts=2023-07-07T13:16:57.305742148Z caller=scheduler_processor.go:106 msg="error processing requests from scheduler" err="rpc error: code = Canceled desc = context canceled" addr=172.24.0.2:9096
loki_1  | level=error ts=2023-07-07T13:16:57.305779264Z caller=scheduler_processor.go:106 msg="error processing requests from scheduler" err="rpc error: code = Canceled desc = context canceled" addr=172.24.0.2:9096
loki_1  | level=error ts=2023-07-07T13:16:57.305809663Z caller=scheduler_processor.go:106 msg="error processing requests from scheduler" err="rpc error: code = Canceled desc = context canceled" addr=172.24.0.2:9096
loki_1  | level=error ts=2023-07-07T13:16:57.30582532Z caller=scheduler_processor.go:106 msg="error processing requests from scheduler" err="rpc error: code = Canceled desc = context canceled" addr=172.24.0.2:9096
loki_1  | level=warn ts=2023-07-07T13:16:57.310195851Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:16:57.310214015Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:16:57.310240666Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:16:57.310269753Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:16:57.310273877Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:16:57.832721452Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:16:57.911691317Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:16:58.133451755Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:16:58.14092973Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:16:58.263917599Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:16:58.395481137Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:58.39552909Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:58.429617029Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=warn ts=2023-07-07T13:16:59.057179259Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:16:59.208358426Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.4:9096: connect: network is unreachable\"" addr=172.25.0.4:9096
loki_1  | level=info ts=2023-07-07T13:16:59.30587544Z caller=frontend_scheduler_worker.go:107 msg="adding connection to scheduler" addr=172.25.0.3:9096
loki_1  | level=info ts=2023-07-07T13:16:59.306454086Z caller=frontend_scheduler_worker.go:134 msg="removing connection to scheduler" addr=172.25.0.4:9096
loki_1  | level=error ts=2023-07-07T13:16:59.311439619Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:16:59.311497798Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:16:59.311526951Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:16:59.311556093Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:16:59.311586435Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:16:59.609910985Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:16:59.836889093Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:16:59.871015382Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:16:59.898374829Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:16:59.988145953Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:17:00.089500629Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:17:00.119770336Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:17:00.159086054Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:17:00.215244917Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:17:01.524252009Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:17:01.71796485Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:17:01.836121712Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:17:01.900600261Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:17:01.921736104Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:17:02.1444932Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error whil
loki_1  | level=warn ts=2023-07-07T13:17:02.223274834Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:17:02.766384496Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:17:02.852496982Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:17:03.120020195Z caller=scheduler_processor.go:98 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=info ts=2023-07-07T13:17:03.305022611Z caller=worker.go:222 msg="removing connection" addr=172.25.0.3:9096
loki_1  | level=warn ts=2023-07-07T13:17:03.305147115Z caller=scheduler_processor.go:78 msg="failed to notify querier shutdown to query-scheduler" address=172.25.0.3:9096 err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\""
loki_1  | level=error ts=2023-07-07T13:17:03.965759461Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:17:03.988215671Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
loki_1  | level=error ts=2023-07-07T13:17:04.357949642Z caller=frontend_scheduler_worker.go:237 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.25.0.3:9096: connect: network is unreachable\"" addr=172.25.0.3:9096
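
One rough way to see which addresses a node is actually dialing is to tally the `dial tcp` targets in output like the above. The following is a minimal sketch, not part of Loki: the function name `count_dial_targets` is made up for illustration, and it assumes the log format shown here (IPv4 targets, `network is unreachable` errors).

```python
import re
from collections import Counter

# Matches the IPv4 address and port in 'dial tcp <ip>:<port>' as it appears
# in the log lines above. IPv6 targets are not handled (assumption).
DIAL_RE = re.compile(r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def count_dial_targets(log_text: str) -> Counter:
    """Count how often each ip:port appears as an unreachable dial target."""
    targets = Counter()
    for line in log_text.splitlines():
        # Only consider the failure mode seen in this issue.
        if "network is unreachable" not in line:
            continue
        match = DIAL_RE.search(line)
        if match:
            targets[f"{match.group(1)}:{match.group(2)}"] += 1
    return targets
```

Feeding it the captured log text (for example, read from a file) and printing `count_dial_targets(text).most_common()` lists the targets by frequency. In the excerpt above, every failed dial goes to `172.25.0.3:9096` or `172.25.0.4:9096`, which are presumably container-network addresses rather than the value of `${ADVERTISE_ADDR}`.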
@fredrikcarlbom
Author

fredrikcarlbom commented Jul 7, 2023

I expect this is somewhat related to #5610

@fredrikcarlbom
Author

Changing loki.yml to the following partly works: Loki now starts successfully, but it ends up with "too many unhealthy instances in the ring".

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  ring:
    kvstore:
      store: memberlist
    instance_addr: ${ADVERTISE_ADDR}
  replication_factor: 1

ingester:
  wal:
    enabled: true
    dir: /tmp/wal
  lifecycler:
    final_sleep: 0s

memberlist:
  abort_if_cluster_join_fails: false
  join_members:
    - ${MEMBERLIST_MEMBER}
  advertise_addr: ${ADVERTISE_ADDR}

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    shared_store: s3
  aws:
    s3: http://${LOKI_USER}:${LOKI_PASSWORD}@${S3_URL}/${LOKI_BUCKET}
    s3forcepathstyle: true

compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: aws

@fredrikcarlbom
Author

I still have problems with my configuration, but they do not seem to be related to this bug as it is formulated.
