Loki 3.0.0: Query-Frontend crashes with SIGSEGV if no Query-Scheduler is used #13208

Open
Espe0n opened this issue Jun 13, 2024 · 0 comments
Labels
type/bug Somehing is not working as expected

Espe0n commented Jun 13, 2024

Describe the bug
Dear Grafana Loki team,

After upgrading to Loki 3.0.0, we are facing a segmentation fault (SIGSEGV) in the Query-Frontend instance whenever it receives a query.
This error only occurs in the distributed deployment when no Query-Scheduler is deployed.
This issue seems to be related to #12270 or #12937.

A workaround is to also deploy the Query-Scheduler; however, this leads to another currently unresolved issue: #7649.

Callstack

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x22f205f]

goroutine 279 [running]:
github.com/grafana/loki/v3/pkg/lokifrontend/frontend.downstreamRoundTripper.Do({0xc00062db00, {0x3425b40, 0x4b3b4c0}, {0x0, 0x0}}, {0x3449440, 0xc001b1a2d0}, {0x34661d0, 0xc0001193c0})
        /go/src/grafana/loki/pkg/lokifrontend/frontend/downstream_roundtripper.go:37 +0x9f
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.retry.Do({{0x34260c0, 0xc000aa60a0}, {0x34283c0, 0xc000bd9410}, 0x5, 0xc0008ffa50}, {0x3449440, 0xc001b1a2d0}, {0x34661d0, 0xc0001193c0})
        /go/src/grafana/loki/pkg/querier/queryrange/queryrangebase/retry.go:86 +0x2b2
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1.1({0x3449440?, 0xc001b1a2d0?})
        /go/src/grafana/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:28 +0x43
github.com/grafana/dskit/instrument.CollectedRequest({0x3449440, 0xc001b1a240}, {0x2a99a02, 0x5}, {0x343e160, 0xc000135e78}, 0x40e3da?, 0xc001b222b8)
        /go/src/grafana/loki/vendor/github.com/grafana/dskit/instrument/instrument.go:172 +0x25d
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1({0x3449440?, 0xc001b1a240?}, {0x34661d0?, 0xc0001193c0?})
        /go/src/grafana/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:26 +0xa8
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0x4b081e0?, {0x3449440?, 0xc001b1a240?}, {0x34661d0?, 0xc0001193c0?})
        /go/src/grafana/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.NewIndexStatsCacheMiddleware.NewResultsCacheMiddleware.func2.1({0x3449440?, 0xc001b1a240?}, {0x7fa8e61ae210?, 0xc0001193c0?})
        /go/src/grafana/loki/pkg/querier/queryrange/queryrangebase/results_cache.go:147 +0xc7
github.com/grafana/loki/v3/pkg/storage/chunk/cache/resultscache.HandlerFunc.Do(0x224dbd3?, {0x3449440?, 0xc001b1a240?}, {0x7fa8e61ae210?, 0xc0001193c0?})
        /go/src/grafana/loki/pkg/storage/chunk/cache/resultscache/util.go:11 +0x37
github.com/grafana/loki/v3/pkg/storage/chunk/cache/resultscache.ResultsCache.handleMiss({{0x34260c0, 0xc000aa60a0}, {0x34276a0, 0xc0019fa558}, {0x34497f8, 0xc000b18bf0}, {0x7fa8e61ae1d0, 0xc000bd8e10}, {0x3427d60, 0xc000bd8e40}, ...}, ...)
        /go/src/grafana/loki/pkg/storage/chunk/cache/resultscache/cache.go:159 +0x73
github.com/grafana/loki/v3/pkg/storage/chunk/cache/resultscache.ResultsCache.Do({{0x34260c0, 0xc000aa60a0}, {0x34276a0, 0xc0019fa558}, {0x34497f8, 0xc000b18bf0}, {0x7fa8e61ae1d0, 0xc000bd8e10}, {0x3427d60, 0xc000bd8e40}, ...}, ...)
        /go/src/grafana/loki/pkg/storage/chunk/cache/resultscache/cache.go:144 +0x8e5
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.resultsCache.Do({0xc0005556b0, {0x34260c0, 0xc000aa60a0}, {0x3434400, 0xc000bb9740}, 0xc000bb7b60}, {0x3449440?, 0xc001b1a1e0?}, {0x34661d0, 0xc0001193c0})
        /go/src/grafana/loki/pkg/querier/queryrange/queryrangebase/results_cache.go:186 +0x149
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1.1({0x3449440?, 0xc001b1a1e0?})
        /go/src/grafana/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:28 +0x43
github.com/grafana/dskit/instrument.CollectedRequest({0x3449440, 0xc001b1a1b0}, {0x2abb19a, 0x11}, {0x343e160, 0xc000622f10}, 0x0?, 0xc001b22ac0)
        /go/src/grafana/loki/vendor/github.com/grafana/dskit/instrument/instrument.go:172 +0x25d
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1({0x3449440?, 0xc001b1a1b0?}, {0x34661d0?, 0xc0001193c0?})
        /go/src/grafana/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:26 +0xa8
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0xc00167b200?, {0x3449440?, 0xc001b1a1b0?}, {0x34661d0?, 0xc0001193c0?})
        /go/src/grafana/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.(*splitByInterval).Do(0xc001633140, {0x3449440, 0xc001b1a1b0}, {0x34661d0, 0xc0001192a0})
        /go/src/grafana/loki/pkg/querier/queryrange/split_by_interval.go:214 +0x4b1
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1.1({0x3449440?, 0xc001b1a1b0?})
        /go/src/grafana/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:28 +0x43
github.com/grafana/dskit/instrument.CollectedRequest({0x3449440, 0xc001b1a180}, {0x2abb189, 0x11}, {0x343e160, 0xc000622f08}, 0x21cc475?, 0xc001dd0ea0)
        /go/src/grafana/loki/vendor/github.com/grafana/dskit/instrument/instrument.go:172 +0x25d
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1({0x3449440?, 0xc001b1a180?}, {0x34661d0?, 0xc0001192a0?})
        /go/src/grafana/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:26 +0xa8
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0x0?, {0x3449440?, 0xc001b1a180?}, {0x34661d0?, 0xc0001192a0?})
        /go/src/grafana/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.limitsMiddleware.Do({{0x34787a8?, 0xc000bd8e10?}, {0x3425dc0?, 0xc00165ef00?}}, {0x3449440?, 0xc001b1a150?}, {0x34661d0, 0xc0001192a0})
        /go/src/grafana/loki/pkg/querier/queryrange/limits.go:199 +0xa91
github.com/grafana/loki/v3/pkg/querier/queryrange.StatsCollectorMiddleware.func1.1({0x3449478, 0xc000733720}, {0x34661d0, 0xc0001192a0})
        /go/src/grafana/loki/pkg/querier/queryrange/stats.go:132 +0x111
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0xc000119320?, {0x3449478?, 0xc000733720?}, {0x34661d0?, 0xc0001192a0?})
        /go/src/grafana/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.NewIndexStatsTripperware.statsTripperware.func4.1({0x3449478, 0xc000733720}, {0x34661d0, 0xc0001192a0})
        /go/src/grafana/loki/pkg/querier/queryrange/roundtrip.go:970 +0xec
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0xc000aaee68?, {0x3449478?, 0xc000733720?}, {0x34661d0?, 0xc0001192a0?})
        /go/src/grafana/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.getStatsForMatchers.func1({0x3449478, 0xc000733720}, 0x0)
        /go/src/grafana/loki/pkg/querier/queryrange/shard_resolver.go:106 +0x282
github.com/grafana/dskit/concurrency.ForEachJob.func1()
        /go/src/grafana/loki/vendor/github.com/grafana/dskit/concurrency/runner.go:105 +0x83
golang.org/x/sync/errgroup.(*Group).Go.func1()
        /go/src/grafana/loki/vendor/golang.org/x/sync/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 254
        /go/src/grafana/loki/vendor/golang.org/x/sync/errgroup/errgroup.go:75 +0x96
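In the top frame, the value receiver of downstreamRoundTripper.Do is printed as {0xc00062db00, {0x3425b40, 0x4b3b4c0}, {0x0, 0x0}}. Go tracebacks print an interface argument as a two-word {type, data} pair (compare the context argument {0x3449440, 0xc001b1a2d0}), so the trailing {0x0, 0x0} is consistent with a nil interface field, although another zero-valued two-word field would print the same way. The following self-contained Go program is a minimal sketch of that failure class, not Loki's code: calling a method through a nil interface field panics with "invalid memory address or nil pointer dereference", and validating the dependency in a constructor turns it into a startup error instead. All names (codec, plainCodec, roundTripper, newRoundTripper, safeDo) are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// codec is a hypothetical stand-in for an interface dependency of a round tripper.
type codec interface {
	Encode(query string) (string, error)
}

// plainCodec is a trivial hypothetical implementation of codec.
type plainCodec struct{}

func (plainCodec) Encode(query string) (string, error) {
	return "query=" + query, nil
}

// roundTripper is a hypothetical value-receiver struct whose interface
// field can be left unset if it is built without a constructor.
type roundTripper struct {
	downstreamURL string
	codec         codec
}

// Do calls through the interface field; if codec is nil, this panics with
// "invalid memory address or nil pointer dereference".
func (rt roundTripper) Do(query string) (string, error) {
	enc, err := rt.codec.Encode(query)
	if err != nil {
		return "", err
	}
	return rt.downstreamURL + "?" + enc, nil
}

// newRoundTripper rejects a nil codec up front, so a missing dependency
// is reported at startup instead of panicking on the first query.
func newRoundTripper(url string, c codec) (roundTripper, error) {
	if c == nil {
		return roundTripper{}, errors.New("round tripper: codec must not be nil")
	}
	return roundTripper{downstreamURL: url, codec: c}, nil
}

// safeDo runs fn and converts a panic into an error so both cases can be printed.
func safeDo(fn func() (string, error)) (out string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered panic: %v", r)
		}
	}()
	return fn()
}

func main() {
	// Built without the constructor: codec stays nil, similar to the
	// {0x0, 0x0} field in the trace above.
	broken := roundTripper{downstreamURL: "http://querier:3100"}
	_, err := safeDo(func() (string, error) { return broken.Do(`{app="x"}`) })
	fmt.Println(err)

	// The validating constructor refuses the nil dependency.
	if _, err := newRoundTripper("http://querier:3100", nil); err != nil {
		fmt.Println(err)
	}

	// With a real codec, the call succeeds.
	rt, _ := newRoundTripper("http://querier:3100", plainCodec{})
	out, _ := rt.Do(`{app="x"}`)
	fmt.Println(out)
}
```

Recovering the panic here is only for demonstration; the constructor check is the part that prevents the crash.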

To Reproduce

  1. Start Loki 3.0.0 using the "loki-distributed" Helm chart.
  2. In the Helm configuration, don't deploy the Query-Scheduler.
  3. Use the following config:
auth_enabled: false
chunk_store_config:
  chunk_cache_config:
    memcached:
      batch_size: 100
      expiration: 86400s
      parallelism: 100
    memcached_client:
      consistent_hash: true
      host: loki-poc-memcached-chunks.logtest.svc.cluster.local
      service: memcached-client
      timeout: 1000ms
common:
  compactor_address: http://loki-poc-compactor:3100
compactor:
  compaction_interval: 10m
  delete_request_store: s3
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  retention_enabled: true
  working_directory: /var/loki/compactor
distributor:
  ring:
    kvstore:
      store: memberlist
frontend:
  compress_responses: true
  downstream_url: http://loki-poc-querier.logtest.svc.cluster.local:3100
  log_queries_longer_than: 5s
  max_outstanding_per_tenant: 200
ingester:
  chunk_encoding: snappy
  chunk_idle_period: 6h
  chunk_target_size: 1572864
  lifecycler:
    min_ready_duration: 0s
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
  max_chunk_age: 6h
  wal:
    checkpoint_duration: 1m
    dir: /var/loki/wal
    enabled: true
    flush_on_shutdown: true
ingester_client:
  grpc_client_config:
    max_recv_msg_size: 67108864
  remote_timeout: 1s
limits_config:
  cardinality_limit: 1000000
  increment_duplicate_timestamp: true
  ingestion_burst_size_mb: 100
  ingestion_rate_mb: 64
  ingestion_rate_strategy: local
  max_cache_freshness_per_query: 10m
  max_global_streams_per_user: 0
  max_query_lookback: 30d
  max_query_parallelism: 40
  max_query_series: 10000
  max_streams_per_user: 0
  query_timeout: 3m
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  retention_period: 31d
  split_queries_by_interval: 60m
memberlist:
  abort_if_cluster_join_fails: false
  bind_port: 7946
  dead_node_reclaim_time: 1s
  join_members:
  - loki-poc-memberlist
querier:
  max_concurrent: 2
  query_ingesters_within: 7h
query_range:
  align_queries_with_step: true
  cache_results: true
  max_retries: 5
  parallelise_shardable_queries: true
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_items: 4096
        max_size_mb: 4096
        ttl: 24h
schema_config:
  configs:
  - chunks:
      period: 24h
      prefix: loki_chunk_
    from: "2023-07-22"
    index:
      period: 24h
      prefix: loki_index_
    object_store: s3
    row_shards: 64
    schema: v12
    store: tsdb
  - chunks:
      period: 24h
      prefix: loki_chunk_
    from: "2024-06-10"
    index:
      period: 24h
      prefix: loki_index_
    object_store: s3
    row_shards: 64
    schema: v13
    store: tsdb
server:
  grpc_listen_port: 9095
  grpc_server_max_concurrent_streams: 1000
  grpc_server_max_recv_msg_size: 33554432
  grpc_server_max_send_msg_size: 33554432
  http_listen_port: 3100
storage_config:
  aws: <redacted>
  tsdb_shipper:
    active_index_directory: /var/loki/tsdb-index
    cache_location: /var/loki/tsdb-cache
    index_gateway_client:
      server_address: dns:///loki-poc-index-gateway:9095
  4. Run a LogQL query using logcli query ... or Grafana.

Expected behavior
The Query-Frontend pod should not crash.

Environment:

  • Infrastructure: K8S v1.28.9
  • Deployment tool: helm

JStickler added the type/bug label on Jun 17, 2024