Query error after upgrade to 1.0.0 #2396

Closed
AndreZiviani opened this issue Apr 2, 2020 · 4 comments · Fixed by #2400
@AndreZiviani (Contributor)

Hi, I was using Cortex 0.7.0 with the TSDB engine storing blocks in an AWS S3 bucket without problems.

After upgrading to version 1.0.0, I get the error "cannot iterate chunk for series" when querying for values older than 1h; when I refresh the dashboard, the error moves to another panel:
[screenshots of the dashboard panels showing the query error omitted]

This is my current config (running in "single process" mode):

auth_enabled: true

server:
  log_level: info
  http_listen_port: 9009

  # Configure the server to allow messages up to 100MB.
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600
  grpc_server_max_concurrent_streams: 1000

distributor:
  shard_by_all_labels: true
  pool:
    health_check_ingesters: true
  ring:
    kvstore:
      store: consul

ingester_client:
  grpc_client_config:
    # Configure the client to allow messages up to 100MB.
    max_recv_msg_size: 104857600
    max_send_msg_size: 104857600
    use_gzip_compression: true

ingester:

  lifecycler:
    # The address to advertise for this ingester.  Will be autodiscovered by
    # looking up address on eth0 or en0; can be specified if this fails.
    address: 0.0.0.0

    # Use Consul as the ring KV store.
    ring:
      kvstore:
        store: consul
      replication_factor: 1
    num_tokens: 512
  
storage:
  engine: tsdb

tsdb:
  backend: s3
  dir: "/cortex/tsdb"
  bucket_store:
    sync_dir: "/cortex/tsdb-sync"
    sync_interval: 1h # default: 5m
    index_cache:
      backend: memcached
      memcached:
        addresses: [REDACTED]:11211
      postings_compression_enabled: true
  s3:
    bucket_name: [REDACTED]
    endpoint: s3.dualstack.us-east-1.amazonaws.com

limits:
  ingestion_rate: 250000

compactor:
  data_dir: "/cortex/compactor"

query_range:
  split_queries_by_interval: 24h

frontend:
  compress_responses: true
  log_queries_longer_than: 1s
@AndreZiviani (Contributor, Author)

More info that could be useful:
[additional screenshots omitted]

pracucci self-assigned this Apr 3, 2020
@pracucci (Contributor) commented Apr 3, 2020

Thanks for reporting it. I can reproduce it as well and I confirm it's a bug (though I haven't found the root cause yet). I'm working on it as top priority.

As a side note, the GET error rate may be unrelated and actually be a false positive (not a real error) caused by a bug in the metrics tracking we're trying to fix here and/or here. I suspect this because chunks are fetched using a get_range, for which I can't see any errors.

@pracucci (Contributor) commented Apr 3, 2020

The issue looks to have been introduced by my change in PR #2324: chunk slices are reused in Thanos, and Cortex just keeps references to them instead of copying them. Sorry for that!

I'm working on a fix.
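
For context, the failure mode described above can be illustrated with a minimal, hypothetical Go sketch (the Chunk type and the keepReference/copyBytes helpers below are illustrative only, not the actual Cortex/Thanos code): when the producer reuses its underlying byte slice for the next chunk, a consumer that only stored a reference sees its data silently overwritten, while a consumer that copied the bytes is unaffected.

package main

import "fmt"

// Chunk holds the raw bytes of one chunk for a series
// (illustrative type, not the real Cortex struct).
type Chunk struct {
	Data []byte
}

// keepReference retains the caller's slice. If the caller later reuses the
// underlying buffer for the next chunk, this Chunk is silently corrupted.
func keepReference(buf []byte) Chunk {
	return Chunk{Data: buf}
}

// copyBytes takes its own copy, so later reuse of buf cannot affect it.
func copyBytes(buf []byte) Chunk {
	data := make([]byte, len(buf))
	copy(data, buf)
	return Chunk{Data: data}
}

func main() {
	buf := []byte{1, 2, 3}

	byRef := keepReference(buf)
	byCopy := copyBytes(buf)

	// Simulate the producer reusing the buffer for the next chunk.
	buf[0], buf[1], buf[2] = 9, 9, 9

	fmt.Println(byRef.Data)  // [9 9 9] -- the referenced chunk was overwritten
	fmt.Println(byCopy.Data) // [1 2 3] -- the copied chunk is intact
}

Presumably the fix amounts to the second pattern: copying the chunk data before the upstream slice is reused.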

@AndreZiviani (Contributor, Author)

Thanks for the quick fix! It worked!
