Query error after upgrade to 1.0.0 #2396

Closed
AndreZiviani opened this issue Apr 2, 2020 · 4 comments · Fixed by #2400
@AndreZiviani (Contributor)

Hi, I was using Cortex 0.7.0 with the TSDB engine storing blocks in an AWS S3 bucket without problems.

After upgrading to version 1.0.0, I get the error "cannot iterate chunk for series" when querying for values older than 1h; when I refresh the dashboard, the error moves to another panel:
[screenshots of the dashboard panels showing the query error omitted]

This is my current config (running in "single process" mode):

auth_enabled: true

server:
  log_level: info
  http_listen_port: 9009

  # Configure the server to allow messages up to 100MB.
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600
  grpc_server_max_concurrent_streams: 1000

distributor:
  shard_by_all_labels: true
  pool:
    health_check_ingesters: true
  ring:
    kvstore:
      store: consul

ingester_client:
  grpc_client_config:
    # Configure the client to allow messages up to 100MB.
    max_recv_msg_size: 104857600
    max_send_msg_size: 104857600
    use_gzip_compression: true

ingester:

  lifecycler:
    # The address to advertise for this ingester.  Will be autodiscovered by
    # looking up address on eth0 or en0; can be specified if this fails.
    address: 0.0.0.0

    # Use Consul as the ring KV store.
    ring:
      kvstore:
        store: consul
      replication_factor: 1
    num_tokens: 512
  
storage:
  engine: tsdb

tsdb:
  backend: s3
  dir: "/cortex/tsdb"
  bucket_store:
    sync_dir: "/cortex/tsdb-sync"
    sync_interval: 1h # default: 5m
    index_cache:
      backend: memcached
      memcached:
        addresses: [REDACTED]:11211
      postings_compression_enabled: true
  s3:
    bucket_name: [REDACTED]
    endpoint: s3.dualstack.us-east-1.amazonaws.com

limits:
  ingestion_rate: 250000

compactor:
  data_dir: "/cortex/compactor"

query_range:
  split_queries_by_interval: 24h

frontend:
  compress_responses: true
  log_queries_longer_than: 1s
@AndreZiviani (Contributor, Author)

More info that could be useful:
[additional screenshots omitted]

pracucci self-assigned this Apr 3, 2020
@pracucci (Contributor) commented Apr 3, 2020

Thanks for reporting it. I can reproduce it as well and I confirm it's a bug (though I haven't found the root cause yet). I'm working on it as top priority.

As a side note, the GET error rate may be unrelated and actually be a false positive (not a real error) caused by a bug in the metrics tracking we're trying to fix here and/or here. I suspect this because chunks are fetched using a get_range, for which I can't see any errors.

@pracucci (Contributor) commented Apr 3, 2020

The issue looks to have been introduced by my change in PR #2324: chunk slices are reused in Thanos, and Cortex just keeps references to them instead of copying them. Sorry for that!

I'm working on a fix.
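
For context, the failure mode described above can be illustrated with a minimal, hypothetical Go sketch (the Chunk type and the keepReference/copyBytes helpers below are illustrative only, not the actual Cortex/Thanos code): when the producer reuses its underlying byte slice for the next chunk, a consumer that only stored a reference sees its data silently overwritten, while a consumer that copied the bytes is unaffected.

package main

import "fmt"

// Chunk holds the raw bytes of one chunk for a series
// (illustrative type, not the real Cortex struct).
type Chunk struct {
	Data []byte
}

// keepReference retains the caller's slice. If the caller later reuses the
// underlying buffer for the next chunk, this Chunk is silently corrupted.
func keepReference(buf []byte) Chunk {
	return Chunk{Data: buf}
}

// copyBytes takes its own copy, so later reuse of buf cannot affect it.
func copyBytes(buf []byte) Chunk {
	data := make([]byte, len(buf))
	copy(data, buf)
	return Chunk{Data: data}
}

func main() {
	buf := []byte{1, 2, 3}

	byRef := keepReference(buf)
	byCopy := copyBytes(buf)

	// Simulate the producer reusing the buffer for the next chunk.
	buf[0], buf[1], buf[2] = 9, 9, 9

	fmt.Println(byRef.Data)  // [9 9 9] -- the referenced chunk was overwritten
	fmt.Println(byCopy.Data) // [1 2 3] -- the copied chunk is intact
}

Presumably the fix amounts to the second pattern: copying the chunk data before the upstream slice is reused.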

@AndreZiviani (Contributor, Author)

Thanks for the quick fix! It worked!
