Describe the bug
A trace can be queried correctly for the first few minutes after it is generated. After about 5 minutes, tempo-query returns 404 Not Found.
Tracing the Tempo read path shows that the query succeeds as long as the trace can still be served by the ingester; once the query falls through to the storage read path, it returns 404.
The length of time for which a trace stays queryable matches ingester.complete_block_timeout.
Can you give some follow-up troubleshooting suggestions?
To Reproduce
Steps to reproduce the behavior:
Start Tempo (tempo, version e9892bd (branch: master, revision: e9892bd))
Perform Operations (Read/Write/Others)
Expected behavior
Traces can still be fetched after the ingester flushes its blocks to storage.
Environment:
Infrastructure: Kubernetes
Deployment tool: none (deployed manually)
Additional Context
I use Ceph as the S3 storage backend.
I deployed Tempo in microservices mode:
tempo-distributor: stateless deployment
tempo-ingestor: statefulset
tempo-querier: statefulset
tempo-query: stateless deployment
The full tempo.yml, shared by all components:
auth_enabled: false

server:
  http_listen_port: 3100
  log_level: debug

distributor:
  receivers:                 # this configuration will listen on all ports and protocols that tempo is capable of.
    jaeger:                  # the receivers all come from the OpenTelemetry collector.  more configuration information can
      protocols:             # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/master/receiver
        thrift_http:         #
        grpc:                # for a production deployment you should only enable the receivers you need!
        thrift_binary:
        thrift_compact:
    zipkin:
    otlp:
      protocols:
        http:
        grpc:
    opencensus:

ingester:
  trace_idle_period: 10s     # the length of time after a trace has not received spans to consider it complete and flush it
  traces_per_block: 100      # cut the head block when it hits this number of traces or ...
  max_block_duration: 5m     #   this much time passes
  lifecycler:
    ring:
      kvstore:
        store: etcd
        etcd:
          endpoints:
            - http://****:2379

compactor:
  compaction:
    compaction_window: 1h              # blocks in this time window will be compacted together
    max_compaction_objects: 1000000    # maximum size of compacted blocks
    block_retention: 1h
    compacted_block_retention: 10m

storage:
  trace:
    backend: s3                          # backend configuration to use
    wal:
      path: /data/tempo/wal              # where to store the wal locally
      bloom_filter_false_positive: .05   # bloom filter false positive rate. lower values create larger filters but fewer false positives
      index_downsample: 10               # number of traces per index record
    s3:
      bucket: ***
      endpoint: ***
      access_key: ****
      secret_key: ****
      insecure: true
    pool:
      max_workers: 100                   # the worker pool mainly drives querying, but is also used for polling the blocklist
      queue_depth: 10000
Try setting your complete_block_timeout to be 10m. This way we should be able to guarantee that the querier is aware of a block by the time it is flushed from the ingester.
Currently the poll cycle and complete block timeout both default to 5m which should probably be changed.
If that doesn't work, can you share the querier logs from the time you are executing the query? It is possible that the list operation behaves differently than expected in Ceph, which would cause the querier to be unaware of the backend blocks.
Please also check the tempodb_blocklist_length metric as exposed by the querier and make sure that it matches the number of blocks in your backend.
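For reference, a minimal sketch of the override suggested above, assuming the same tempo.yml layout posted in this issue (the 10m value is the suggestion from this comment, not an official recommendation):

ingester:
  trace_idle_period: 10s
  traces_per_block: 100
  max_block_duration: 5m
  complete_block_timeout: 10m   # default is 5m, the same as the querier's blocklist poll cycle;
                                # a longer timeout keeps complete blocks queryable in the ingester
                                # until the querier's next poll has discovered them in the backend

To verify the second point, scrape the querier's Prometheus /metrics endpoint (on the http_listen_port above) and compare tempodb_blocklist_length against the number of blocks actually present in the Ceph bucket.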
This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity.
Please apply the keepalive label to exempt this issue.