You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
The query trace by id feature in tempo searches all blocks in blocklist to find relevant blocks and merge trace objects found in these blocks. However, in environments that have high loads, even with compactors, the blocklist can extend upto thousands like 18000 etc, blocks and parsing through each of them affects performance. The current problem that we see is, each block has minId and maxId spanning across the maximum ranges which makes all blocks in blocklist an eligible candidate for parsing for a given trace id.
Describe the solution you'd like
We have come up with a time based search approach, that will include blocks for search based on the start time and end time of the blocks. The start time for the query is retrieved based on tags search. Default behaviour remains same if the time window is not specified.
Currently, in order to use the trace by id query, we have an elastic search deployed that gives us the trace ids based on specific tags or attributes. These objects also return the time window for the traces. We use the same window to query trace object by id so that the blocks outside the window are excluded and hence, the latency decreases and performance is better.
Describe alternatives you've considered
We have tried experimenting with compactor to merge only contiguous blocks so that we can restrict the minId and maxId of compacted blocks, as opposed to the current implementation of sorting by block age but we have not been able to scale using this approach. So, we think the alternate way to improve the query performance is to restrict the search for given time windows.
I really like this idea and have considered adding into Tempo before. I think the real challenge will be getting Grafana support. For some trace by id queries a time range could be retrieved from a logs instance, but for others we would still need to do the full block range search.
If this looks fine, I will time this appropriately as per your advice for the other PR for Tempo that is in progress and other release timelines that may be adhered to.
Is your feature request related to a problem? Please describe.
The query trace by id feature in tempo searches all blocks in blocklist to find relevant blocks and merge trace objects found in these blocks. However, in environments that have high loads, even with compactors, the blocklist can extend upto thousands like 18000 etc, blocks and parsing through each of them affects performance. The current problem that we see is, each block has minId and maxId spanning across the maximum ranges which makes all blocks in blocklist an eligible candidate for parsing for a given trace id.
Describe the solution you'd like
We have come up with a time based search approach, that will include blocks for search based on the start time and end time of the blocks. The start time for the query is retrieved based on tags search. Default behaviour remains same if the time window is not specified.
Currently, in order to use the trace by id query, we have an elastic search deployed that gives us the trace ids based on specific tags or attributes. These objects also return the time window for the traces. We use the same window to query trace object by id so that the blocks outside the window are excluded and hence, the latency decreases and performance is better.
Describe alternatives you've considered
We have tried experimenting with compactor to merge only contiguous blocks so that we can restrict the minId and maxId of compacted blocks, as opposed to the current implementation of sorting by block age but we have not been able to scale using this approach. So, we think the alternate way to improve the query performance is to restrict the search for given time windows.
Additional context
The API spec would now be /api/traces/{traceID}?timeEnd={end}&timeStart={start}
A snapshot of the code changes would be : https://github.com/grafana/tempo/compare/v1.3.2...sagarwala:v1.3.2-branch?expand=1
The text was updated successfully, but these errors were encountered: