duplicate series: deduplicate + fix runtime normalization #1855
Conversation
It's trivial to demonstrate that, before this change, a query like target=foo&target=foo results in duplicates in plan.Reqs, which leads to duplicates in ReqMap during executePlan().
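To make the dedup concrete, here is a minimal sketch of the idea (names are illustrative, not the PR's actual code): collapse repeated raw targets before planning, so identical targets produce a single request.

```go
// dedupTargets drops repeated targets while preserving first-seen order,
// so target=foo&target=foo plans foo only once.
func dedupTargets(targets []string) []string {
	seen := make(map[string]struct{}, len(targets))
	out := make([]string, 0, len(targets))
	for _, t := range targets {
		if _, ok := seen[t]; ok {
			continue
		}
		seen[t] = struct{}{}
		out = append(out, t)
	}
	return out
}
```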
apply COW when altering points slice during runtime normalization (fix #1807)
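To illustrate the copy-on-write rule named in that commit, here is a hedged sketch (illustrative names, with averaging chosen arbitrarily as the consolidation; not the PR's actual code): normalization must work on a private copy, because the fetched slice may be shared with other requests via the dataMap.

```go
// Point stands in for metrictank's schema.Point in these sketches.
type Point struct {
	Val float64
	Ts  uint32
}

// normalizeCOW consolidates points down to one point per aggNum inputs.
// It copies first: mutating the (possibly shared) input in place would
// corrupt the results of other requests reading the same series.
func normalizeCOW(points []Point, aggNum int) []Point {
	cp := make([]Point, len(points))
	copy(cp, points)

	out := cp[:0] // safe: each write lands at or before the slot just read
	for i := 0; i < len(cp); i += aggNum {
		end := i + aggNum
		if end > len(cp) {
			end = len(cp)
		}
		sum := 0.0
		for _, p := range cp[i:end] {
			sum += p.Val
		}
		out = append(out, Point{Val: sum / float64(end-i), Ts: cp[end-1].Ts})
	}
	return out
}
```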
@@ -12,3 +16,17 @@ var pointSlicePool *sync.Pool
 func Pool(p *sync.Pool) {
 	pointSlicePool = p
 }
+
+// pointSlicePoolGet returns a pointslice of at least minCap capacity.
+// similar code lives also in api.Fix(). at some point we should really clean up our pool code.
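For context, a plausible shape for that helper, assuming Point as a stand-in for schema.Point and the pointSlicePool from the context lines above (the PR's actual implementation may differ):

```go
// pointSlicePoolGet returns a zero-length slice whose backing array has
// at least minCap capacity, reusing a pooled slice when one is big enough.
func pointSlicePoolGet(minCap int) []Point {
	if s, ok := pointSlicePool.Get().([]Point); ok && cap(s) >= minCap {
		return s[:0]
	}
	return make([]Point, 0, minCap)
}
```

A fuller version might return undersized pooled slices to the pool instead of dropping them.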
note that the pool is created in the api package and passed into the expr package.
the original idea was to make the expr library reusable in different software, hence the need to pass the pool into it.
But it's starting to make more sense to just have one global pool singleton with a couple of methods, and to access it directly everywhere. Out of scope for this PR, though.
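If that singleton route were ever taken, it could look roughly like this (package and function names are assumptions, not an actual plan):

```go
package pointslicepool

import "sync"

// Point stands in for metrictank's schema.Point.
type Point struct {
	Val float64
	Ts  uint32
}

var pool = sync.Pool{
	New: func() interface{} { return []Point(nil) },
}

// Get returns a zero-length slice with at least minCap capacity.
func Get(minCap int) []Point {
	s := pool.Get().([]Point)
	if cap(s) >= minCap {
		return s[:0]
	}
	return make([]Point, 0, minCap)
}

// Put hands a slice back for reuse by later Get calls.
func Put(s []Point) {
	pool.Put(s[:0])
}
```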
Looks good.
I did a more in-depth (WIP) version of this here: master...bloomberg:fetch_resolve_opts
It handles the cases of Req being functionally identical for fetching. Not sure if it's worth it.
Any chance this can be rebased to reduce the diff? I think the remaining possible deduplications all come with complications. E.g. trying to dedup reqs after the index lookups or after planRequests means having to go through many more reqs, possibly tens of thousands or more: lots of work for something that only rarely pays off. Deduping the requested targets or on the AST (query subexpressions) has to process less data, at least.
Co-authored-by: Sean Hanson <shanson7@bloomberg.net>
any further attempts would likely yield the same result
Looks good!
Probably with a bit of work, if we want to investigate that route.
True, but I think we are already running a lot of logic that iterates over the result series. If the lookup is quick, deduping should be fast (especially compared with fetching data from external sources).
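For example, the lookup could be as cheap as a map keyed by a comparable struct holding just the fields that matter for fetching; Req's fields here are made up for illustration:

```go
// Req is an illustrative stand-in for metrictank's request type; only
// fields that affect what gets fetched belong in the dedup key.
type Req struct {
	MetricID     string
	From, To     uint32
	Consolidator string
}

// dedupReqs keeps the first of each group of functionally identical requests.
func dedupReqs(reqs []Req) []Req {
	seen := make(map[Req]struct{}, len(reqs))
	out := make([]Req, 0, len(reqs))
	for _, r := range reqs {
		if _, ok := seen[r]; ok {
			continue // identical fetch already planned; its result can be shared
		}
		seen[r] = struct{}{}
		out = append(out, r)
	}
	return out
}
```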
FWIW also in …
Can you give an example? The one that I see most frequently is something like …
You know, I might just be quite off here. PS: in your example, foo would expand to multiple series in both targets, right? So to avoid confusion, maybe better to write it as …
Me neither. Especially with tag queries, it becomes difficult to detect the overlap without instrumenting the code. At the time of writing the dedup series resolve code, I didn't think it was worth it, since the dedup was very inexpensive and I knew of a couple of large use cases that would benefit.
I was just using the example from your comment, but yes
But PNGroup would need to be excluded from consideration after planning, in the case where it didn't need to adjust the fetch interval.
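Illustratively (the field names are assumptions, not metrictank's actual Req definition), the dedup key could zero out PNGroup whenever planning left the fetch interval untouched:

```go
// fetchKey is a hypothetical dedup key. PNGroup only matters when planning
// actually changed the interval, i.e. when normalization kicked in.
type fetchKey struct {
	MetricID string
	From, To uint32
	Interval uint32
	PNGroup  uint64
}

func keyFor(metricID string, from, to, rawInterval, plannedInterval uint32, pnGroup uint64) fetchKey {
	k := fetchKey{MetricID: metricID, From: from, To: to, Interval: plannedInterval, PNGroup: pnGroup}
	if plannedInterval == rawInterval {
		k.PNGroup = 0 // inert PNGroup: ignore it so these reqs can still dedup
	}
	return k
}
```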
In this PR, I introduce a fairly trivial deduplication optimization, followed by a good chunk of developer documentation clarifying our use of request types, where the sources of duplication and reuse are, and what the opportunities for deduplication are. This builds up to a clearer, more precise explanation of input data reuse (which all boils down to the dataMap), and finally a fix for bug #1807.
fix #1807