-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce query planning allocations #7385
Conversation
7054c38
to
f3388a9
Compare
0e5219b
to
479eda6
Compare
return nil, err | ||
} | ||
|
||
// For every series, get the tag values for the requested tag keys i.e. dimensions. This is the | ||
// TagSet for that series. Series with the same TagSet are then grouped together, because for the | ||
// purpose of GROUP BY they are part of the same composite series. | ||
tagSets := make(map[string]*influxql.TagSet) | ||
tagSets := make(map[string]*influxql.TagSet, len(filters)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may end up overallocating. While this performs well if you have GROUP BY *
, I imagine it would perform poorly when there is no GROUP BY
because everything maps to the same tag set. So you could have 10,000 series ids, but only need one tag set and this would allocate a map that was ready for storing 10,000 unique entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll test my datasets w/ queries w/o group by
. The problem it has without it as that it almost always has to re-allocate and grow. Maybe a smaller default of of 1024
or would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. This certainly needs to change in some way and the AddFilter
below also needs to. I'm just not sure the proper way to do it.
} | ||
|
||
// Associate the series and filter with the Tagset. | ||
tagSet.AddFilter(m.seriesByID[id].Key, filter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my previous comment, this is a hot-zone I found while I was working on lazy iterators. When no GROUP BY *
is used, a lot of time is spent growing the slice inside of the tag set. I have not figured out a good way to deal with this since attempting to count the number of elements ahead of time didn't really reduce the amount of time.
tagsAsKey := string(MarshalTags(tags)) | ||
tagSet, ok := tagSets[tagsAsKey] | ||
tagsAsKey := MarshalTags(tags) | ||
tagSet, ok := tagSets[string(tagsAsKey)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember. Does this cause the map to use some kind of optimization for memory allocations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supposed to. But it reduces the []byte{}
-> string
conversion from 2 -> 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsternberg yeah this approach is significantly than the previous approach. Around 15ns
per lookup versus 45ns
from some recent benchmarking I did.
|
||
// Sort the series in each tag set. | ||
for _, t := range tagSets { | ||
sort.Sort(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this, while it increases speed by a lot, will cause results to be returned non-deterministically when we use things like LIMIT
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sort is moved to here when necessary: https://github.com/influxdata/influxdb/pull/7385/files#diff-82e4e65bd72f00208a49c4797d466ecdR1164
@@ -1161,6 +1161,12 @@ func (e *Engine) createVarRefIterator(opt influxql.IteratorOptions, aggregate bo | |||
return err | |||
} | |||
|
|||
if opt.Limit > 0 || opt.Offset > 0 || opt.SLimit > 0 || opt.SOffset > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still needed even when these conditions aren't met. Otherwise, the output is still non-deterministic for queries without a limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tag sets are ordered by key still. This just order the series keys within each tagset.
Do you know of a query that could have this non-determinism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tag sets are ordered by the returned series. Internal to a series wouldn't be ordered so if you had this:
cpu,host=server01,region=uswest value=0
cpu,host=server01,region=useast value=1
cpu,host=server02 value=2
> SELECT * FROM cpu GROUP BY host
The host would be ordered, but the internal series grouped into the server01
bucket would not be deterministic. This is why it affects LIMIT <n>
and OFFSET <n>
. While I'm not sure it matters, I assume we still want that to be deterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted the sort removal so it will always sort as before. It seems like it is unnecessary in some cases though. For example,
SELECT SUM(value) FROM cpu GROUP BY host
Should not need to sort. Is there a way to detect this scenario here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is. We also may want to remove the sort here and instead perform the sort when we're using the tag set to create the iterators inside of the query engine.
The TagSets function was creating a lot of intermediate maps and slices to calculate the sorted tag sets. It first creates a map to group tag sets with their series, it then created an equally sized slice of the tag keys and sorted then. Finally, it created a new slice and added the tag sets in the original map by the ordering of the sorted keys. It was also recreating the tags map multiple time creating extra garbage in the loop. This simplifies the code to create one map for grouping and than adding the distinct sets to a slice which is then sorted. It also fixes the multple tag maps getting created.
479eda6
to
68dd312
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Required for all non-trivial PRs
This PR reduces allocations during query planning. The main change is to the
Measurement.TagSets
which was creating a lot of garbage.The
TagSets
function was creating a lot of intermediate maps andslices to calculate the sorted tag sets. It first creates a map
to group tag sets with their series, it then created an equally
sized slice of the tag keys and sorted them. Finally, it created
a new slice and added the tag sets in the original map by the ordering
of the sorted keys. It was also recreating the tags map multiple time
creating extra garbage in the loop.
This simplifies the code to create one map for grouping and than adding
the distinct sets to a slice which is then sorted. It also fixes the
multple tag maps getting created.
This could probably be optimized further using a ordered tree data structure.
1.0
1.1 master
cc @jsternberg