Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loki: Series API will return all series with no match or empty matcher #2254

Merged
merged 3 commits into from
Jun 26, 2020

Conversation

slim-bean
Copy link
Collaborator

The series API can be an important tool for inspecting Loki usage and determining which series may be contributing to high cardinality problems.

In analytic uses cases such as this it's necessary to be able to return all series within a timeframe without supplying a matcher (or supply an empty matcher {})

…return all series for the provided timeframe.
dedupedSeries[key] = logproto.SeriesIdentifier{
// If no matchers were supplied we include all streams.
if len(groups) == 0 {
series = make([]logproto.SeriesIdentifier, 0, len(i.streams))
Copy link
Collaborator Author

@slim-bean slim-bean Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allocation isn't protected by a mutex so it could be wrong but I think this would generally be ok, worst case a stream is added between the creation of this array and it's population causing Go to have to double its size. I'm thinking this is ok but am open to alternatives or different opinions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem here from my point of view. But if you're not sure you can write a test that does 2 query in different goroutine and run that test using -race.

go test -mod=vendor -race ./pkg/ingester.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could have kept the same function func(stream *stream) error for both cases.`

Comment on lines +157 to +162
from, through = util.RoundToMilliseconds(req.Start, req.End)
nameLabelMatcher, err := labels.NewMatcher(labels.MatchEqual, labels.MetricName, "logs")
if err != nil {
return nil, err
}
matchers = []*labels.Matcher{nameLabelMatcher}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern I have here is this shortcut also skips the shard aware stuff in decodeReq, currently the series API is not shard aware so this is fine but I'm not sure if there is something else we could do here to protect future us from missing this. See #2167

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would need to get the part that parses the string matcher out and pass the matchers as an array to decodeReq.

@adityacs
Copy link
Contributor

Should we not enforce some limits like max_series_per_query along with this? Otherwise when querying over a large time range which has large volumes of chunks we might get OOM.

@slim-bean
Copy link
Collaborator Author

Should we not enforce some limits like max_series_per_query along with this? Otherwise when querying over a large time range which has large volumes of chunks we might get OOM.

This is a good thought @adityacs, I went through the code as written today and I think we are reasonably well protected from OOM, we download the chunks in batches for large ranges, and are only ever downloading one chunk per series.

I think the only possibility of OOM would come from the actual result being built of the series labels, but it would have to be a pretty massive result for this to happen and I think is pretty unlikely.

@codecov-commenter
Copy link

Codecov Report

Merging #2254 into master will decrease coverage by 0.00%.
The diff coverage is 78.26%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2254      +/-   ##
==========================================
- Coverage   62.32%   62.32%   -0.01%     
==========================================
  Files         158      158              
  Lines       12718    12762      +44     
==========================================
+ Hits         7927     7954      +27     
- Misses       4183     4195      +12     
- Partials      608      613       +5     
Impacted Files Coverage Δ
pkg/loghttp/params.go 93.10% <ø> (-0.16%) ⬇️
pkg/storage/store.go 65.27% <38.46%> (-1.89%) ⬇️
pkg/ingester/instance.go 62.24% <86.20%> (+1.60%) ⬆️
pkg/querier/querier.go 71.33% <86.36%> (-0.64%) ⬇️
pkg/loghttp/series.go 93.33% <100.00%> (+1.33%) ⬆️
pkg/promtail/targets/tailer.go 76.13% <0.00%> (-2.28%) ⬇️

@adityacs
Copy link
Contributor

@slim-bean Yeah, makes sense

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@slim-bean slim-bean merged commit aee064d into master Jun 26, 2020
@slim-bean slim-bean deleted the series-api-all-series branch June 26, 2020 15:44
@jiayi-1994
Copy link

@slim-bean loki2.6.1 or loki 2.7.1 Series API will return all series with no match or empty matcher #8034

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants