
Add max query length error to errors catalog #1939

Merged · 3 commits · May 30, 2022
2 changes: 1 addition & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@
- `-querier.query-store-after`
* [CHANGE] Config flag category overrides can be set dynamically at runtime. #1934
* [ENHANCEMENT] Store-gateway: Add the experimental ability to run requests in a dedicated OS thread pool. This feature can be configured using `-store-gateway.thread-pool-size` and is disabled by default. Replaces the ability to run index header operations in a dedicated thread pool. #1660 #1812
* [ENHANCEMENT] Improved error messages to make them easier to understand and referencing a unique global identifier that can be looked up in the runbooks. #1907 #1919 #1888
* [ENHANCEMENT] Improved error messages to make them easier to understand; each now has a unique global identifier that you can look up in the runbooks for more information. #1907 #1919 #1888 #1939
* [ENHANCEMENT] Memberlist KV: incoming messages are now processed on a per-key goroutine. This may reduce loss of "maintenance" packets in busy memberlist installations, but uses more CPU. The new `memberlist_client_received_broadcasts_dropped_total` counter tracks the number of dropped per-key messages. #1912
* [ENHANCEMENT] Blocks Storage, Alertmanager, Ruler: add support for a prefix to the bucket store (`*_storage.storage_prefix`). This enables using the same bucket for all three components. #1686
* [BUGFIX] Fix regexp parsing panic for regexp label matchers with start/end quantifiers. #1883
3 changes: 2 additions & 1 deletion integration/query_frontend_test.go
@@ -31,6 +31,7 @@ import (

"github.com/grafana/mimir/integration/ca"
"github.com/grafana/mimir/integration/e2emimir"
"github.com/grafana/mimir/pkg/util/validation"
)

type queryFrontendTestConfig struct {
@@ -492,7 +493,7 @@ overrides:
return c.QueryRangeRaw(`sum_over_time(metric[31d:1s])`, now.Add(-time.Minute), now, time.Minute)
},
expStatusCode: http.StatusUnprocessableEntity,
expBody: `{"error":"expanding series: the query time range exceeds the limit (query length: 744h6m0s, limit: 720h0m0s)", "errorType":"execution", "status":"error"}`,
expBody: fmt.Sprintf(`{"error":"expanding series: %s", "errorType":"execution", "status":"error"}`, validation.NewMaxQueryLengthError((744*time.Hour)+(6*time.Minute), 720*time.Hour)),
},
{
name: "execution error",
15 changes: 15 additions & 0 deletions operations/mimir-mixin/docs/playbooks.md
@@ -1282,6 +1282,21 @@ How to **fix** it:
- Consider reducing the time range and/or cardinality of the query. To reduce the cardinality of the query, you can add more label matchers to the query, restricting the set of matching series.
- Consider increasing the per-tenant limit by using the `-querier.max-fetched-chunk-bytes-per-query` option (or `max_fetched_chunk_bytes_per_query` in the runtime configuration).

### err-mimir-max-query-length

This error occurs when the time range of a query exceeds the configured maximum length.

Both PromQL instant and range queries can fetch metrics data over a period of time.
A [range query](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries) requires a `start` and `end` timestamp, so the difference of `end` minus `start` is the time range length of the query.
An [instant query](https://prometheus.io/docs/prometheus/latest/querying/api/#instant-queries) requires a `time` parameter, and the query is executed by fetching samples at that point in time.
However, even an instant query can fetch metrics data over a period of time by using [range vector selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#range-vector-selectors).
For example, the instant query `sum(rate(http_requests_total{job="prometheus"}[1h]))` fetches metrics over a 1 hour period.
This time period is what Grafana Mimir calls the _query time range length_ (or _query length_).

Mimir has a limit on the query length.
This limit is applied to partial queries, after they've been split by time by the query-frontend. This limit protects the system's stability from potential abuse or mistakes.
You can configure the limit on a per-tenant basis by using the `-store.max-query-length` option (or `max_query_length` in the runtime configuration).

## Mimir routes by path

**Write path**:
2 changes: 1 addition & 1 deletion pkg/frontend/querymiddleware/limits.go
@@ -112,7 +112,7 @@ func (l limitsMiddleware) Do(ctx context.Context, r Request) (Response, error) {
if maxQueryLength := validation.SmallestPositiveNonZeroDurationPerTenant(tenantIDs, l.MaxQueryLength); maxQueryLength > 0 {
queryLen := timestamp.Time(r.GetEnd()).Sub(timestamp.Time(r.GetStart()))
if queryLen > maxQueryLength {
return nil, apierror.Newf(apierror.TypeBadData, validation.ErrQueryTooLong, queryLen, maxQueryLength)
return nil, apierror.New(apierror.TypeBadData, validation.NewMaxQueryLengthError(queryLen, maxQueryLength).Error())
}
}

5 changes: 3 additions & 2 deletions pkg/frontend/querymiddleware/querysharding_test.go
@@ -40,6 +40,7 @@ import (
"github.com/grafana/mimir/pkg/mimirpb"
"github.com/grafana/mimir/pkg/storage/sharding"
"github.com/grafana/mimir/pkg/util"
"github.com/grafana/mimir/pkg/util/validation"
)

var (
@@ -1142,7 +1143,7 @@ func TestQuerySharding_ShouldReturnErrorInCorrectFormat(t *testing.T) {
return nil, httpgrpc.ErrorFromHTTPResponse(&httpgrpc.HTTPResponse{Code: http.StatusInternalServerError, Body: []byte("fatal queryable error")})
})
queryablePrometheusExecErr = storage.QueryableFunc(func(ctx context.Context, mint, maxt int64) (storage.Querier, error) {
return nil, apierror.New(apierror.TypeExec, "expanding series: the query time range exceeds the limit (query length: 744h6m0s, limit: 720h0m0s")
return nil, apierror.Newf(apierror.TypeExec, "expanding series: %s", validation.NewMaxQueryLengthError(744*time.Hour, 720*time.Hour))
Comment on lines -1145 to +1146

Contributor: What happened to these 6m?

Collaborator (Author): It's just a mocked error. Whatever value we put here is fine. The only purpose is to assert below that we receive the same error. I removed the 6m for simplicity.

Contributor: I see, 👍

})
queryable = storageSeriesQueryable([]*promql.StorageSeries{
newSeries(labels.Labels{{Name: "__name__", Value: "bar1"}}, start.Add(-lookbackDelta), end, step, factor(5)),
@@ -1194,7 +1195,7 @@ func TestQuerySharding_ShouldReturnErrorInCorrectFormat(t *testing.T) {
engineDownstream: engine,
engineSharding: engineSampleLimit,
queryable: queryablePrometheusExecErr,
expError: apierror.New(apierror.TypeExec, "expanding series: the query time range exceeds the limit (query length: 744h6m0s, limit: 720h0m0s"),
expError: apierror.Newf(apierror.TypeExec, "expanding series: %s", validation.NewMaxQueryLengthError(744*time.Hour, 720*time.Hour)),
},
} {
t.Run(tc.name, func(t *testing.T) {
3 changes: 1 addition & 2 deletions pkg/querier/querier.go
@@ -286,8 +286,7 @@ func (q querier) Select(_ bool, sp *storage.SelectHints, matchers ...*labels.Mat

// Validate query time range.
if maxQueryLength := q.limits.MaxQueryLength(userID); maxQueryLength > 0 && endTime.Sub(startTime) > maxQueryLength {
limitErr := validation.LimitError(fmt.Sprintf(validation.ErrQueryTooLong, endTime.Sub(startTime), maxQueryLength))
return storage.ErrSeriesSet(limitErr)
return storage.ErrSeriesSet(validation.NewMaxQueryLengthError(endTime.Sub(startTime), maxQueryLength))
}

if len(q.queriers) == 1 {
4 changes: 2 additions & 2 deletions pkg/querier/querier_test.go
@@ -471,13 +471,13 @@ func TestQuerier_ValidateQueryTimeRange_MaxQueryLength(t *testing.T) {
query: "rate(foo[31d])",
queryStartTime: time.Now().Add(-time.Hour),
queryEndTime: time.Now(),
expected: errors.New("expanding series: the query time range exceeds the limit (query length: 745h0m0s, limit: 720h0m0s)"),
expected: errors.Errorf("expanding series: %s", validation.NewMaxQueryLengthError(745*time.Hour, 720*time.Hour)),
},
"should forbid query on large time range over the limit and short rate time window": {
query: "rate(foo[1m])",
queryStartTime: time.Now().Add(-maxQueryLength).Add(-time.Hour),
queryEndTime: time.Now(),
expected: errors.New("expanding series: the query time range exceeds the limit (query length: 721h1m0s, limit: 720h0m0s)"),
expected: errors.Errorf("expanding series: %s", validation.NewMaxQueryLengthError((721*time.Hour)+time.Minute, 720*time.Hour)),
},
}

2 changes: 2 additions & 0 deletions pkg/util/globalerror/errors.go
@@ -43,6 +43,8 @@
MetricMetadataMetricNameTooLong ID = "metric-name-too-long"
MetricMetadataHelpTooLong ID = "help-too-long"
MetricMetadataUnitTooLong ID = "unit-too-long"

MaxQueryLength ID = "max-query-length"
)

// Message returns the provided msg, appending the error id.
7 changes: 7 additions & 0 deletions pkg/util/validation/errors.go
@@ -8,6 +8,7 @@ package validation
import (
"fmt"
"strings"
"time"

"github.com/prometheus/common/model"

@@ -268,6 +269,12 @@ func newMetadataUnitTooLongError(metadata *mimirpb.MetricMetadata) ValidationErr
}
}

func NewMaxQueryLengthError(actualQueryLen, maxQueryLength time.Duration) LimitError {
return LimitError(globalerror.MaxQueryLength.MessageWithLimitConfig(
maxQueryLengthFlag,
fmt.Sprintf("the query time range exceeds the limit (query length: %s, limit: %s)", actualQueryLen, maxQueryLength)))
}

// formatLabelSet formats label adapters as a metric name with labels, while preserving
// label order, and keeping duplicates. If there are multiple "__name__" labels, only
// first one is used as metric name, other ones will be included as regular labels.
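
For illustration, the way `globalerror.ID.MessageWithLimitConfig` composes the final message can be sketched as follows. This is an assumption inferred from the expected string in `TestNewMaxQueryLengthError` below, not the real method body; the `ID` type here only mirrors `globalerror.ID` in shape.

```go
// Sketch of how an error-catalog ID composes a message with a limit-config
// hint. Assumption: the format string is reverse-engineered from the test
// expectation; the actual implementation is in pkg/util/globalerror.
package main

import "fmt"

// ID is an illustrative stand-in for globalerror.ID.
type ID string

const MaxQueryLength ID = "max-query-length"

// MessageWithLimitConfig appends the catalog identifier and the flag that
// controls the related per-tenant limit.
func (id ID) MessageWithLimitConfig(flag, msg string) string {
	return fmt.Sprintf(
		"%s (err-mimir-%s). You can adjust the related per-tenant limit by configuring -%s, or by contacting your service administrator.",
		msg, id, flag)
}

func main() {
	fmt.Println(MaxQueryLength.MessageWithLimitConfig(
		"store.max-query-length",
		"the query time range exceeds the limit (query length: 1h0m0s, limit: 1m0s)"))
}
```

The output matches the string asserted in `errors_test.go` below, which is how the runbook anchor `err-mimir-max-query-length` ends up embedded in user-facing errors.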
6 changes: 6 additions & 0 deletions pkg/util/validation/errors_test.go
@@ -4,6 +4,7 @@ package validation

import (
"testing"
"time"

"github.com/stretchr/testify/assert"

@@ -29,3 +30,8 @@
err := newMetadataUnitTooLongError(&mimirpb.MetricMetadata{MetricFamilyName: "test_metric", Unit: "counter", Help: "This is a test metric."})
assert.Equal(t, "received a metric metadata whose unit name length exceeds the limit, unit: 'counter' metric name: 'test_metric' (err-mimir-unit-too-long). You can adjust the related per-tenant limit by configuring -validation.max-metadata-length, or by contacting your service administrator.", err.Error())
}

func TestNewMaxQueryLengthError(t *testing.T) {
err := NewMaxQueryLengthError(time.Hour, time.Minute)
assert.Equal(t, "the query time range exceeds the limit (query length: 1h0m0s, limit: 1m0s) (err-mimir-max-query-length). You can adjust the related per-tenant limit by configuring -store.max-query-length, or by contacting your service administrator.", err.Error())
Contributor (suggested change):

assert.Equal(t, "the query time range exceeds the limit (query length: 1h0m0s, limit: 1m0s) (err-mimir-max-query-length). You can adjust the related per-tenant limit by configuring -store.max-query-length, or by contacting your service administrator.", err.Error())
assert.Equal(t, "the query time range exceeds the limit (query length: 1h0m0s, limit: 1m0s) (err-mimir-max-query-length). You can adjust the related per-tenant limit by configuring `-store.max-query-length`, or by contacting your service administrator.", err.Error())

Collaborator (Author): Skipping this because it won't be rendered as we expect in the generated doc (the whole config doc is already within a code block).

}
18 changes: 9 additions & 9 deletions pkg/util/validation/limits.go
@@ -22,19 +22,19 @@ import (
)

const (
MaxSeriesPerMetricFlag = "ingester.max-global-series-per-metric"
MaxMetadataPerMetricFlag = "ingester.max-global-metadata-per-metric"
MaxSeriesPerUserFlag = "ingester.max-global-series-per-user"
MaxMetadataPerUserFlag = "ingester.max-global-metadata-per-user"
MaxChunksPerQueryFlag = "querier.max-fetched-chunks-per-query"
MaxChunkBytesPerQueryFlag = "querier.max-fetched-chunk-bytes-per-query"
MaxSeriesPerQueryFlag = "querier.max-fetched-series-per-query"

MaxSeriesPerMetricFlag = "ingester.max-global-series-per-metric"
MaxMetadataPerMetricFlag = "ingester.max-global-metadata-per-metric"
MaxSeriesPerUserFlag = "ingester.max-global-series-per-user"
MaxMetadataPerUserFlag = "ingester.max-global-metadata-per-user"
MaxChunksPerQueryFlag = "querier.max-fetched-chunks-per-query"
MaxChunkBytesPerQueryFlag = "querier.max-fetched-chunk-bytes-per-query"
MaxSeriesPerQueryFlag = "querier.max-fetched-series-per-query"
maxLabelNamesPerSeriesFlag = "validation.max-label-names-per-series"
maxLabelNameLengthFlag = "validation.max-length-label-name"
maxLabelValueLengthFlag = "validation.max-length-label-value"
maxMetadataLengthFlag = "validation.max-metadata-length"
creationGracePeriodFlag = "validation.create-grace-period"
maxQueryLengthFlag = "store.max-query-length"
)

// LimitError are errors that do not comply with the limits specified.
@@ -171,7 +171,7 @@ func (l *Limits) RegisterFlags(f *flag.FlagSet) {
f.IntVar(&l.MaxChunksPerQuery, MaxChunksPerQueryFlag, 2e6, "Maximum number of chunks that can be fetched in a single query from ingesters and long-term storage. This limit is enforced in the querier, ruler and store-gateway. 0 to disable.")
f.IntVar(&l.MaxFetchedSeriesPerQuery, MaxSeriesPerQueryFlag, 0, "The maximum number of unique series for which a query can fetch samples from each ingesters and storage. This limit is enforced in the querier and ruler. 0 to disable")
f.IntVar(&l.MaxFetchedChunkBytesPerQuery, MaxChunkBytesPerQueryFlag, 0, "The maximum size of all chunks in bytes that a query can fetch from each ingester and storage. This limit is enforced in the querier and ruler. 0 to disable.")
f.Var(&l.MaxQueryLength, "store.max-query-length", "Limit the query time range (end - start time). This limit is enforced in the query-frontend (on the received query), in the querier (on the query possibly split by the query-frontend) and ruler. 0 to disable.")
f.Var(&l.MaxQueryLength, maxQueryLengthFlag, "Limit the query time range (end - start time). This limit is enforced in the query-frontend (on the received query), in the querier (on the query possibly split by the query-frontend) and ruler. 0 to disable.")
f.Var(&l.MaxQueryLookback, "querier.max-query-lookback", "Limit how long back data (series and metadata) can be queried, up until <lookback> duration ago. This limit is enforced in the query-frontend, querier and ruler. If the requested time range is outside the allowed range, the request will not fail but will be manipulated to only query data within the allowed time range. 0 to disable.")
f.IntVar(&l.MaxQueryParallelism, "querier.max-query-parallelism", 14, "Maximum number of split (by time) or partial (by shard) queries that will be scheduled in parallel by the query-frontend for a single input query. This limit is introduced to have a fairer query scheduling and avoid a single query over a large time range saturating all available queriers.")
f.Var(&l.MaxLabelsQueryLength, "store.max-labels-query-length", "Limit the time range (end - start time) of series, label names and values queries. This limit is enforced in the querier. If the requested time range is outside the allowed range, the request will not fail but will be manipulated to only query data within the allowed time range. 0 to disable.")
3 changes: 0 additions & 3 deletions pkg/util/validation/validate.go
@@ -24,9 +24,6 @@ import (
const (
discardReasonLabel = "reason"

// ErrQueryTooLong is used in chunk store, querier and query frontend.
ErrQueryTooLong = "the query time range exceeds the limit (query length: %s, limit: %s)"

// RateLimited is one of the values for the reason to discard samples.
// Declared here to avoid duplication in ingester and distributor.
RateLimited = "rate_limited"