Introduces cache to TSDB postings #9621

Merged: 96 commits, Aug 3, 2023
Changes from 95 commits
Commits
bc286f8
hacky cached postings
DylanGuedes Jun 4, 2023
da40b40
Change signatures
DylanGuedes Jun 5, 2023
2753a89
tmp inherit thanos indexcache code
DylanGuedes Jun 7, 2023
64a0af7
make caching configurable
DylanGuedes Jun 8, 2023
e37da4d
Merge branch 'main' of github.com:grafana/loki into check-index-posti…
DylanGuedes Jun 12, 2023
369869f
Implement LRU as a possible cache option
DylanGuedes Jun 12, 2023
6878e4b
Add tests
DylanGuedes Jun 12, 2023
b5be8fe
delete indexcache folder
DylanGuedes Jun 12, 2023
f795168
undo change
DylanGuedes Jun 12, 2023
e9d2650
trim down code
DylanGuedes Jun 12, 2023
1ff0663
Merge branch 'main' of github.com:grafana/loki into check-index-posti…
DylanGuedes Jun 12, 2023
3b5ab8f
Use postingsclient.
DylanGuedes Jun 12, 2023
317a879
Use right cache name.
DylanGuedes Jun 15, 2023
66c7f0d
Rename flag and remove unused var.
DylanGuedes Jun 15, 2023
71cac4e
Make the metrics more consistent.
DylanGuedes Jun 15, 2023
bf330fc
Pass canonical keys directly.
DylanGuedes Jun 15, 2023
4a4d1fd
Pass ctx directly.
DylanGuedes Jun 15, 2023
88fe691
Reuse ctx
DylanGuedes Jun 15, 2023
604de2f
Append series numbers to encoded binary
DylanGuedes Jun 15, 2023
99268de
Merge branch 'main' of github.com:grafana/loki into check-index-posti…
DylanGuedes Jun 19, 2023
4e784f3
Rename to postingsReader
DylanGuedes Jun 19, 2023
a7d48a8
lint fix
DylanGuedes Jun 19, 2023
bba1de5
Implement overflow logic
DylanGuedes Jun 19, 2023
03d0cc8
Fix tests
DylanGuedes Jun 19, 2023
a98cfea
Finish fixing tests.
DylanGuedes Jun 19, 2023
7f4e582
Fix lint
DylanGuedes Jun 20, 2023
8c71ebf
Fix lint
DylanGuedes Jun 20, 2023
64ad6ca
Rename client->reader
DylanGuedes Jun 20, 2023
e9ef21a
Update calls (haudi suggestions)
DylanGuedes Jun 20, 2023
68361f3
Merge branch 'main' of github.com:grafana/loki into check-index-posti…
DylanGuedes Jun 20, 2023
84f68df
Fix tests
DylanGuedes Jun 20, 2023
210ac07
Make sure cache is used on tests
DylanGuedes Jun 22, 2023
eef25e5
remove consts only used by mimir
DylanGuedes Jun 22, 2023
9b902d1
Merge branch 'main' of github.com:grafana/loki into check-index-posti…
DylanGuedes Jun 22, 2023
d27286c
appease lint (remove TSDB from struct name)
DylanGuedes Jun 22, 2023
1e6fd33
fix lint
DylanGuedes Jun 22, 2023
2030224
fix import order
DylanGuedes Jun 22, 2023
33521a1
fix import order again
DylanGuedes Jun 22, 2023
52ca301
encode with 32b instead of 64b (tsdb uses 32b internally)
DylanGuedes Jun 22, 2023
2adc870
Use "," to separate matchers.
DylanGuedes Jun 23, 2023
423692b
better defaults
DylanGuedes Jun 23, 2023
474261d
bugged linter
DylanGuedes Jun 23, 2023
2baf30d
fix docs
DylanGuedes Jun 23, 2023
110eac7
Merge branch 'main' of github.com:grafana/loki into check-index-posti…
DylanGuedes Jun 23, 2023
571c6a8
fix config type
DylanGuedes Jun 23, 2023
4139abd
Register flags
DylanGuedes Jun 23, 2023
3d30b1e
wrap error on vendor
DylanGuedes Jun 23, 2023
7d9a24b
test other thing
DylanGuedes Jun 23, 2023
7edc7bd
wrap errors
DylanGuedes Jun 23, 2023
7d7c827
Merge branch 'main' of github.com:grafana/loki into check-index-posti…
DylanGuedes Jun 23, 2023
8c9fcef
careful wrapping
DylanGuedes Jun 23, 2023
5521d82
wrap at different place
DylanGuedes Jun 23, 2023
0ef38b7
wrap different
DylanGuedes Jun 23, 2023
dcdf6d2
use sharded postings?
DylanGuedes Jun 23, 2023
6c36767
sanity check
DylanGuedes Jun 23, 2023
b84e257
reset postings by calling PostingsForMatcher again.
DylanGuedes Jun 26, 2023
5ff3a36
sanity check
DylanGuedes Jun 26, 2023
d9354f5
Try calling PostingsForMatchers after cache hit too.
DylanGuedes Jun 26, 2023
fb9b541
another sanity check
DylanGuedes Jun 26, 2023
d3a2a5b
debug decoded/encoded series
DylanGuedes Jun 26, 2023
cd0d3a9
was my decoding wrong?
DylanGuedes Jun 26, 2023
f7d8013
cleanup
DylanGuedes Jun 26, 2023
fe659c8
cleanup cached postings file.
DylanGuedes Jun 27, 2023
f07228d
revert vendor changes
DylanGuedes Jun 27, 2023
2f00fb7
Undo change to error messages
DylanGuedes Jun 27, 2023
53162c8
Add flag docs
DylanGuedes Jun 27, 2023
99a4c83
remove unnecessary test
DylanGuedes Jun 27, 2023
8eb1308
Use the checksum as part of the key.
DylanGuedes Jun 28, 2023
c3736af
Add changelog entry.
DylanGuedes Jul 2, 2023
8f556ed
Add functional test.
DylanGuedes Jul 2, 2023
dad71f9
Merge branch 'main' of github.com:grafana/loki into check-index-posti…
DylanGuedes Jul 3, 2023
18da973
Change "cache_postings" -> "enable_cache_postings"
DylanGuedes Jul 11, 2023
942d450
Apply Haudi suggestion (see https://github.com/grafana/loki/pull/9621…
DylanGuedes Jul 11, 2023
fe6c834
Merge branch 'main' of github.com:grafana/loki into check-index-posti…
DylanGuedes Jul 11, 2023
a4979c8
update flag used by e2e test
DylanGuedes Jul 11, 2023
3e0c31d
Refactor how the caching struct is passed
DylanGuedes Jul 11, 2023
fd8b411
fix lint.
DylanGuedes Jul 16, 2023
871e3e4
Use background writes for LRU cache.
DylanGuedes Jul 16, 2023
0c4e141
Add length=0 bypass.
DylanGuedes Jul 18, 2023
67852d9
Change default max item size.
DylanGuedes Jul 18, 2023
715273c
Update docs
DylanGuedes Jul 18, 2023
d2fc841
lint
DylanGuedes Jul 18, 2023
468914a
Implements snappy postings decoding/encoding
DylanGuedes Jul 20, 2023
ab45f97
fix formatting
DylanGuedes Jul 20, 2023
61e1bec
Remove LRU cache.
DylanGuedes Jul 26, 2023
ce02398
Merge branch 'main' of github.com:grafana/loki into check-index-posti…
DylanGuedes Jul 26, 2023
2be199b
fix microservices test
DylanGuedes Jul 26, 2023
07926d5
Rename enable-postings-cache flag.
DylanGuedes Jul 27, 2023
6c7433b
Merge branch 'main' of github.com:grafana/loki into check-index-posti…
DylanGuedes Jul 27, 2023
c3e6e8b
Apply suggestions from code review
DylanGuedes Jul 28, 2023
ca33e61
fix test
DylanGuedes Jul 28, 2023
bfc54c4
add description docs for tsdbshipper
DylanGuedes Jul 28, 2023
8ec02b6
Merge branch 'main' of github.com:grafana/loki into check-index-posti…
DylanGuedes Jul 28, 2023
b87c48d
Test caching behavior on e2e test.
DylanGuedes Aug 1, 2023
ee07d76
update go.mod
DylanGuedes Aug 1, 2023
6177240
change to sorteablelabelmatchers.
DylanGuedes Aug 3, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -6,6 +6,7 @@

##### Enhancements

* [9621](https://github.com/grafana/loki/pull/9621) **DylanGuedes**: Introduce TSDB postings cache.
* [10010](https://github.com/grafana/loki/pull/10010) **rasta-rocket**: feat(promtail): retrieve BotTags field from cloudflare
* [9995](https://github.com/grafana/loki/pull/9995) **chaudum**: Add jitter to the flush interval to prevent multiple ingesters to flush at the same time.
* [9797](https://github.com/grafana/loki/pull/9797) **chaudum**: Add new `loki_index_gateway_requests_total` counter metric to observe per-tenant RPS
8 changes: 8 additions & 0 deletions docs/sources/configure/_index.md
@@ -1978,6 +1978,9 @@ boltdb_shipper:
  # CLI flag: -boltdb.shipper.build-per-tenant-index
  [build_per_tenant_index: <boolean> | default = false]

# Configures storing index in an Object Store
# (GCS/S3/Azure/Swift/COS/Filesystem) in a prometheus TSDB-like format. Required
# fields only required when TSDB is defined in config.
tsdb_shipper:
  # Directory where ingesters would write index files which would then be
  # uploaded by shipper to configured storage
@@ -2037,6 +2040,11 @@ tsdb_shipper:
  [mode: <string> | default = ""]

  [ingesterdbretainperiod: <duration>]

  # Experimental. Whether TSDB should cache postings or not. The
  # index-read-cache will be used as the backend.
  # CLI flag: -tsdb.enable-postings-cache
  [enable_postings_cache: <boolean> | default = false]
```

### chunk_store_config
2 changes: 1 addition & 1 deletion go.mod
@@ -129,6 +129,7 @@ require (
 	golang.org/x/exp v0.0.0-20230321023759-10a507213a29
 	golang.org/x/oauth2 v0.10.0
 	golang.org/x/text v0.11.0
+	google.golang.org/protobuf v1.31.0
 )

 require (
@@ -309,7 +310,6 @@ require (
 	google.golang.org/genproto v0.0.0-20230530153820-e85fd2cbaebc // indirect
 	google.golang.org/genproto/googleapis/api v0.0.0-20230530153820-e85fd2cbaebc // indirect
 	google.golang.org/genproto/googleapis/rpc v0.0.0-20230530153820-e85fd2cbaebc // indirect
-	google.golang.org/protobuf v1.31.0 // indirect
 	gopkg.in/fsnotify/fsnotify.v1 v1.4.7 // indirect
 	gopkg.in/inf.v0 v0.9.1 // indirect
 	gopkg.in/ini.v1 v1.67.0 // indirect
186 changes: 186 additions & 0 deletions integration/loki_micro_services_test.go
@@ -2,11 +2,15 @@ package integration

import (
	"context"
	"strings"
	"testing"
	"time"

	dto "github.com/prometheus/client_model/go"
	"github.com/prometheus/common/expfmt"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
	"google.golang.org/protobuf/proto"

	"github.com/grafana/loki/integration/client"
	"github.com/grafana/loki/integration/cluster"
@@ -567,3 +571,185 @@ func TestSchedulerRing(t *testing.T) {
		assert.ElementsMatch(t, []string{"lineA", "lineB", "lineC", "lineD"}, lines)
	})
}

func TestQueryTSDB_WithCachedPostings(t *testing.T) {
	clu := cluster.New(nil, cluster.SchemaWithTSDB)

	defer func() {
		assert.NoError(t, clu.Cleanup())
	}()

	var (
		tDistributor = clu.AddComponent(
			"distributor",
			"-target=distributor",
		)
		tIndexGateway = clu.AddComponent(
			"index-gateway",
			"-target=index-gateway",
			"-tsdb.enable-postings-cache=true",
Collaborator

This test passes with this flag set to false. Is there any way to make this test fail when caching is not happening? Where are the objects coming from (the local filesystem?), and could we change the data at rest to prove that the data we're getting is from the cache?

Contributor Author

I added an assertion that checks the cache added/misses/gets metrics. They guarantee that we're covering the cache behavior using the FIFO cache.

			"-store.index-cache-read.cache.enable-fifocache=true",
		)
	)
	require.NoError(t, clu.Run())

	var (
		tIngester = clu.AddComponent(
			"ingester",
			"-target=ingester",
			"-ingester.flush-on-shutdown=true",
			"-tsdb.shipper.index-gateway-client.server-address="+tIndexGateway.GRPCURL(),
		)
		tQueryScheduler = clu.AddComponent(
			"query-scheduler",
			"-target=query-scheduler",
			"-query-scheduler.use-scheduler-ring=false",
			"-tsdb.shipper.index-gateway-client.server-address="+tIndexGateway.GRPCURL(),
		)
		tCompactor = clu.AddComponent(
			"compactor",
			"-target=compactor",
			"-boltdb.shipper.compactor.compaction-interval=1s",
			"-tsdb.shipper.index-gateway-client.server-address="+tIndexGateway.GRPCURL(),
		)
	)
	require.NoError(t, clu.Run())

	// finally, run the query-frontend and querier.
	var (
		tQueryFrontend = clu.AddComponent(
			"query-frontend",
			"-target=query-frontend",
			"-frontend.scheduler-address="+tQueryScheduler.GRPCURL(),
			"-frontend.default-validity=0s",
			"-common.compactor-address="+tCompactor.HTTPURL(),
			"-tsdb.shipper.index-gateway-client.server-address="+tIndexGateway.GRPCURL(),
		)
		_ = clu.AddComponent(
			"querier",
			"-target=querier",
			"-querier.scheduler-address="+tQueryScheduler.GRPCURL(),
			"-common.compactor-address="+tCompactor.HTTPURL(),
			"-tsdb.shipper.index-gateway-client.server-address="+tIndexGateway.GRPCURL(),
		)
	)
	require.NoError(t, clu.Run())

	tenantID := randStringRunes()

	now := time.Now()
	cliDistributor := client.New(tenantID, "", tDistributor.HTTPURL())
	cliDistributor.Now = now
	cliIngester := client.New(tenantID, "", tIngester.HTTPURL())
	cliIngester.Now = now
	cliQueryFrontend := client.New(tenantID, "", tQueryFrontend.HTTPURL())
	cliQueryFrontend.Now = now
	cliIndexGateway := client.New(tenantID, "", tIndexGateway.HTTPURL())
	cliIndexGateway.Now = now

	// initial cache state.
	igwMetrics, err := cliIndexGateway.Metrics()
	require.NoError(t, err)
	assertCacheState(t, igwMetrics, &expectedCacheState{
		cacheName: "store.index-cache-read.embedded-cache",
		gets:      0,
		misses:    0,
		added:     0,
	})

	t.Run("ingest-logs", func(t *testing.T) {
		require.NoError(t, cliDistributor.PushLogLineWithTimestamp("lineA", time.Now().Add(-72*time.Hour), map[string]string{"job": "fake"}))
		require.NoError(t, cliDistributor.PushLogLineWithTimestamp("lineB", time.Now().Add(-48*time.Hour), map[string]string{"job": "fake"}))
	})

	// restart ingester which should flush the chunks and index
	require.NoError(t, tIngester.Restart())
Collaborator

We could also hit /flush here to be even more deliberate.

Contributor Author

This part was just copied from the other tests.

	// Query lines
	t.Run("query to verify logs being served from storage", func(t *testing.T) {
		resp, err := cliQueryFrontend.RunRangeQuery(context.Background(), `{job="fake"}`)
		require.NoError(t, err)
		assert.Equal(t, "streams", resp.Data.ResultType)

		var lines []string
		for _, stream := range resp.Data.Stream {
			for _, val := range stream.Values {
				lines = append(lines, val[1])
			}
		}

		assert.ElementsMatch(t, []string{"lineA", "lineB"}, lines)
	})

	igwMetrics, err = cliIndexGateway.Metrics()
	require.NoError(t, err)
	assertCacheState(t, igwMetrics, &expectedCacheState{
		cacheName: "store.index-cache-read.embedded-cache",
		gets:      50,
		misses:    1,
		added:     1,
	})

	// ingest logs with ts=now.
	require.NoError(t, cliDistributor.PushLogLine("lineC", map[string]string{"job": "fake"}))
	require.NoError(t, cliDistributor.PushLogLine("lineD", map[string]string{"job": "fake"}))

	// default length is 7 days.
	resp, err := cliQueryFrontend.RunRangeQuery(context.Background(), `{job="fake"}`)
	require.NoError(t, err)
	assert.Equal(t, "streams", resp.Data.ResultType)

	var lines []string
	for _, stream := range resp.Data.Stream {
		for _, val := range stream.Values {
			lines = append(lines, val[1])
		}
	}
	// expect lines from both the ingester's memory and the store.
	assert.ElementsMatch(t, []string{"lineA", "lineB", "lineC", "lineD"}, lines)

}

func getValueFromMF(mf *dto.MetricFamily, lbs []*dto.LabelPair) float64 {
	for _, m := range mf.Metric {
		if !assert.ObjectsAreEqualValues(lbs, m.GetLabel()) {
			continue
		}

		return m.Counter.GetValue()
	}

	return 0
}

func assertCacheState(t *testing.T, metrics string, e *expectedCacheState) {
	var parser expfmt.TextParser
	mfs, err := parser.TextToMetricFamilies(strings.NewReader(metrics))
	require.NoError(t, err)

	lbs := []*dto.LabelPair{
		{
			Name:  proto.String("cache"),
			Value: proto.String(e.cacheName),
		},
	}

	mf, found := mfs["querier_cache_added_new_total"]
	require.True(t, found)
	require.Equal(t, e.added, getValueFromMF(mf, lbs))

	mf, found = mfs["querier_cache_gets_total"]
	require.True(t, found)
	require.Equal(t, e.gets, getValueFromMF(mf, lbs))

	mf, found = mfs["querier_cache_misses_total"]
	require.True(t, found)
	require.Equal(t, e.misses, getValueFromMF(mf, lbs))
}

type expectedCacheState struct {
	cacheName string
	gets      float64
	misses    float64
	added     float64
}
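
The reviewer's concern in this file was that the test also passes with the flag off. One way to frame the metrics assertion is as a before/after delta: if the cache is really in use, the "added" and "gets" counters must both grow across a query. The sketch below shows that idea with only the standard library. It swaps the `expfmt` parser used in the test for a simplified line scan that only matches samples whose sole label is `cache`. `counterValue` and `cacheWasExercised` are hypothetical names; the metric names are the ones asserted in the test above.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// counterValue returns the value of the sample name{cache="cacheName"} in
// Prometheus text exposition output, or 0 when no such sample exists.
// Simplification: only samples whose single label is `cache` are matched.
func counterValue(metrics, name, cacheName string) float64 {
	prefix := fmt.Sprintf("%s{cache=%q} ", name, cacheName)
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, prefix) {
			continue
		}
		// The value is the first field after the labels; an optional
		// timestamp may follow it.
		fields := strings.Fields(strings.TrimPrefix(line, prefix))
		if len(fields) == 0 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[0], 64); err == nil {
			return v
		}
	}
	return 0
}

// cacheWasExercised reports whether both the "added" and "gets" counters
// grew between two metric scrapes.
func cacheWasExercised(before, after, cacheName string) bool {
	delta := func(metric string) float64 {
		return counterValue(after, metric, cacheName) - counterValue(before, metric, cacheName)
	}
	return delta("querier_cache_added_new_total") > 0 && delta("querier_cache_gets_total") > 0
}

func main() {
	const cache = "store.index-cache-read.embedded-cache"
	before := `querier_cache_added_new_total{cache="` + cache + `"} 0
querier_cache_gets_total{cache="` + cache + `"} 0
`
	after := `querier_cache_added_new_total{cache="` + cache + `"} 1
querier_cache_gets_total{cache="` + cache + `"} 50
`
	fmt.Println("cache exercised:", cacheWasExercised(before, after, cache))
}
```

With `-tsdb.enable-postings-cache=false` the "added" counter would be expected to stay at zero, so a check of this kind fails when caching is off.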
7 changes: 4 additions & 3 deletions pkg/storage/factory.go
@@ -36,6 +36,7 @@ import (
 	"github.com/grafana/loki/pkg/storage/stores/series/index"
 	"github.com/grafana/loki/pkg/storage/stores/shipper"
 	"github.com/grafana/loki/pkg/storage/stores/shipper/indexgateway"
+	"github.com/grafana/loki/pkg/storage/stores/tsdb"
 	util_log "github.com/grafana/loki/pkg/util/log"
 )

@@ -283,9 +284,9 @@ type Config struct {
 	DisableBroadIndexQueries bool `yaml:"disable_broad_index_queries"`
 	MaxParallelGetChunk int `yaml:"max_parallel_get_chunk"`

-	MaxChunkBatchSize int `yaml:"max_chunk_batch_size"`
-	BoltDBShipperConfig shipper.Config `yaml:"boltdb_shipper" doc:"description=Configures storing index in an Object Store (GCS/S3/Azure/Swift/COS/Filesystem) in the form of boltdb files. Required fields only required when boltdb-shipper is defined in config."`
-	TSDBShipperConfig indexshipper.Config `yaml:"tsdb_shipper"`
+	MaxChunkBatchSize int `yaml:"max_chunk_batch_size"`
+	BoltDBShipperConfig shipper.Config `yaml:"boltdb_shipper" doc:"description=Configures storing index in an Object Store (GCS/S3/Azure/Swift/COS/Filesystem) in the form of boltdb files. Required fields only required when boltdb-shipper is defined in config."`
+	TSDBShipperConfig tsdb.IndexCfg `yaml:"tsdb_shipper" doc:"description=Configures storing index in an Object Store (GCS/S3/Azure/Swift/COS/Filesystem) in a prometheus TSDB-like format. Required fields only required when TSDB is defined in config."`

 	// Config for using AsyncStore when using async index stores like `boltdb-shipper`.
 	// It is required for getting chunk ids of recently flushed chunks from the ingesters.
4 changes: 2 additions & 2 deletions pkg/storage/store.go
@@ -231,7 +231,7 @@ func (s *store) storeForPeriod(p config.PeriodConfig, tableRange config.TableRan
 	indexClientLogger := log.With(s.logger, "index-store", fmt.Sprintf("%s-%s", p.IndexType, p.From.String()))

 	if p.IndexType == config.TSDBType {
-		if shouldUseIndexGatewayClient(s.cfg.TSDBShipperConfig) {
+		if shouldUseIndexGatewayClient(s.cfg.TSDBShipperConfig.Config) {
 			// inject the index-gateway client into the index store
 			gw, err := gatewayclient.NewGatewayClient(s.cfg.TSDBShipperConfig.IndexGatewayClientConfig, indexClientReg, s.limits, indexClientLogger)
 			if err != nil {
@@ -272,7 +272,7 @@ func (s *store) storeForPeriod(p config.PeriodConfig, tableRange config.TableRan
 		}

 		indexReaderWriter, stopTSDBStoreFunc, err := tsdb.NewStore(fmt.Sprintf("%s_%s", p.ObjectType, p.From.String()), s.cfg.TSDBShipperConfig, s.schemaCfg, f, objectClient, s.limits,
-			tableRange, backupIndexWriter, indexClientReg, indexClientLogger)
+			tableRange, backupIndexWriter, indexClientReg, indexClientLogger, s.indexReadCache)
 		if err != nil {
 			return nil, nil, nil, err
 		}
7 changes: 4 additions & 3 deletions pkg/storage/store_test.go
@@ -31,6 +31,7 @@ import (
 	"github.com/grafana/loki/pkg/storage/config"
 	"github.com/grafana/loki/pkg/storage/stores/indexshipper"
 	"github.com/grafana/loki/pkg/storage/stores/shipper"
+	"github.com/grafana/loki/pkg/storage/stores/tsdb"
 	util_log "github.com/grafana/loki/pkg/util/log"
 	"github.com/grafana/loki/pkg/util/marshal"
 	"github.com/grafana/loki/pkg/validation"
@@ -1005,7 +1006,7 @@ func TestStore_indexPrefixChange(t *testing.T) {

 	cfg := Config{
 		FSConfig: local.FSConfig{Directory: path.Join(tempDir, "chunks")},
-		TSDBShipperConfig: shipperConfig,
+		TSDBShipperConfig: tsdb.IndexCfg{Config: shipperConfig},
 		NamedStores: NamedStores{
 			Filesystem: map[string]NamedFSConfig{
 				"named-store": {Directory: path.Join(tempDir, "named-store")},
@@ -1166,7 +1167,7 @@ func TestStore_MultiPeriod(t *testing.T) {
 			BoltDBShipperConfig: shipper.Config{
 				Config: shipperConfig,
 			},
-			TSDBShipperConfig: shipperConfig,
+			TSDBShipperConfig: tsdb.IndexCfg{Config: shipperConfig, CachePostings: false},
 			NamedStores: NamedStores{
 				Filesystem: map[string]NamedFSConfig{
 					"named-store": {Directory: path.Join(tempDir, "named-store")},
@@ -1479,7 +1480,7 @@ func TestStore_BoltdbTsdbSameIndexPrefix(t *testing.T) {
 	cfg := Config{
 		FSConfig: local.FSConfig{Directory: path.Join(tempDir, "chunks")},
 		BoltDBShipperConfig: boltdbShipperConfig,
-		TSDBShipperConfig: tsdbShipperConfig,
+		TSDBShipperConfig: tsdb.IndexCfg{Config: tsdbShipperConfig},
 	}

 	schemaConfig := config.SchemaConfig{