add objects list caching for boltdb-shipper index store to reduce object storage list api calls #5160
Conversation
Great stuff!
	return nil, nil, fmt.Errorf("invalid prefix %s", prefix)
}

if !c.cacheBuiltAt.Add(cacheTimeout).After(time.Now()) {
Could also be written as:
- if !c.cacheBuiltAt.Add(cacheTimeout).After(time.Now()) {
+ if time.Since(c.cacheBuiltAt) > cacheTimeout {
which I find easier to read.
select {
case c.rebuildCacheChan <- struct{}{}:
	c.err = nil
	c.err = c.buildCache(ctx)
	<-c.rebuildCacheChan
	if c.err != nil {
		level.Error(util_log.Logger).Log("msg", "failed to build cache", "err", c.err)
	}
default:
	for !c.cacheBuiltAt.Add(cacheTimeout).After(time.Now()) && c.err == nil {
		time.Sleep(time.Millisecond)
	}
}
I have a hard time understanding why you chose to use a channel here. I assume to block concurrent access on List(). First call is building the cache while all others wait until cache is built?
Yeah, just the first or one of the concurrent calls to List() should get to build the cache while the others wait for it to finish, either successfully or with an error. I will add a comment to make it clearer.
c.tablesMtx.Lock()
defer c.tablesMtx.Unlock()

c.tables = map[string]*table{}
Could we decrease the lock time by assigning c.tables at the very end?
- c.tablesMtx.Lock()
- defer c.tablesMtx.Unlock()
- c.tables = map[string]*table{}
+ new_tables := map[string]*table{}
+ ...
+ c.tablesMtx.Lock()
+ defer c.tablesMtx.Unlock()
+ c.tables = new_tables
c.cacheBuiltAt = time.Now()
return nil
We want to keep it locked until we build the cache to avoid returning stale results. Most of these list calls happen async, so I am refreshing the cache on demand instead of running a goroutine refreshing it every minute, since we usually do these operations every 5 mins in the index-gateway and 10 mins in the compactor by default.
objects, commonPrefixes, err := cachedObjectClient.List(context.Background(), "", "")
require.NoError(t, err)
require.Equal(t, 1, objectClient.listCallsCount)
require.Equal(t, objects, []chunk.StorageObject{})
Arguments of the Equal function are in "incorrect" order:
- require.Equal(t, objects, []chunk.StorageObject{})
+ require.Equal(t, []chunk.StorageObject{}, objects)
The function interface is func Equal(t TestingT, expected interface{}, actual interface{}, msgAndArgs ...interface{}). This isn't a problem as long as expected and actual are equal, but the test error message is misleading in case they aren't.
Guess this is not only a problem in your test, but we have that all over the place.
yeah, sorry I messed up the order. Fixed it.
}
default:
	for time.Since(c.cacheBuiltAt) >= cacheTimeout && c.err == nil {
		time.Sleep(time.Millisecond)
A for loop with time.Sleep is a no-no! You want to use the promise pattern instead. Not sure if we can avoid a lock / RW lock.
I added a sync.WaitGroup to make all the goroutines attempting to build the cache wait until the operation is over. Can you please check now whether it looks good?
yep looks good.
LGTM
Fixes #5018
What this PR does / why we need it:
As of now, we do a LIST call per table when we need to find its objects. If someone has a lot of tables cached locally or has query readiness set to a large number of days, this results in many LIST calls, because each querier tries to sync tables every 5 mins by default.
This PR reduces the number of LIST calls we make when using hosted object stores (S3, GCS, Azure Blob Storage and Swift) as a shared store for boltdb-shipper. The idea is to do a flat listing of objects, which the hosted object stores mentioned above support, and cache it until it goes stale.
Special notes for your reviewer:
Since caching requires a flat listing supported only by hosted object stores, I have added a prefixedObjectClient, making the implementation somewhat cleaner. prefixedObjectClient takes care of adding/removing the configured object prefix to/from the keys. Without prefixedObjectClient, we would also have to make the caching client aware of object prefixes.