Cache persistence to disk #778

souleb · 2024-05-30T13:17:52Z

Follow-up to #766.

This PR adds a persistence feature to the cache package.

The principal use case is for image-reflector-automation to cache repository tags and persist them when the controller stops.

makkes · 2024-05-30T18:01:45Z

If you change the base here to persist-cache, then it's possible to see only the difference to #766.

cache/cache.go

darkowlzz · 2024-06-17T14:04:52Z

cache/store.go

@@ -45,6 +45,8 @@ type Store[T any] interface {
 }

 // Expirable is an interface for a cache store that supports expiration.
+// It extends the Store interface.
+// It also provides disk persistence.


It's not clear to me how the expirable cache relates to persistence. The core cache should also be able to support persistence.
The LRU cache could also support persistence.

If we don't want the Store interface to include persistence, we can define a new interface that extends Store, say Persistable. The default cache can then implement both Expirable and Persistable.

I started with a Persistable interface. In my opinion, though, LRU does not need persistence: items can be evicted at any moment, so there is no way to know what is being persisted, and I don't see any value there. Also, since this is a cache, in most cases we don't need to persist it. For the expiring cache, users can configure the retention period, so there are use cases where we want to start from a snapshot. That's why I ultimately decided to add the Persist method to the Expirable interface.

But other implementations might need it in the future, so I'll reintroduce Persistable.

I'm still not sure how the LRU and expirable caches differ in terms of persistence. On restart, an application benefits in the same way from any reusable cache data, regardless of the cache's item storage policy: it may keep using the few items that are already in the cache and avoid fetching them again.

darkowlzz · 2024-06-17T14:17:17Z

cache/store.go

@@ -53,6 +55,8 @@ type Expirable[T any] interface {
 	GetExpiration(object T) (time.Time, error)
 	// HasExpired returns true if the object has expired.
 	HasExpired(object T) (bool, error)
+	// Persist persists the cache to disk.
+	Persist() error


Considering the idea of seeding or restoring data from a snapshot, it may be better to accept an argument here so the cache can be persisted to a different location, allowing separate restore and save locations. A cache may need to take multiple snapshots under different file names.

For our default cache implementation, the restore and persist paths would be the same, but it would be better to leave room for more flexible implementations in which the cache may not know internally where to save.
I'm considering all this because we are defining an interface. If we didn't have an interface, I wouldn't worry about other implementations.

I'm not sure about this. We want a simple backup/restore mechanism. Users don't have access to the underlying datastore, so they cannot decide what gets snapshotted to a given location.

If we want to enable this in the future, we can add a setter method to change the snapshot location, but I don't see a use case for it now.

In that case, maybe we can simplify it further. If the user of the interface has nothing to configure, an explicit method and interface for invoking persistence seem unnecessary. The cache itself would determine what, where, and when to persist. For example, when the application is about to stop, the cache will be closed, and persistence can be performed as part of closing it. For controller apps, this will most likely happen outside the controller, in the program's main function. Main would know the type of cache and can call its public Close() method, or whichever method is appropriate for closing that cache. All consumers of the cache, the controllers/reconcilers, would interact with it via the Store and Expirable interfaces.

That's also one of the reasons why Persist() was in Expirable: you could decide whether or not to store what was in the cache (based on the TTL and capacity, it is still possible to infer what's in there). Having the cache decide arbitrarily to take a snapshot means we have to prepare for it (a writable filesystem and enough capacity), and it also leaves us no choice. In our case, we want to snapshot in IRC (image-reflector-controller) but not in SC (source-controller).

Whether to persist at close can be configured in the cache Options,
or it can be based on the Options.snapshotPath value.

Another option is to accept a persistence policy as a cache option. Policies could be: persist at close, persist on every write, or persist at a fixed interval.

That makes it simpler, yes, but you won't be able to do `p, ok := cache.(Persistable)` to check what type you have when you receive a cache as a parameter.
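The type-assertion check mentioned above, in a minimal form: the hypothetical helper `persistIfSupported` persists a cache received as a plain `Store` only when its dynamic type also implements `Persistable`. Both interfaces are simplified assumptions for the example.

```go
package main

import "fmt"

// Store and Persistable are simplified, hypothetical interfaces.
type Store interface {
	Get(key string) (string, bool)
}

type Persistable interface {
	Persist() error
}

// plainCache implements only Store.
type plainCache struct {
	items map[string]string
}

func (c *plainCache) Get(key string) (string, bool) {
	v, ok := c.items[key]
	return v, ok
}

// snapshotCache implements Store (via embedding) and Persistable.
type snapshotCache struct {
	plainCache
	snapshots int
}

func (c *snapshotCache) Persist() error {
	c.snapshots++
	return nil
}

// persistIfSupported persists s only if its dynamic type is Persistable.
func persistIfSupported(s Store) (bool, error) {
	p, ok := s.(Persistable)
	if !ok {
		return false, nil
	}
	return true, p.Persist()
}

func main() {
	plain := &plainCache{items: map[string]string{}}
	snap := &snapshotCache{plainCache: plainCache{items: map[string]string{}}}
	for _, s := range []Store{plain, snap} {
		persisted, err := persistIfSupported(s)
		fmt.Println(persisted, err)
	}
}
```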

> Policies could be persist at close, or persist at every write, or persist at every x interval.

So we would start with a single policy. This and the interface are not mutually exclusive.

If we have an example of how persistence will be used in the controller, it'll be easier to decide if we really need an interface for it.
Given how I imagine the controllers using persistence, I don't see a need for it. As described above, I see the main program initializing a specific type of cache and passing it to the reconcilers. Since the main program knows what type of cache it is using and its features, it can control when to persist, or configure the cache to persist based on some policy. Persistence becomes an internal feature of the cache, and other components don't need to do anything about it.
I don't know of a use case where the reconciler or something else that's unaware of the cache type will need to deal with persistence.
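A sketch of the wiring described above, with hypothetical names throughout (`diskCache`, `reconciler`, and a string-only `Store`): the reconciler only sees `Store`, while `main` owns the concrete cache and closes it on shutdown. `Close` here only records how many entries it would write, standing in for the disk write.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is the only view of the cache that reconcilers receive.
type Store interface {
	Set(key, value string)
	Get(key string) (string, bool)
}

// diskCache is a hypothetical concrete cache. Only main knows its type,
// so only main calls Close.
type diskCache struct {
	mu            sync.Mutex
	items         map[string]string
	persistedKeys int // stands in for entries written to disk
}

func (c *diskCache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = value
}

func (c *diskCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.items[key]
	return v, ok
}

// Close would write a snapshot to disk; here it only records the entry count.
func (c *diskCache) Close() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.persistedKeys = len(c.items)
	return nil
}

// reconciler is unaware of persistence: it only sees Store.
type reconciler struct {
	cache   Store
	fetches int
}

func (r *reconciler) reconcile(repo string) string {
	if tags, ok := r.cache.Get(repo); ok {
		return tags // served from the cache, possibly restored from a snapshot
	}
	r.fetches++
	tags := "v1.0.0,v1.1.0" // stand-in for listing tags from a registry
	r.cache.Set(repo, tags)
	return tags
}

func main() {
	c := &diskCache{items: map[string]string{}}
	r := &reconciler{cache: c}
	r.reconcile("ghcr.io/example/app")
	r.reconcile("ghcr.io/example/app") // second call hits the cache
	// On shutdown, main closes the concrete cache, which persists it.
	if err := c.Close(); err != nil {
		panic(err)
	}
	fmt.Println(r.fetches, c.persistedKeys) // 1 1
}
```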

If implemented, the cache will be persisted to disk by calling `Persist`. It will be loaded when instantiating a cache by calling `New` if an existing `path` is provided.

Signed-off-by: Soule BA <bah.soule@gmail.com>

darkowlzz · 2024-06-18T15:14:03Z

cache/cache.go

+// writeToBuf writes the cache to the buffer
+// no locks are taken, the caller should ensure that
+// the cache is not being modified while this function is called.
+func (c *cache[T]) writeToBuf() error {


Since all the cached data are Go values of known types, can't we just use gob to achieve the same thing, that is, serialize the cache items and write them to a buffer?

For better control, though I'm open to changing this. My reasoning is that if we ever want to offer different durability levels for the cache (in memory vs. on disk), we will need to deal with the underlying persistence layer on a per-item basis in order to be able to persist every change. We may never need it, but this implementation makes it possible.

The gob API handles serialization at the per-object level. That looks similar to what the current code does, except that serialization of each cache item comes for free. How the serialized data is later organized and loaded is up to the higher-level implementation; gob just provides a convenient encoder and decoder for simple serialization of Go objects.

souleb requested a review from a team as a code owner May 30, 2024 13:17

souleb changed the base branch from main to enable-cachin-auth-tokens May 30, 2024 20:42

souleb force-pushed the persist-cache branch 3 times, most recently from 2ecea0f to c0c8ee5 May 31, 2024 09:24

souleb force-pushed the enable-cachin-auth-tokens branch 2 times, most recently from c6ec0fb to 6b7b355 June 10, 2024 13:08

souleb force-pushed the enable-cachin-auth-tokens branch 10 times, most recently from fdefd65 to d838d8a June 14, 2024 14:20

Base automatically changed from enable-cachin-auth-tokens to main June 14, 2024 14:36

souleb force-pushed the persist-cache branch 2 times, most recently from a237166 to 044b481 June 17, 2024 12:44

souleb requested review from darkowlzz and stefanprodan June 17, 2024 12:44

stefanprodan reviewed Jun 17, 2024

cache/cache.go (outdated, resolved)

souleb force-pushed the persist-cache branch from 044b481 to da1e921 June 17, 2024 12:56

darkowlzz reviewed Jun 17, 2024

Enable persisting cache to disk

dd7d91d

souleb force-pushed the persist-cache branch from da1e921 to dd7d91d June 18, 2024 09:28

darkowlzz reviewed Jun 18, 2024

darkowlzz mentioned this pull request Dec 3, 2024

Cache use discussion fluxcd/flux2#5094 (open)
