-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache persistence to disk #778
base: main
Are you sure you want to change the base?
Conversation
If you change the base here to |
2ecea0f
to
c0c8ee5
Compare
c6ec0fb
to
6b7b355
Compare
fdefd65
to
d838d8a
Compare
a237166
to
044b481
Compare
@@ -45,6 +45,8 @@ type Store[T any] interface { | |||
} | |||
|
|||
// Expirable is an interface for a cache store that supports expiration. | |||
// It extends the Store interface. | |||
// It also provides disk persistence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me what is the association between expirable cache and persistence. The core cache should also be able to have persistence.
The LRU cache may also have persistence.
If we don't want the Store interface to have persistence, a new interface extending the Store interface, say Persistable
, can be made. The default cache can implement both Expirable
and Persistable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started with a persistable
interface. Now, LRU does not need persistence in my opinion because items can be evicted at every moment, there is no way to know what is being persisted. I don't see any value there. Also this is a cache so in most cases we don't need to persist it. For the expiration cache, users can configure the retention period, so there can be use cases where we want to start with a snapshot. That's why I finally decided to add the persist
method in the Expirable
interface.
But I guess there might be other implementations in the future that may need it. I'll reintroduce persistable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure how LRU and expirable cache are different in terms of persistence. An application would benefits in the same way on restart if any reusable cache data is available for the application, regardless of the cache item storage policy. The application may continue to use the same few items which are present in the cache and can avoid fetching them again.
@@ -53,6 +55,8 @@ type Expirable[T any] interface { | |||
GetExpiration(object T) (time.Time, error) | |||
// HasExpired returns true if the object has expired. | |||
HasExpired(object T) (bool, error) | |||
// Persist persists the cache to disk. | |||
Persist() error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the idea of seeding or restoring data from a snapshot, it may be better to accept an argument here to allow persistence at a different location. Allowing to have separate restore and save locations. A cache may have a need to take snapshots multiple times with different file names.
For our default cache implementation, both the restore and persist path would be the same, but it would be better to leave space for different flexible implementations where the cache may not have internal knowledge about where to save.
I'm considering all these because we are defining an interface. If didn't had an interface, I wouldn't consider about other implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this. We want a simple backup/restore mechanism. Users don't have access to the underlying datastore, so they cannot decide on what is being snapshot at a given location.
In the future if we want to enable this, we can add a setter method to change the snapshot location. But I don't see a use case now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, maybe we can simplify it further. If there's no configurability for the user of the interface, an explicit method and interface for invoking persist seems unnecessary. The cache would itself determine what, where and when to persist. For example, when the application is about to stop, the cache will be closed. Persistence can be performed as part of closing the cache. In case of controllers apps, it'll most likely be outside of the controller, in the program main. The main would be aware of the type of cache and can call the public Close()
method or the appropriate method for closing the used cache. All the consumers of the cache, the controller/reconcilers, would interact with the cache via the Store
and Expirable
interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's also one of the reason why Persist()
was in Expirable
. You could just decide whether to store or not what was in the cache (based on the ttl and capacity, there still is a possibility to infere what's in there). Having the cache arbitrarily decide to make a snapshot, means we have to prepare for it (writable fs and capacity), and also that leave no choice. In our case we want to snapshot in IRC
but not in SC
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can be configured in the cache Options
to persist at close or not.
Or based on the Options.snapshotPath
value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way to implement this could be to accept persistence policy as a cache option. Policies could be persist at close, or persist at every write, or persist at every x interval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes it simpler yes, but you want be able to do p, ok := cache.(Persistable)
to check what type you have in case you receive a cache as a parameter.
Policies could be persist at close, or persist at every write, or persist at every x interval.
So we would have a single policy to begin. This and the interface are not mutually exclusive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have an example of how persistence will be used in the controller, it'll be easier to decide if we really need an interface for it.
The way I'm imagining how the controllers will use persistence, I don't see a need for it. As described above, I see the main program to be initializing a specific type of cache and passing that to the reconcilers. Since the main program knows what type of cache it is using and its features, it can control when to persist or configure the cache to persist based on some policy. Persistence becomes an internal feature of the cache. Other components don't need to do anything about persistence.
I don't know of a use case where the reconciler or something else that's unaware of the cache type will need to deal with persistence.
If implemented, the cache will be persisted to disk by calling `Persist`. It will be loaded when instantiating a cache by calling `New` if an existing `path` is provided. Signed-off-by: Soule BA <bah.soule@gmail.com>
// writeToBuf writes the cache to the buffer | ||
// no locks are taken, the caller should ensure that | ||
// the cache is not being modified while this function is called. | ||
func (c *cache[T]) writeToBuf() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all the cached data are of known type and in Go, can't we just use gob to achieve the same, that is to serialize the cache items and write to a buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To have better control. I am open to change this. My reasoning is that if we ever want to offer different durability level for the cache (mem vs on disk), we will need to deal with the underlying persistence layer on an per-item basis in order to be able to every change. We may never need it, but it's possible with this implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gob API deals with serialization at per object level. Looks similar to what the current code does, just the serialization of each cache item comes for free. How the serialized data is organized later and loaded is up to the higher level implementation. Gob just provides a nice encoder and decoder for simple serialization of Go objects.
follow-up to #766
This PR adds a persistence feature to the cache pkg.
The principal use cases is for
image-reflector-automation
to cacherepository tags
and to be able to persist them when the controller stop.