
added TTL #104

Closed
wants to merge 10 commits into from

Conversation


@karlmcguire karlmcguire commented Nov 15, 2019

This is a simple solution for #43.

The only issue I can foresee is that some of the random samples may not pick up items with TTLs set, so rather than evicting expired items we'd evict "indefinite" items within the sample. In that sense, this TTL solution is "sampled TTL" and not true TTL.


```go
//
// TTL is the amount of time (in seconds) that the item will remain in the
// cache. To make an item stay indefinitely, set TTL to -1.
func (c *Cache) Set(key, value interface{}, cost, ttl int64) bool {
```


This would be the moment to start tagging releases. Because you haven't tagged a release before, everyone is using master, and everyone's build will break as soon as this is merged.

@karlmcguire (Contributor, Author)

Agreed


ccampbell commented Dec 13, 2019

Just randomly stumbled upon this PR. I am obviously not super familiar with this code, so take these comments with a grain of salt, but I am curious why you chose to implement TTL the way you did vs. doing something like this (pseudocode):

```go
func set(key, value, ttl) {
    exp = time.Now() + ttl
    store(key, wrapped{value, exp})
}

func get(key) value {
    wrapped = retrieve(key)
    if time.Now() > wrapped.exp {
        forceEvict(key)
        return nil
    }

    return wrapped.value
}
```

Obviously that is oversimplified, but this would guarantee that expired data can never be returned.

I could be wrong, but in your implementation it looks like expired items are only purged when a new cache.Set() call occurs via policy.Add(). So if you have data that was stored with a TTL and has expired, and you never add new items to the cache, then a Get call for that content will return stale data until something new is added. Not saying that is likely, but it could happen.


Another advantage of storing the expiration along with the value is that it could be used to automatically prevent cache stampedes, or could be queried on its own. For example, you could have a separate cache.GetStale() method that does something like:

  1. If the content is not in cache, or is in cache and not expired, return the item as usual.
  2. If the content is in cache but expired, return nil and immediately set the stale value back to cache with a future expiration time (one minute in the future, for example).

This would allow the first request that hits an expired item to go back to disk/database/wherever to repopulate it with a fresh value, while all other requests for that content in the meantime get the stale value from cache. That prevents a big spike in traffic to regenerate the cache when a bunch of requests arrive for the same item at once.

The stampeding issue probably wouldn't exist with your LFU algorithm, since those items will never get evicted, but with a TTL-based cache it could.

Anyway, just some thoughts. I know this library is very performance focused, so I imagine there may be some performance considerations I am missing here.


@manishrjain manishrjain left a comment


I think a better approach would be to have a map of entries, where the key is a time rounded to a certain granularity (say T = one second, or 5s, or something). Then a goroutine can wake up every T seconds, find the right bucket with a lookup, evict all of its keys, and go back to sleep.

The issue to look out for would be a bucket that should have been evicted but was missed. That can be tackled by also checking the TTL during a Get, which we should be doing anyway.

Reviewable status: 0 of 6 files reviewed, 8 unresolved discussions (waiting on @erikdubbelboer, @jarifibrahim, @karlmcguire, and @manishrjain)


cache.go, line 191 at r1 (raw file):

```go
// TTL is the amount of time (in seconds) that the item will remain in the
// cache. To make an item stay indefinitely, set TTL to -1.
func (c *Cache) Set(key, value interface{}, cost, ttl int64) bool {
```

SetWithTTL (ttl can be a time.Duration)

Create a new one called Set, which internally calls SetWithTTL.


policy.go, line 152 at r3 (raw file):

```go
	victims := make([]*item, 0)
	// delete expired items before doing any evictions, we may not even need to
	for e := p.times.Back(); e != nil; {
```

This could potentially happen in a goroutine called every second or so, instead of on every add.


policy.go, line 154 at r3 (raw file):

```go
	for e := p.times.Back(); e != nil; {
		i := e.Value.([2]uint64)
		if i[1] > uint64(time.Now().Unix()) {
```

It seems like you're assuming that p.times is sorted. I wonder where that sort is happening.


policy.go, line 210 at r3 (raw file):

```go
	p.evict.add(key, cost)
	if ttl != -1 {
		p.times.PushFront([2]uint64{key, uint64(ttl)})
```

TTLs can be different times, so these won't be sorted.


BradErz commented Dec 19, 2019

@manishrjain Random stranger here 👋 hope you don't mind me butting in.

What about implementing the TTL via a variadic option function?

```go
func (c *Cache) Set(key, value interface{}, cost int64, opts ...SetOption) bool {
    // ...
}
```

And the option would be set something like:

```go
cache.Set("key", "value", 0, cache.WithTTL(ttl)) // ttl is a time.Duration
```

To be honest, I would also be tempted to do it for the cost parameter as well. That way only the required parameters are actually required, while the optional ones can assume defaults or be overridden by the client.

I'm not familiar with the dgraph code base, so I'm not sure whether you're already using this pattern somewhere else.

@ben-manes

@manishrjain fyi - the evolution of that map of buckets idea, called a hierarchical timing wheel, is implemented in Caffeine.


@manishrjain manishrjain left a comment


Hey @ben-manes, I think the hierarchical timing wheel concept is complex for what it does. Just rounding time off into buckets and having those as keys in a map, which a goroutine can periodically wake up to evict, is a simpler data structure and process.

Reviewable status: 0 of 6 files reviewed, 8 unresolved discussions (waiting on @erikdubbelboer, @jarifibrahim, and @karlmcguire)

@martinmr

Closing in favor of #122

@martinmr martinmr closed this Jan 21, 2020
@martinmr martinmr deleted the karl/ttl branch January 21, 2020 18:29