
Allow limiting by memory limit instead of item count #11

Closed
webmaster128 opened this issue Nov 1, 2020 · 10 comments

Comments

@webmaster128
Contributor

webmaster128 commented Nov 1, 2020

Same issue as described here: jeromefroe/lru-rs#32. Would you be interested in implementing the solution proposed here?

If such a change is desired, that means the total number of elements is not known anymore. So this would require some changes and probably tradeoffs. Is this something you would like to see happening in this crate?

@marmeladema
Owner

I am definitely not against it, but as always, it depends on the trade-offs.

I like the API you proposed with put_with_cost; other names could be put_with_weight or weighted_put.
Before jumping to implementation, I think it would be worthwhile to describe the semantics of common operations like get, put and put_with_cost under this new model.

@webmaster128
Contributor Author

Sounds good, thanks. I can write a few doc comments for the API functions in the coming days to explain the user's view of things.

I like the renaming cost -> weight, as it describes a little better that this value is carried around for the lifetime of the cache entry rather than being a one-time thing.

@webmaster128
Contributor Author

webmaster128 commented Nov 2, 2020

Looking into this a bit more, I think there are two core changes that the implementation would need:

  1. Use a HashMap with no pre-set capacity since we don't know the number of elements in advance. I think we can assume HashMap increases its size in a clever way and at some point it does not grow anymore anyways.
  2. Use a doubly-linked list of variable size that has a constant time remove (and get?) function (for moving to the front). Ideally we can use LinkedList at some point, where constant time remove is discussed right now. But it should be doable in the custom implementation somehow.
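For point 2, the kind of structure under discussion can be modeled as an index-based doubly-linked list, where the hash map stores each entry's node index so that unlinking (for removal or move-to-front) is O(1). This is only an illustrative sketch, not the crate's actual implementation:

```rust
// Illustrative index-based doubly-linked list with O(1) unlink.
// The hash map would map keys to `usize` node indices into `nodes`.
struct Node<T> {
    value: T,
    prev: Option<usize>,
    next: Option<usize>,
}

struct List<T> {
    nodes: Vec<Node<T>>,
    head: Option<usize>,
}

impl<T> List<T> {
    fn new() -> Self {
        Self { nodes: Vec::new(), head: None }
    }

    /// Push a new node at the front (most recently used) in O(1).
    fn push_front(&mut self, value: T) -> usize {
        let idx = self.nodes.len();
        self.nodes.push(Node { value, prev: None, next: self.head });
        if let Some(h) = self.head {
            self.nodes[h].prev = Some(idx);
        }
        self.head = Some(idx);
        idx
    }

    /// Unlink a node in O(1) given its index (as stored in the hash map).
    fn unlink(&mut self, idx: usize) {
        let (prev, next) = (self.nodes[idx].prev, self.nodes[idx].next);
        match prev {
            Some(p) => self.nodes[p].next = next,
            None => self.head = next,
        }
        if let Some(n) = next {
            self.nodes[n].prev = prev;
        }
        self.nodes[idx].prev = None;
        self.nodes[idx].next = None;
    }
}
```

This index table is exactly what the stdlib LinkedList does not offer, which is why a custom list is used.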

@marmeladema
Owner

Use a HashMap with no pre-set capacity since we don't know the number of elements in advance. I think we can assume HashMap increases its size in a clever way and at some point it does not grow anymore anyways.

I don't think it's really related to the current work. Moreover, capacity still remains an upper bound on the number of elements.

Use a doubly-linked list of variable size that has a constant time remove (and get?) function (for moving to the front). Ideally we can use LinkedList at some point, where constant time remove is discussed right now. But it should be doable in the custom implementation somehow.

The current linked list implementation is already doubly-linked and supports constant time remove. I don't really see a value in using LinkedList from stdlib. It lacks some useful APIs and some are marked as deprecated.

Having said that, I think adding support for put_with_weight is not actually that much of a change.
I will PoC something later tonight with the following constraints:

  • no change in FixedSizeList; capacity stays an upper bound on the number of elements, that part won't change.
  • store weight alongside key and value in the linked list nodes, probably introducing a struct CLruCacheNode to hold those fields instead of relying on a tuple.
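The proposed node layout might look something like this (hypothetical: CLruCacheNode is the name suggested above, not an existing type in the crate):

```rust
// Hypothetical layout for the proposed node struct: the weight is stored
// alongside the key and value instead of relying on a (key, value) tuple,
// and is carried for the entry's whole lifetime, as noted earlier.
struct CLruCacheNode<K, V> {
    key: K,
    value: V,
    weight: usize,
}
```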

One thing that we would have to be careful of is the fact, as you noted in the PR, that put_with_weight won't be constant time for a weight greater than 1. It will have to pop_back() until the weight of the cache + the additional node is <= capacity. I think that's OK, as long as put itself remains O(1).
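A minimal model of that eviction loop, under the assumptions above (illustrative types only; the crate stores nodes in its fixed-size list, not a VecDeque):

```rust
use std::collections::VecDeque;

// Minimal model of the eviction loop described above: pop_back() evicts
// least-recently-used entries until the incoming weight fits under capacity.
struct WeightedLru {
    capacity: usize,                    // total weight budget
    total_weight: usize,                // sum of resident entry weights
    entries: VecDeque<(String, usize)>, // (key, weight); front = most recent
}

impl WeightedLru {
    fn put_with_weight(&mut self, key: String, weight: usize) {
        if weight == 0 || weight > self.capacity {
            return; // semantics for these cases are still under discussion
        }
        // Evict from the back until the new entry fits.
        while self.total_weight + weight > self.capacity {
            let (_, w) = self.entries.pop_back().expect("weights sum to total_weight");
            self.total_weight -= w;
        }
        self.entries.push_front((key, weight));
        self.total_weight += weight;
    }
}
```

Note the loop runs at most O(n) times for a large weight, which is exactly the non-constant cost discussed above.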

The last thing I am unsure about is the type of the weight argument:

  • Should it be a NonZeroUsize? But it might complicate the API too much?
  • Or a plain usize? But then what is the semantic of a put_with_weight(key, value, 0) call?
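The NonZeroUsize option would make a zero weight unrepresentable at the type level, so put_with_weight would never have to define semantics for it. A small sketch (the function signature is illustrative, not the crate's API):

```rust
use std::num::NonZeroUsize;

// Illustrative signature: with NonZeroUsize, weight.get() is guaranteed >= 1,
// so the zero-weight case never needs to be handled at runtime.
fn put_with_weight(_key: &str, _value: &str, weight: NonZeroUsize) -> usize {
    weight.get()
}
```

The cost is that callers must construct the weight via `NonZeroUsize::new(n)`, which returns `None` for 0, which is the API complication mentioned above.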

@webmaster128
Contributor Author

webmaster128 commented Nov 3, 2020

After thinking about this a little more, I also don't think the stdlib's LinkedList will add the index table that we have here and that is required. The only property of it we might want is variable length.

I think the core difference between my original proposal and your proposal is the limits. There are two options:

  1. One-dimensional: there is a single limit, and elements have variable weight. We don't know the number of elements at all, since the application can use extremely large weights for a very small number of elements.
  2. Two-dimensional: there is a limit on the number of elements and a limit on the weight. This allows us to make assumptions about the expected number of elements.
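The two options above could be sketched as configuration shapes (illustrative names only):

```rust
/// Option 1: a single weight budget; the element count is unbounded a priori,
/// so the backing storage cannot be pre-sized.
struct WeightLimited {
    max_total_weight: usize,
}

/// Option 2: both an element-count cap and a weight cap, which keeps an
/// upper bound on the number of elements and lets the backing storage
/// stay fixed-size.
struct DoublyLimited {
    max_elements: usize,
    max_total_weight: usize,
}
```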

I think both ways can work. The first one feels more elegant to me, but the second one probably makes some of the implementation more performant. Some of your open questions can probably only be answered after choosing one route or the other.

It will have to pop_back() until the weight of the cache + the additional node is <= capacity. I think it's OK, as long as put itself remains O(1).

Since you have the worst case scenario where the cache is filled with N elements of weight 1 and you need to remove all of them in order to put in an element of weight N, I don't think O(1) is achievable.

@marmeladema
Owner

Since you have the worst case scenario where the cache is filled with N elements of weight 1 and you need to remove all of them in order to put in an element of weight N, I don't think O(1) is achievable.

Since put will always use a weight of 1, you will never have to pop more than 1 element, just as it's done right now; so it should work, provided we don't allow a weight of 0.

@webmaster128
Contributor Author

Nice catch, yeah, put and put_with_weight have different worst-case characteristics then.

This makes weight > 0 an important property and I'd rather use NonZeroUsize than a panicking implementation or adding additional Result wrappers.

@marmeladema
Owner

or adding additional Result wrappers.

I kind of think we will have to do this nonetheless. There is the case of someone calling put_with_weight() with a weight bigger than the actual capacity of the cache. In that case, even clearing the cache won't make enough room.
It can already happen currently, but only when the capacity is 0, which is a very unlikely scenario; so in practice we ignore the put() call instead of returning an error.
But with put_with_weight it becomes very likely that this situation will arise and that people will need to know their insert failed. Ignoring the call altogether no longer seems like the right answer.
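An error-returning variant could look like the following sketch (the error type and signatures are hypothetical, only illustrating the Result-wrapper idea; real eviction bookkeeping is elided):

```rust
// Hypothetical sketch: put_with_weight returns an error when the entry can
// never fit, instead of silently ignoring the call.
#[derive(Debug, PartialEq)]
enum PutError {
    WeightExceedsCapacity,
}

struct Cache {
    capacity: usize, // total weight budget
    used: usize,     // weight currently resident
}

impl Cache {
    fn put_with_weight(&mut self, weight: usize) -> Result<(), PutError> {
        if weight > self.capacity {
            // Even an empty cache could not hold this entry: report it.
            return Err(PutError::WeightExceedsCapacity);
        }
        if self.used + weight > self.capacity {
            // Stand-in for popping LRU entries until the new entry fits.
            self.used = self.capacity - weight;
        }
        self.used += weight;
        Ok(())
    }
}
```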

@webmaster128
Contributor Author

Right.

I think it is reasonable to skip the put if it does not fit, just as in the zero-capacity case. It's like pouring water into a bucket: you can do it as much as you want, but the amount of water inside may not increase. Configuring the cache in a way that makes sense for the use case and produces cache hits needs to be done by the application anyway.

I don't have a strong opinion here. Both APIs seem solid to me.

@marmeladema
Owner

Done in #32
