
Plan: add more "caching" abstractions #1889

Open
fingolfin opened this issue Nov 10, 2017 · 5 comments
@fingolfin (Member) commented Nov 10, 2017

Motivation

We recently added lib/cache.gi and lib/cache.gd, currently with only a single function MemoizePosIntFunction in them.

Our plan is to add more functionality there for common caching needs. One motivation is that this will help us further reduce the number of files in hpcgap/lib/, as quite a few of them exist simply to add locking code to caches. If we replace the caching code in them with calls to generic caching solutions, we only have to provide HPC-GAP variants of those solutions, and can keep the code using the caches uniform between GAP and HPC-GAP.

Another motivation is that writing an efficient cache can be tricky; by providing some generally useful abstractions, we will encourage more people to use robust caching solutions in their code.

(I should mention that @sebasguts also worked on caches of a certain kind for the CAP project, and there is certainly some overlap between his work and what I describe here! So perhaps we can share code or collaborate in other ways, or at least benefit from his experiences).

In this issue, I want to make a list of types of caches I think we'll need, with different needs. To do so, I analyzed the diffs between lib/ and hpcgap/lib/, which I think is quite representative. However, since I limited myself to these files, it is likely that there are additional places that use caching logic in the library which are not covered below.

Weak caches vs. flushable caches

The first differentiation criterion for caches is what purpose they serve:

  1. Some caches are needed to ensure multiple calls to a function with equal resp. identical inputs will result in identical outputs. Typical use cases are functions which compute types or families.
  2. Other caches are meant to improve performance, by remembering the result of an expensive computation. GAP attributes and properties are examples of this, but so is e.g. the Bernoulli function.

In my experience, these two use cases seem to be fully disjoint. (QUESTION: Are they? Can anybody think of a counterexample? Or a third category of caches not covered by these two?)
For the former, the cache contents are often (always?) cheap to recompute, so it makes sense to keep the cached values in weak pointer lists, allowing GAP to garbage collect them when they are no longer needed. To sum up, the main feature of this type of cache is that it ensures uniqueness, and it may hold its values via weak pointers.

For the latter kind of cache, one does not want to let go of the computed result, so weak pointers are usually (always?) not an option. Instead, one may use InstallFlushableValue to give the user an escape hatch if they need to recover memory (albeit a rather coarse one: either flushing everything, or nothing). Note that it is not an option to flush the values of the first type of cache. To summarize, these caches are about performance: they trade memory for speed, and they are flushable.
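To make the weak variant concrete, here is a minimal sketch of a weak cache over positive-integer keys, built on GAP's weak pointer objects; TypeCache and GetOrMakeType are illustrative names, not a proposed API:

```gap
# Minimal weak cache sketch: values may be garbage collected, in which
# case ElmWPObj returns fail and we simply recompute.
TypeCache := WeakPointerObj([]);
GetOrMakeType := function(n, maker)
  local result;
  result := ElmWPObj(TypeCache, n);  # fail if unbound or collected
  if result = fail then
    result := maker(n);
    SetElmWPObj(TypeCache, n, result);
  fi;
  return result;
end;
```

(One caveat: this cannot cache the value fail itself, since fail also signals a collected entry.)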

I propose that we come up with unique names for these two kinds of caches, as they have quite different needs and characteristics. E.g. "weak cache" vs. "flushable cache", although that terminology doesn't make the distinction perfectly clear. For lack of better terminology, I'll use it below, at least for now.

Our sole existing cache implementation, MemoizePosIntFunction, tries to cater for both, by optionally providing flushing of the cache values. However, it has no option to store its contents in weak pointer lists. It also only supports small (!) positive integers as keys, which brings me to the next point.
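For reference, a minimal usage sketch of the existing function:

```gap
gap> SlowSquare := MemoizePosIntFunction(function(n)
>      Print("computing ", n, "\n");
>      return n^2;
>    end);;
gap> SlowSquare(5);
computing 5
25
gap> SlowSquare(5);  # second call is answered from the cache
25
```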

Cache implementations: unordered lists, sorted lists, maps, others...

Another important criterion to distinguish types of caches is what types of keys they support, and how lookup of these keys is performed. I'll list some types of keys, with examples of where they turn up in the GAP library, how we currently handle them, and how we might handle them differently.

small positive integers

This case is covered by MemoizePosIntFunction. The idea is that a map from positive integers to objects is a GAP list (an array in other languages) in disguise. Hence we implement the cache as a list (actually, an atomic list), which makes for super fast lookup. Drawback: the key storage is dense, so the size of that list is proportional to the largest input passed to the memoized function. This sharply limits the purposes this can be used for. It would be important to add a sparse variant of MemoizePosIntFunction, e.g. using (atomic) records, see below.
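The dense-list idea in miniature (an illustrative sketch only; the real MemoizePosIntFunction additionally handles flushing and, in HPC-GAP, uses an atomic list):

```gap
MemoizeDense := function(func)
  local cache;
  cache := [];   # the cache is just a list indexed by the input
  return function(n)
    if not IsBound(cache[n]) then
      cache[n] := func(n);
    fi;
    return cache[n];
  end;
end;
```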

Used in:

  • Bernoulli in combinat.gi: OK to use here, as to compute Bernoulli(n) we need Bernoulli(1) through Bernoulli(n-1), so we can't really do better (dense flushable cache)
  • TYPE_FFE and TYPE_FFE0 in ffe.g: maps small prime integers (<= 2^16) to types. OK since the input range is small. Of course it's also wasteful: the cache can have length up to 65521, even though there are only 6542 primes in that range. So dense storage is OK, but sparse might be better (speed vs. memory tradeoff). (potentially weak cache)
  • CYCLOTOMIC_FIELDS in fldabnum.gd: used to cache the n-th cyclotomic field. This is actually a bad fit, IMHO, but we have always cached these in a list, which is why it's OK to use MemoizePosIntFunction for now; it would be better if the entries were stored sparsely. (potentially weak cache)
  • ShiftedCatalan in mgmfree.gi: only for performance. Probably OK, as already ShiftedCatalan(10000) has 6014 digits. Still, might be better with a sparse key set (flushable cache)
  • CyclotomicPol in upoly.gi: caches the coefficients of n-th cyclotomic polynomial. Similar to CYCLOTOMIC_FIELDS, it probably would be better off with a sparse key set (flushable cache)

Probably usable in:

  • FAMS_FFE_LARGE in ffe.gi: keys are primes, this should be a sparse weak cache
  • fam!.ZCache[d] in ffeconway.gi: I think this could be a weak cache. Probably OK to keep dense because the field in question has size p^d, so d is quite limited.
  • fam!.ConwayPolCoeffs[d] (together with fam!.ConwayFldEltReducers[d]) in ffeconway.gi: this seems to be about optimizing, so flushable ?
  • CYCLICACHE in pcpgsperm.gi: sparse flushable cache
  • Z_MOD_NZ[n] in zmodnz.gi: sparse weak cache

immediate integers, strings: sparse key set

These cases could be covered by a function similar to MemoizePosIntFunction, but using GAP records or atomic records instead (these can be viewed as dictionaries whose keys are limited to immediate integers and strings).

All uses I know have positive integers as keys and are listed in the previous section.
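Still, for concreteness, here is a sketch of such a record-based sparse memoizer; MemoizeSparse is an illustrative name, not an existing library function, and an HPC-GAP variant would use an atomic record:

```gap
MemoizeSparse := function(func)
  local cache;
  cache := rec();
  return function(key)
    local name;
    name := String(key);  # integers and strings alike become component names
    if not IsBound(cache.(name)) then
      cache.(name) := func(key);
    fi;
    return cache.(name);
  end;
end;
```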

two keys, both small positive integers

E.g. a prime and a degree. This could be implemented by nesting two uses of MemoizePosIntFunction, but that would result in some weird-looking code, so direct support would be nicer; see the sketch below. We can of course also treat this as a special case of the next one (i.e. arbitrary keys).
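A sketch of what direct support might look like, using nested sparse records; TWO_KEY_CACHE and GetTwoKey are illustrative names:

```gap
TWO_KEY_CACHE := rec();
GetTwoKey := function(p, d, maker)
  local pn, dn;
  pn := String(p);
  dn := String(d);
  if not IsBound(TWO_KEY_CACHE.(pn)) then
    TWO_KEY_CACHE.(pn) := rec();
  fi;
  if not IsBound(TWO_KEY_CACHE.(pn).(dn)) then
    TWO_KEY_CACHE.(pn).(dn) := maker(p, d);
  fi;
  return TWO_KEY_CACHE.(pn).(dn);
end;
# The "weird-looking" nested alternative would be roughly
# MemoizePosIntFunction(p -> MemoizePosIntFunction(d -> maker(p, d))).
```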

  • GALOIS_FIELDS[p][d] in ffe.gi: currently a flushable cache, but I think it would make more sense to have it weak. The first key is a prime (-> store sparsely), the second a degree (both sparse and dense are fine); if pairs are used as keys, store sparsely

  • ABELIAN_NUMBER_FIELDS[2][n] in fldabnum.gi: currently flushable; indexed by pairs of positive integers (a,b), with the current code storing the first integer sparsely, the second densely.

  • APPROXROOTS[e][r+1] in polyrat.gi: should be flushable but currently isn't. Both keys are positive integers. Limited to e<=10, r+1<=101, i.e. at most 1010 values are cached -> could also implement this with a single MemoizePosIntFunction with input e + r*10

  • IRR_POLS_OVER_GF_CACHE[q][n] in upolyirr.gi: currently not flushable, but I think it should be. Roughly comparable to GALOIS_FIELDS?!

arbitrary objects as keys

E.g. we have cases where families or even tuples of families are the keys.
In that case, one may have to resort to a simple list of keys, and then search through it linearly.
However, since families are actually "unique" objects, we could probably use an ObjMap (or, until we get a thread-safe version, a record or atomic record with the MASTER_POINTER_NUMBER as key, or whatever else we have under "immediate integers").
Similarly, for tuples-of-families, one could define a hash function by mixing the master pointers of the families together, etc.
(But still, there might be cases we have to resort to a linear search).
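A sketch of the identity-keyed approach via MASTER_POINTER_NUMBER; FAMILY_CACHE and CachedForFamily are illustrative names, and this assumes the master pointer number is stable for the lifetime of the object:

```gap
FAMILY_CACHE := rec();
CachedForFamily := function(fam, maker)
  local id;
  id := String(MASTER_POINTER_NUMBER(fam));
  if not IsBound(FAMILY_CACHE.(id)) then
    FAMILY_CACHE.(id) := maker(fam);
  fi;
  return FAMILY_CACHE.(id);
end;
# Caveat: the record holds its values strongly; if keys can die and their
# master pointer numbers be reused, a weak variant is needed instead.
```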

  • QuaternionAlgebraData in algsc.gi: currently implemented as a flushable list with linear search; but I think that's wrong, and it should be a weak cache instead
  • DIRECT_PRODUCT_ELEMENT_FAMILIES in tuples.gi: a weak cache; maps tuples of families to a family object, stored as a single weak pointer list; the images "contain / reference" the keys, so there is no need to store them separately; this also avoids hassles with memory management
  • FamiliesOfGeneralMappingsAndRanges() in mapping.gi: a weak cache mapping families to families (see also PR Simplify caching logic in GeneralMappingsFamily #1888 which makes the code very similar to that in tuples.gi).

Miscellaneous

LRU caches

One cache feature not mentioned above is LRU caches. They are usually flushable, and their size is limited; they keep only the last N results. This is e.g. used to cache the single most recent result of GF(p), so that code which uses GF(p) in a tight loop is efficient: see GFCACHE in ffe.gi.
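In the simplest case N = 1, such a cache is just the most recent (key, value) pair; a sketch (MemoizeLast is an illustrative name):

```gap
MemoizeLast := function(func)
  local last;
  last := [ fail, fail ];   # [ key, value ] of the most recent call
  return function(key)
    if last[1] <> key then
      last[2] := func(key);  # compute first, so an error leaves the
      last[1] := key;        # cache in a consistent state
    fi;
    return last[2];
  end;
end;
```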

Criticism of the approach MemoizePosIntFunction takes

MemoizePosIntFunction works by taking a function (which maps positive integers to arbitrary results) and wrapping it in a closure. That closure also contains the actual cache. This is pretty elegant (I like it!), but has one practical drawback: it is virtually impossible to inspect the state of the cache, e.g. to find out whether a given entry has already been computed, or how much memory the cache takes up.

I am not sure whether these are serious concerns; so far, I have not felt a strong need to do either at any of the places we use it. But I could imagine that e.g. during debugging this could become a problem. Anyway, I merely mention it here to draw attention to it, and to invite further comments.

Extending MemoizePosIntFunction

Of course we could (and probably should) extend MemoizePosIntFunction to optionally support weak caches, and also to optionally support sparse key sets (as opposed to the current situation, where the key set is always dense).

It would also be useful to add a limit option to MemoizePosIntFunction, so that it rejects inputs above that limit.
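Such a limit could also be prototyped as a wrapper around the existing function (a native option could instead raise an error; MemoizeWithLimit is an illustrative name):

```gap
MemoizeWithLimit := function(func, limit)
  local cached;
  cached := MemoizePosIntFunction(func);
  return function(n)
    if n > limit then
      return func(n);   # above the limit: compute without caching
    fi;
    return cached(n);
  end;
end;
```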

@fingolfin added the kind: enhancement, topic: HPC-GAP unification, and request for comments labels on Nov 10, 2017
@ChrisJefferson (Contributor) commented Nov 10, 2017

One big issue is locking.

One reason for MemoizePosIntFunction's current design is that it can be implemented without locks.

It could be extended to support sparse structures and strings with records.

Beyond that, you need a lock, and then you have to decide where that lock should go. You want to ensure that anything calling the cache won't take a lower lock, and that any function the cache calls won't take a higher lock. It wasn't obvious to me where to put that lock.
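To make the hazard concrete, an illustrative HPC-GAP sketch (assuming ShareObj and the atomic statement; all names made up): the lookup holds the cache lock while calling func, so any lock func takes is forced below the cache lock in the ordering.

```gap
CACHE := ShareObj(rec());
LockedLookup := function(name, func)
  local result;
  atomic readwrite CACHE do
    if not IsBound(CACHE.(name)) then
      CACHE.(name) := func(name);  # runs while the cache lock is held!
    fi;
    result := CACHE.(name);
  od;
  return result;
end;
```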

@ChrisJefferson (Contributor) commented:

Another consideration -- as these caches are designed for use deep in the library, we can't make them operations or methods and use nice dispatch to choose a "nice" function.

@fingolfin (Member, Author) commented Nov 10, 2017

Yes, there are multiple things I did not (yet) discuss, including: implementation strategies (including the issues you mention); interface considerations; key-dependent attributes (another special kind of cache); and more.

As to our cache code not being able to use (or implement) operations and methods: that's perfectly fine by me; in fact I kind of prefer to implement variations via either option records, or multiple functions with clearly distinct names.

@fingolfin (Member, Author) commented:

Regarding locking: of course, where possible we'll try to use atomic data structures. But since these are low-level caches for use in the library (or by package authors who know what they are doing), I think locks are not completely out of the question; we "just" have to be very careful about how we use them. But of course I agree that this is tricky.

So, I'd start by focusing on things we can do with atomic datastructures. Some thoughts on that:

  • it would be nice to have an "atomic record" variant of MemoizePosIntFunction; it could also be a new MemoizeSmallIntOrStringFunction, whatever. With that, we can already do quite a bit.
  • it would be useful to have an "atomic weak pointer object". I wonder how hard that would be -- hard for me to judge, since I have no idea at all how weak references work with the Boehm GC.
  • we should look into using BindOnceExpr -- right now it only works for atomic and non-atomic positional objects; it would be highly interesting to extend it to plists (should be trivial), atomic lists, and atomic records & atomic positional objects. Indeed, if that worked, I'd consider backporting it to GAP; it seems more useful than GetWithDefault to me.
  • I wonder whether an atomic OBJ_MAP would be feasible / reasonable...
  • barring that, I'd investigate OBJ_MAP plus locks for a few of the above use cases...
  • one more idea: for those examples where the keys are "small" primes, we could use a "perfect hash": extend Primes to contain all primes <= 2^16 (which would perhaps help other computations, too). Then, use that to map those primes to a dense key range; see the sketch below. Not sure if that's useful (performance- and effort-wise) compared to e.g. an atomic record; just a thought
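The sketch for the last point; SMALL_PRIMES and PrimeIndex are illustrative names:

```gap
SMALL_PRIMES := Filtered([2..2^16], IsPrimeInt);   # the 6542 primes <= 2^16
PrimeIndex := p -> PositionSorted(SMALL_PRIMES, p);
# PrimeIndex(2) = 1, PrimeIndex(65521) = 6542; the result can then serve
# as a dense key for MemoizePosIntFunction.
```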

@sebasguts (Member) commented:

Sorry for being late to the show, I will just give a quick overview of what we do in ToolsForHomalg:

  • We have caches using arbitrary objects as keys, since those are all we need in CAP. The number of keys is fixed for each cache.
  • We also offer a convenience for wrapping functions with caches; in ToolsForHomalg this is called InstallMethodWithCache. It is possible to provide the cache object itself as an option to the method, so it is also possible to store it somewhere else and inspect it.
  • Our caches can be weak (for compatibility) or strong (sometimes for compatibility, sometimes for performance). Since in CAP the results of different methods can be entangled, sometimes even strong caching is necessary for the compatibility of the results.

I would be happy to work on this and export more of ToolsForHomalg's content to the GAP library itself.
