Design a more general solution to memory management #378
antocuni added the labels enhancement (New feature or request) and heavydb (Related to heavydb server) on Nov 11, 2021
antocuni added a commit that referenced this issue on Dec 20, 2021 (#380):

- add `rbc.stdlib.array_api`, which rbc users can import to use the array-like API
- the array API is no longer specific to omnisci; now it works also in normal `@rjit` functions, which makes it much easier to develop and test
- introduce rbclib. This is the embryo of a "runtime library" to be called by the code generated by rbc. In particular, it provides `rbclib_allocate_varlen_buffer`, which can be used to allocate arrays for non-omnisci targets
- store the names of the allocation/deallocation array functions in the `TargetInfo`. This makes it possible to call the appropriate `allocate_varlen_buffer` function depending on whether we are targeting omnisci or not
- introduce the "debug allocator", which keeps track of allocations/deallocations and detects memory leaks (see e.g. issue #378)
The current memory management design of rbc is prone to memory leaks, and we should design a better way to deal with allocation of custom objects.

Unless I am mistaken, the current rbc type support is as follows: `omnisci_backend.Buffer` and its subclasses, such as `Array`. In the following I will refer only to `Array`, but obviously everything can be applied also to other current and future types which require memory allocation.

Consider the following code:
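The code snippet did not survive in this copy of the issue. Based on the discussion below (three arrays `a`, `b`, `c`, of which only `c` escapes), it presumably resembled the following sketch; `Array` here is a pure-Python stand-in for illustration, not the real rbc API:

```python
# Pure-Python stand-in for the rbc Array type, only for illustration.
def Array(n, dtype):
    return [0.0] * n

def fn(n):
    a = Array(n, 'double')   # must eventually be freed
    b = Array(n, 'double')   # must eventually be freed
    c = Array(n, 'double')   # escapes: it is returned to the caller
    for i in range(n):
        c[i] = a[i] + b[i]
    return c                 # conceptually, free(a) and free(b) happen here
```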
The current strategy works as follows:

- every time an `Array` is allocated, the compiler records the LLVM `Value` where the result of the allocation is stored. Conceptually, it's equivalent to remembering that the variables `a`, `b`, and `c` must eventually be freed
- `c` is returned to the caller and must survive
- `free(a)` and `free(b)` are emitted before returning

(Note: this is the theory. At the moment of writing, it is buggy and doesn't work; see #377.)
This strategy works only for very simple cases, but it is fundamentally problematic and cannot be generalized to more complex ones. Solving this in general is basically equivalent to the halting problem, so we need a better way.

Normally, virtual machines solve the problem by using a runtime garbage collector and/or reference counting, but I don't think this is reasonable for our use case. Let me write down some ideas/proposals for how to solve the issue.
Proposal 0: improve the current algorithm
I call it proposal 0 because I don't think it's really an option, but I add it for completeness. We could try to improve our current algorithm to catch the most common use cases. Maybe it is even possible to detect unsupported use cases and emit a compile error in those cases.
Pros:
Cons:
Proposal 1: explicit memory management
We pretend that the language used to write user functions is Python, but actually it's a statically typed subset of it which is not far from C. So, one option is to stop pretending and let the users explicitly `free()` their memory. Something like this:

If we want to be advanced, we can even provide some syntax niceties (although I don't know how easy it is to add this functionality to numba):
Pros:
Cons:
- as soon as you use the `@omnisci` decorator, you suddenly need to manage your memory, which is unexpected

Proposal 2: explicit ownership rules (a.k.a. "The Rust Way")
We could augment our typesystem with ownership rules; Rust has similar concepts.

Basically, the idea is that each `Array` can have one and only one owner, which is responsible for deallocating it. When you pass the value around, you pass a borrowed reference, unless you explicitly give away the ownership. When you return the value, you are also returning the ownership. The typesystem can also statically check that you cannot free the object as long as there are borrowed references around.

This is basically proposal 0 "done right", because it no longer depends on some heuristic but on typesystem rules which can be proven correct. To fully work, we need some syntax to transfer the ownership and possibly a way to free the object early (which basically means to transfer the ownership to nobody).
Also, some innocent-looking code will not work, e.g.:
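The rejected snippet is not preserved here; a hypothetical example of the kind of code meant (with `Array` again as a pure-Python stand-in) could be:

```python
# Hypothetical code that an ownership-checking typesystem would reject.
def Array(n, dtype):
    # pure-Python stand-in for the rbc Array type
    return [0.0] * n

def fn(a, flag):
    if flag:
        b = a                    # b is a borrowed reference: fn does not own it
    else:
        b = Array(10, 'double')  # b owns a fresh allocation: fn must free it
    # here b has two different ownership rules depending on the branch taken,
    # so the typesystem cannot statically decide who is responsible to free it
    return b[0]
```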
The code above would be rejected by the typesystem because the type of `b` has two different ownership rules in the two branches of the `if`.

Pros:
Cons:
Proposal 3.1: free-it-all strategy (a.k.a. "poor man's GC")
The idea behind this proposal is that we are not interested in a general memory allocation strategy. What we really want is to be able to call UDFs/UDTFs without leaking memory, and we can do that with very well-defined lifetime rules:

- user functions can allocate as many `Array`s as they like
- the only way for an `Array` to escape to the "external world" is to be `return`ed to the caller
- all other `Array`s are automatically destroyed when the function exits

This means, for example, that user functions are not allowed to persist `Array`s between calls, e.g. by putting them in some global state. But I don't think that this is possible at all at the moment.

One possible way to implement this is to associate all allocations with an explicit "memory manager" which you get from the outside:
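The example itself was lost in this copy; a minimal runnable sketch of the idea follows. All names here (`MemoryManager`, `zeros`, `my_udf`) are assumptions, and the real `zeros` would of course also take a dtype:

```python
# Sketch of the free-it-all strategy: all allocations go through an
# explicit MemoryManager which the function receives from the outside.
class MemoryManager:
    """Owns every allocation made during one UDF/UDTF call."""
    def __init__(self):
        self.allocations = []

    def allocate(self, n):
        buf = [0.0] * n
        self.allocations.append(buf)   # remember the buffer so we can free it
        return buf

    def free_all(self):
        self.allocations.clear()       # everything dies when the caller says so

def zeros(mgr, n):
    # allocating requires a mgr: no mgr, no memory
    return mgr.allocate(n)

def my_udf(mgr, x):
    tmp = zeros(mgr, len(x))           # the mgr is passed around explicitly
    for i, v in enumerate(x):
        tmp[i] = v * 2.0
    return tmp
```

Usage would be `mgr = MemoryManager(); my_udf(mgr, data); mgr.free_all()`: the caller alone decides when everything is released.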
In the example above, all the memory associated with `mgr` will be freed when the caller decides so, and it is impossible to allocate memory if you don't have one (because allocation functions expect it). If you need to call a function which internally allocates, you must pass it around explicitly, such as in the call to `omnisci_np.zeros()` above.

The idea is that the manager is created by omniscidb at the point in which it calls the UDF/UDTF. Something like this (in some sort of pseudocode):

@guilhermeleobas suggests that we can reuse the TableManager for that, since there is already a way to pass it to the UDTF.
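The pseudocode itself did not survive in this copy; the caller-side logic described above might look roughly like this sketch (all names are assumptions):

```python
# Pseudocode sketch of what the engine (omniscidb) would do around a call.
class MemoryManager:
    # minimal stand-in, same idea as in Proposal 3.1
    def __init__(self):
        self.allocations = []
    def allocate(self, n):
        buf = [0.0] * n
        self.allocations.append(buf)
        return buf
    def free_all(self):
        self.allocations.clear()

def call_udf(udf, *args):
    mgr = MemoryManager()        # created by the engine at the call site
    try:
        result = udf(mgr, *args)
        return list(result)      # copy the result out before freeing
    finally:
        mgr.free_all()           # free-it-all: every allocation dies here
```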
Pros:

- it's explicit: since you pass `mgr` around everywhere, the lifetime rules are easy to see and to explain
- it can be fast: the `MemoryManager` could decide to allocate a fixed chunk of RAM and keep a pointer to indicate how much of it has been used; `malloc()` will be implemented by just incrementing the pointer (and allocating a new chunk of RAM if we have reached the end of the current one), which is much faster than the stdlib `malloc()`. (NOTE: this is more or less how the PyPy GC allocates memory, and it's one of the reasons it's so fast)

Cons:

- it's verbose: you need to pass `mgr` everywhere
- you need to pass `mgr` to those functions which can possibly allocate memory (such as `zeros` in the example above)
in the example above)Proposal 3.2: implicit free-it-all strategy
This is the same as 3.1, but we hack at the code generated by numba to implicitly pass `mgr` around. So, for example, you call `zeros(10, types.float64)` and the call automatically becomes `zeros(mgr, 10, types.float64)`.
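As an illustration (all names here are assumptions, and the real version would hook into numba's code generation rather than use a wrapper), the rewrite could conceptually look like this:

```python
# Illustration of the implicit rewrite in Proposal 3.2.
class MemoryManager:
    def __init__(self):
        self.allocations = []
    def allocate(self, n):
        buf = [0.0] * n
        self.allocations.append(buf)
        return buf

def zeros(mgr, n):
    return mgr.allocate(n)

# What the user writes:          tmp = zeros(10)
# What the compiler generates:   tmp = zeros(mgr, 10)
def udf_as_generated(mgr, n):
    tmp = zeros(mgr, n)   # the hidden mgr was injected by the compiler
    return tmp
```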
Pros:
Cons:

- it's less explicit: if you have an explicit `mgr`, it is trivial to explain that the memory belongs to it; if it's a hidden implementation detail, suddenly the lifetime rules become magical and/or obscure

Proposal 3.3: global mgr
Same as Proposal 3.2, but instead of passing it around, we rely on a global (or thread-local) mgr to be reachable. I think this quickly leads to problems if we have concurrent and/or parallel and/or recursive calls, so it should just be avoided.
Conclusions
Personally, I think that the best option is Proposal 3.1, the "free-it-all strategy". The cons don't look too bad to me, and the pros are very important, especially the fact that it's easy to implement, which means fewer bugs.