-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Relay][RFC] Garbage Collection #3423
Comments
Thanks for bringing up this topic. We might need to consider the scenarios we want to use relay and design under those constraints. For now the main use cases is inference(maybe training in the future). I guess most of the objects allocated during inference will be Agree sharing GC. For AOT, should we learn how rust get rid of GC as a further optimization? |
Personally, I am not in favor of introducing GC to the system. Reference counting was fine and is great for handling external resources(GPU memories etc). Exactly for the same reason, languages like java/scala had a pretty bad time working with GPU memories(memories not being de-allocated precisely at the point where data structure goes out of scope). Ref counting has its limitation, but we can be mindful to avoid cycles and it works great for most of the cases we are looking at, without introducing additional problems of GC |
cc @hlu1 re: the leak from e.g. recursive functions taking a reference to itself in the environment. FWIW, can we solve these via adding the concept of weak references to the node system? It seems like in these cases that closures could use weak references to other closures, and have the higher-level env maintain the strong reference to the closure objects themselves? |
@ajtulloch indeed. However, it will always be possible that ppl might uncarefully create strong reference loop, and be unable to collect. |
@MarisaKirisame, #3448 was the leak @ajtulloch was referring to. |
I talked with Zach DeVito from PyTorch team for a while about RefCounting, there are quite a few benefits to using reference counting. We should probably just use weak refs, solutions from Counting Immutable Beans (a recent paper by my MSR collaborator where they do much better than GC languages in perf) or cycle detectors if we are worried about cycles. I think generally RC provides the best tradeoffs for ML right now. |
Seems we agreed that weak reference was better than gc. close this RFC thread for now. Thanks everyone who participated in the discussion |
right now relay aot/interpreter/vm all use reference counting to collect memory. However, it is possible to create cyclic data structure in relay, as demonstrated below:
data Loop = Ref (Optional Loop)
x : Loop = Ref None
match x with
| r -> r := x
end
in short, we can has a data type holding a mutable (nullable) reference, initialize it to null, then point it to itself.
This example is purely contrived, but it is completely possible in real, meaningful relay code: imagine a doubly linked list, or two closure referencing each other. They all form cyclic link data which will never be collected by reference counting.
there are three problem we should discuss:
0: what algorithm should we use? mark and sweep/generational, etc?
1: should we still use reference counting?
2: how can we implement the runtime only once, for aot/interpreter/vm, instead of each of them having to implement the runtime itself?
maybe we can look at https://github.com/hsutter/gcpp?
@jroesch @tqchen @icemelon9 @wweic @junrushao1994 @nhynes any suggestion?
The text was updated successfully, but these errors were encountered: