
[Relay][RFC] Garbage Collection #3423

Closed
MarisaKirisame opened this issue Jun 23, 2019 · 7 comments

@MarisaKirisame
Contributor

Right now the Relay AOT compiler, interpreter, and VM all use reference counting to collect memory. However, it is possible to create a cyclic data structure in Relay, as demonstrated below:

```
data Loop = MkLoop (Ref (Optional Loop))
x : Loop = MkLoop (Ref None)
match x with
| MkLoop r -> r := Some x
end
```

In short, we can have a data type holding a mutable (nullable) reference, initialize it to null, and then point it at itself.

This example is contrived, but the same situation is entirely possible in real, meaningful Relay code: imagine a doubly linked list, or two closures referencing each other. Both form cyclic data that will never be collected by reference counting.
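The same leak can be reproduced with any reference-counted pointer. Below is a minimal C++ sketch (using std::shared_ptr rather than Relay's actual runtime objects) showing why the count never reaches zero:

```cpp
// A minimal C++ analogue of the cycle above, not Relay code: with plain
// reference counting, an object whose mutable field points back at itself
// keeps its own count above zero forever.
#include <memory>

struct Loop {
  std::shared_ptr<Loop> next;  // plays the role of the mutable Optional Loop field
};

int main() {
  auto x = std::make_shared<Loop>();  // use_count == 1
  x->next = x;                        // use_count == 2: the object now owns itself
  // When x goes out of scope the count drops back to 1, never 0, so the
  // Loop's destructor (and, in a real runtime, any device memory it owns)
  // never runs.
  return 0;
}
```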

There are three problems we should discuss:

0: What algorithm should we use? Mark-and-sweep, generational, etc.?

1: Should we still use reference counting?

2: How can we implement the runtime only once, shared by the AOT compiler, interpreter, and VM, instead of each of them having to implement it separately?

Maybe we can look at https://github.com/hsutter/gcpp?

@jroesch @tqchen @icemelon9 @wweic @junrushao1994 @nhynes any suggestions?

@wweic
Contributor

wweic commented Jun 24, 2019

Thanks for bringing up this topic. We might need to consider the scenarios in which we want to use Relay and design under those constraints. For now the main use case is inference (maybe training in the future). I guess most of the objects allocated during inference will be TensorObject, plus a few DatatypeObject; we can run some benchmarks on major models to confirm. TensorObject shouldn't cause cyclic references, and if it is the majority of the objects, we can handle it with the existing reference counting. For the remaining DatatypeObject, a mark-sweep GC can be called infrequently. But benchmark first. :-)

Agreed on sharing the GC. For AOT, should we learn from how Rust gets rid of GC, as a further optimization?
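To make the hybrid scheme above concrete, here is a rough sketch (not TVM code; HeapObject and CycleHeap are made-up names): only potentially cyclic objects are registered with a collector that runs an occasional mark-and-sweep pass, while tensor objects would stay on the existing reference-counting path and are not shown.

```cpp
// A sketch of the split policy: tensors stay purely reference counted (not
// shown), while potentially cyclic objects are tracked by a collector that
// runs an infrequent mark-and-sweep pass.
#include <unordered_set>
#include <vector>

struct HeapObject {
  bool marked = false;
  std::vector<HeapObject*> fields;  // outgoing references (e.g. ADT fields)
  virtual ~HeapObject() = default;
};

class CycleHeap {
 public:
  // Only potentially cyclic objects (ADT values, closures) are registered here.
  HeapObject* Allocate() {
    HeapObject* obj = new HeapObject();
    objects_.insert(obj);
    return obj;
  }
  void AddRoot(HeapObject* obj) { roots_.insert(obj); }
  void RemoveRoot(HeapObject* obj) { roots_.erase(obj); }

  // Called infrequently, e.g. every N allocations or between inference runs.
  void Collect() {
    for (HeapObject* root : roots_) Mark(root);
    for (auto it = objects_.begin(); it != objects_.end();) {
      if (!(*it)->marked) {
        delete *it;
        it = objects_.erase(it);
      } else {
        (*it)->marked = false;  // reset for the next collection
        ++it;
      }
    }
  }

 private:
  void Mark(HeapObject* obj) {
    if (obj->marked) return;
    obj->marked = true;
    for (HeapObject* field : obj->fields) Mark(field);
  }

  std::unordered_set<HeapObject*> objects_;
  std::unordered_set<HeapObject*> roots_;
};

int main() {
  CycleHeap heap;
  HeapObject* a = heap.Allocate();
  HeapObject* b = heap.Allocate();
  a->fields.push_back(b);
  b->fields.push_back(a);  // a <-> b cycle that pure refcounting would leak
  heap.Collect();          // neither is rooted, so both are reclaimed here
  return 0;
}
```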

@tqchen
Member

tqchen commented Jun 24, 2019

Personally, I am not in favor of introducing a GC to the system. Reference counting is fine and is great for handling external resources (GPU memory, etc.). For exactly the same reason, languages like Java/Scala have a pretty bad time working with GPU memory (memory is not deallocated precisely at the point where the data structure goes out of scope).

Ref counting has its limitations, but we can be mindful to avoid cycles, and it works well for most of the cases we are looking at without introducing the additional problems of a GC.
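To illustrate the point about precise deallocation, here is a hedged sketch: with reference counting, a device buffer is released at the exact moment its last owner goes out of scope, with no collector pause in between. DeviceAlloc and DeviceFree are hypothetical stand-ins for real driver calls, not TVM APIs.

```cpp
// Deterministic release under reference counting: the custom deleter runs
// exactly when the last shared_ptr owner disappears.
#include <cstddef>
#include <cstdio>
#include <memory>

void* DeviceAlloc(std::size_t nbytes) {
  std::printf("allocated %zu bytes\n", nbytes);
  return ::operator new(nbytes);  // pretend this is device memory
}

void DeviceFree(void* ptr) {
  std::printf("freed\n");
  ::operator delete(ptr);
}

// A shared, reference-counted handle to the buffer with a custom deleter.
std::shared_ptr<void> MakeDeviceBuffer(std::size_t nbytes) {
  return std::shared_ptr<void>(DeviceAlloc(nbytes), DeviceFree);
}

int main() {
  {
    auto buf = MakeDeviceBuffer(1 << 20);
    // ... run a kernel on buf ...
  }  // "freed" prints here, exactly at end of scope, with no collector involved
  std::printf("after scope\n");
  return 0;
}
```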

@ajtulloch
Contributor

cc @hlu1 re: the leak from, e.g., recursive functions taking a reference to themselves in their environment.

FWIW, could we solve these by adding the concept of weak references to the node system? It seems like in these cases closures could use weak references to other closures, with the higher-level environment maintaining the strong references to the closure objects themselves.
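A rough sketch of that idea using std::weak_ptr (the node system's own weak-reference support would of course look different): the mutually referencing closures hold only weak edges to each other, while the environment keeps the single strong reference to each, so no strong cycle forms and plain reference counting reclaims everything.

```cpp
// Closure and Environment are illustrative types, not TVM's node system.
#include <memory>
#include <vector>

struct Closure {
  std::weak_ptr<Closure> other;  // weak edge to the mutually recursive closure

  void Call() {
    // The weak edge is upgraded only while it is actually needed.
    if (auto strong = other.lock()) {
      (void)strong;  // ... invoke the other closure here ...
    }
  }
};

struct Environment {
  std::vector<std::shared_ptr<Closure>> closures;  // the only strong owners
};

int main() {
  Environment env;
  auto even = std::make_shared<Closure>();
  auto odd = std::make_shared<Closure>();
  even->other = odd;
  odd->other = even;
  env.closures = {even, odd};
  // When env and the local handles go out of scope, both closures are freed:
  // the weak edges do not keep the cycle alive.
  return 0;
}
```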

@MarisaKirisame
Contributor Author

@ajtulloch Indeed. However, it will always be possible for people to carelessly create a strong reference loop that can never be collected.

@hlu1
Contributor

hlu1 commented Jun 27, 2019

@MarisaKirisame, #3448 was the leak @ajtulloch was referring to.

@jroesch
Member

jroesch commented Jul 1, 2019

I talked with Zach DeVito from the PyTorch team for a while about refcounting; there are quite a few benefits to using reference counting. If we are worried about cycles, we should probably just use weak refs, solutions from Counting Immutable Beans (a recent paper by my MSR collaborator, where they do much better than GC'd languages in performance), or cycle detectors. I think RC generally provides the best tradeoffs for ML right now.

tqchen closed this as completed Jul 6, 2019
@tqchen
Member

tqchen commented Jul 6, 2019

It seems we agreed that weak references are better than a GC. Closing this RFC thread for now. Thanks to everyone who participated in the discussion.
