Improve cycle detector handling of short-lived actors
We have a few example programs that create a large number of short-lived
actors and can exhibit runaway memory growth. The changes in this commit
greatly reduce that growth and, in some cases, make memory usage stable.

The primary change here is to use some static analysis at compilation time
to determine whether an actor is "an orphan". That is, upon creation, no
actor (including its creator) holds a reference to it.

Prior to this change, the actors in examples such as the one from issue #1007
(listed later) could not be collected because of an edge case in the cycle
detector and the runtime garbage collection algorithms.

Issue #1007 was opened with the following code having explosive memory growth:

```pony
primitive N fun apply(): U64 => 2_000_000_000

actor Test
  new create(n: U64) =>
    if n == 0 then return end
    Test(n - 1)

actor Main
  new create(env: Env) =>
    Test(N())
```

There are three additional examples, listed later, that were partially addressed
by #3647. #1007 wasn't addressed by #3647 because of how that fix works. The key
to it, which Dipin and I came up with, is that when an actor is done running its
behaviors, it can check whether it has no additional messages AND no references
to itself. If both hold, it can tell the cycle detector to skip parts of the CD
protocol so that it is garbage collected sooner.

That change allows the cycle detector to keep up in many cases where large
numbers of "orphaned" actors are generated.
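
As a rough illustration of that check (a toy model, not the runtime's actual code), the condition can be written as a single predicate. The names `toy_actor_t` and `toy_can_reap_early` are made up for this sketch and do not exist in libponyrt:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Toy stand-in for a runtime actor; only the two facts the check needs.
typedef struct toy_actor_t
{
  size_t queued_messages; // messages still waiting in the actor's queue
  uint32_t rc;            // how many references other actors hold to it
} toy_actor_t;

// True when the actor can never run again: nothing is queued, and nobody
// holds a reference through which a new message could be sent.
static bool toy_can_reap_early(const toy_actor_t* actor)
{
  assert(actor != NULL);
  return (actor->queued_messages == 0) && (actor->rc == 0);
}
```

An actor with an empty queue but a non-zero rc fails the check, which is exactly the situation #1007 runs into.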

The example above wasn't addressed by #3647 because of an implementation detail
in the ORCA garbage collection protocol: at the time that an instance of Test is
done running its create behavior, it doesn't have a reference count of 0. It has
a reference count of 256. Not because there are 256 references but because, when
an actor is created, the runtime puts a "fake value" in the rc field so that the
actor isn't gc'd prematurely. The value is set to the correct value once the actor
that created it is first traced, and is subsequently updated correctly per ORCA
going forward.

However, at the time that an instance of Test is finished running its create, that
information isn't available. It would be incorrect to say "if rc is 256, I'm blocked
and you can gc me". 256 is a perfectly reasonable value for an rc to have in normal usage.

This isn't a problem with the changes in this PR, as the compiler detects that each
instance of Test will be an orphan and immediately sets its rc to 0. This allows it
to be garbage collected based on the changes in #3647 as soon as the instance's
message queue is empty.
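
The creation path after this change can be modelled roughly as follows. This is a simplified sketch that only mirrors the branching in `pony_create` from the diff in this commit; `toy_actor_t`, `toy_create`, `toy_reapable`, and the boolean parameters are stand-ins invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RC_PLACEHOLDER 256u

typedef struct toy_actor_t
{
  uint32_t rc;
  bool orphan;          // set when the compiler proved nobody holds a reference
  bool announced_to_cd; // whether the cycle detector was told about the actor
} toy_actor_t;

// has_creator: an actor is running the constructor call.
// orphaned:    the compiler saw that the constructor's result is discarded.
// cd_enabled:  the cycle detector is running.
static toy_actor_t toy_create(bool has_creator, bool orphaned, bool cd_enabled)
{
  toy_actor_t actor = { 0, false, false };

  if(has_creator && !orphaned)
  {
    // Referenced by its creator: start from the placeholder rc and let the
    // cycle detector know the actor exists.
    actor.rc = TOY_RC_PLACEHOLDER;
    actor.announced_to_cd = cd_enabled;
  } else {
    // Nobody references the actor, so its rc is 0 from the start.
    actor.rc = 0;
    actor.orphan = orphaned;
  }

  return actor;
}

// An actor with rc 0 can be reaped as soon as its queue is empty.
static bool toy_reapable(const toy_actor_t* actor, size_t queued_messages)
{
  assert(actor != NULL);
  return (queued_messages == 0) && (actor->rc == 0);
}
```

In this model an orphan never shows up at the cycle detector at all and becomes reapable the moment its queue drains, while a referenced actor keeps the placeholder rc until it is traced.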

Any changes in the future to address lingering issues with creating large numbers
of orphaned actors should also be tested with the following three examples.

Example 2 features reasonably stable memory usage that, from time to time, I have
seen increase rapidly. Such an increase is rather infrequent, but it suggests there
are additional problems in the cycle detector. I suspect the problem is a periodic
burst of additional messages to the cycle detector from actors that can be garbage
collected, but I haven't investigated further.

```pony
actor Main
  new create(e: Env) =>
    Spawner.run()

actor Spawner
  var _living_children: U64 = 0

  new create() =>
    None

  be run() =>
    _living_children = _living_children + 1
    Spawnee(this).run()

  be collect() =>
    _living_children = _living_children - 1
    run()

actor Spawnee
  let _parent: Spawner

  new create(parent: Spawner) =>
    _parent = parent

  be run() =>
    _parent.collect()
```

Example 3 has stable memory usage, given that it won't result in any messages
being sent to the cycle detector: we have determined at compile time that the
Foo actor instances are orphaned.

```pony
actor Main
  var n: U64 = 2_000_000_000

  new create(e: Env) =>
    run()

  be run() =>
    while(n >= 0 ) do
      Foo(n)
      n = n - 1
      if ((n % 1_000) == 0) then
        run()
        break
      end
    end

actor Foo
  new create(n: U64) =>
    if ((n % 1_000_000) == 0) then
      @printf[I32]("%ld\n".cstring(), n)
    end

    None
```

Example 4 has the same characteristics as example 3 with the code as of this commit.
However, it did exhibit different behavior prior to this commit being fully completed
and appears to be a good test candidate for any future changes.

```pony
actor Main
  var n: U64 = 2_000_000_000

  new create(e: Env) =>
    run()

  be run() =>
    while(n >= 0 ) do
      Foo(n)
      n = n - 1
      if ((n % 1_000_000) == 0) then
        @printf[I32]("%ld\n".cstring(), n)
      end
      if ((n % 1_000) == 0) then
        run()
        break
      end
    end

actor Foo
  new create(n: U64) =>
    None
```

Closes #1007
SeanTAllen committed Sep 18, 2020
1 parent f920092 commit d17cac2
Showing 10 changed files with 57 additions and 30 deletions.
5 changes: 3 additions & 2 deletions src/libponyc/codegen/codegen.c
@@ -290,10 +290,11 @@ static void init_runtime(compile_t* c)
 LLVMAddAttributeAtIndex(value, LLVMAttributeFunctionIndex, nounwind_attr);
 LLVMAddAttributeAtIndex(value, LLVMAttributeFunctionIndex, readnone_attr);

-// __object* pony_create(i8*, __Desc*)
+// __object* pony_create(i8*, __Desc*, i1)
 params[0] = c->void_ptr;
 params[1] = c->descriptor_ptr;
-type = LLVMFunctionType(c->object_ptr, params, 2, false);
+params[2] = c->i1;
+type = LLVMFunctionType(c->object_ptr, params, 3, false);
 value = LLVMAddFunction(c->module, "pony_create", type);

 LLVMAddAttributeAtIndex(value, LLVMAttributeFunctionIndex, nounwind_attr);
20 changes: 14 additions & 6 deletions src/libponyc/codegen/gencall.c
@@ -11,6 +11,7 @@
 #include "../type/cap.h"
 #include "../type/subtype.h"
 #include "../ast/stringtab.h"
+#include "../pass/expr.h"
 #include "../../libponyrt/mem/pool.h"
 #include "../../libponyrt/mem/heap.h"
 #include "ponyassert.h"
@@ -422,7 +423,7 @@ static LLVMValueRef gen_constructor_receiver(compile_t* c, reach_type_t* t,
 set_descriptor(c, t, receiver);
 return receiver;
 } else {
-return gencall_alloc(c, t);
+return gencall_alloc(c, t, call);
 }
 }

@@ -1310,19 +1311,26 @@ LLVMValueRef gencall_runtime(compile_t* c, const char *name,
 return LLVMBuildCall(c->builder, func, args, count, ret);
 }

-LLVMValueRef gencall_create(compile_t* c, reach_type_t* t)
+LLVMValueRef gencall_create(compile_t* c, reach_type_t* t, ast_t* call)
 {
 compile_type_t* c_t = (compile_type_t*)t->c_type;

-LLVMValueRef args[2];
+// If it's statically known that the calling actor can't possibly capture a
+// reference to the new actor, because the result value of the constructor
+// call is discarded at the immediate syntax level, we can make certain
+// optimizations related to the actor reference count and the cycle detector.
+bool no_inc_rc = call && !is_result_needed(call);
+
+LLVMValueRef args[3];
 args[0] = codegen_ctx(c);
 args[1] = LLVMConstBitCast(c_t->desc, c->descriptor_ptr);
+args[2] = LLVMConstInt(c->i1, no_inc_rc ? 1 : 0, false);

-LLVMValueRef result = gencall_runtime(c, "pony_create", args, 2, "");
+LLVMValueRef result = gencall_runtime(c, "pony_create", args, 3, "");
 return LLVMBuildBitCast(c->builder, result, c_t->use_type, "");
 }

-LLVMValueRef gencall_alloc(compile_t* c, reach_type_t* t)
+LLVMValueRef gencall_alloc(compile_t* c, reach_type_t* t, ast_t* call)
 {
 compile_type_t* c_t = (compile_type_t*)t->c_type;

@@ -1339,7 +1347,7 @@ LLVMValueRef gencall_alloc(compile_t* c, reach_type_t* t)
 return c_t->instance;

 if(t->underlying == TK_ACTOR)
-return gencall_create(c, t);
+return gencall_create(c, t, call);

 return gencall_allocstruct(c, t);
 }
4 changes: 2 additions & 2 deletions src/libponyc/codegen/gencall.h
@@ -21,9 +21,9 @@ LLVMValueRef gen_ffi(compile_t* c, ast_t* ast);
 LLVMValueRef gencall_runtime(compile_t* c, const char *name,
 LLVMValueRef* args, int count, const char* ret);

-LLVMValueRef gencall_create(compile_t* c, reach_type_t* t);
+LLVMValueRef gencall_create(compile_t* c, reach_type_t* t, ast_t* call);

-LLVMValueRef gencall_alloc(compile_t* c, reach_type_t* t);
+LLVMValueRef gencall_alloc(compile_t* c, reach_type_t* t, ast_t* call);

 LLVMValueRef gencall_allocstruct(compile_t* c, reach_type_t* t);

7 changes: 4 additions & 3 deletions src/libponyc/codegen/genexe.c
@@ -23,11 +23,12 @@ static LLVMValueRef create_main(compile_t* c, reach_type_t* t,
 LLVMValueRef ctx)
 {
 // Create the main actor and become it.
-LLVMValueRef args[2];
+LLVMValueRef args[3];
 args[0] = ctx;
 args[1] = LLVMConstBitCast(((compile_type_t*)t->c_type)->desc,
 c->descriptor_ptr);
-LLVMValueRef actor = gencall_runtime(c, "pony_create", args, 2, "");
+args[2] = LLVMConstInt(c->i1, 0, false);
+LLVMValueRef actor = gencall_runtime(c, "pony_create", args, 3, "");

 args[0] = ctx;
 args[1] = actor;
@@ -128,7 +129,7 @@ LLVMValueRef gen_main(compile_t* c, reach_type_t* t_main, reach_type_t* t_env)
 reach_method_t* m = reach_method(t_env, TK_NONE, c->str__create, NULL);

 LLVMValueRef env_args[4];
-env_args[0] = gencall_alloc(c, t_env);
+env_args[0] = gencall_alloc(c, t_env, NULL);
 env_args[1] = args[0];
 env_args[2] = LLVMBuildBitCast(c->builder, args[1], c->void_ptr, "");
 env_args[3] = LLVMBuildBitCast(c->builder, args[2], c->void_ptr, "");
4 changes: 2 additions & 2 deletions src/libponyc/codegen/genfun.c
@@ -715,12 +715,12 @@ static bool genfun_allocator(compile_t* c, reach_type_t* t)
 case TK_STRUCT:
 case TK_CLASS:
 // Allocate the object or return the global instance.
-result = gencall_alloc(c, t);
+result = gencall_alloc(c, t, NULL);
 break;

 case TK_ACTOR:
 // Allocate the actor.
-result = gencall_create(c, t);
+result = gencall_create(c, t, NULL);
 break;

 default:
22 changes: 14 additions & 8 deletions src/libponyrt/actor/actor.c
@@ -423,13 +423,14 @@ bool ponyint_actor_run(pony_ctx_t* ctx, pony_actor_t* actor, bool polling)
 }

 bool empty = ponyint_messageq_markempty(&actor->q);
+
 if (!ponyint_is_cycle(actor) && empty && (actor->gc.rc == 0))
 {
 // Here, we know that:
 // - the actor has no messages in its queue
 // - there's no references to this actor
 // therefore the actor is a zombie and can be reaped.
-if (actor_noblock)
+if (actor_noblock || has_flag(actor, FLAG_ORPHAN))
 {
 // when 'actor_noblock` is true, the cycle detector isn't running.
 // this means actors won't be garbage collected unless we take special
@@ -555,7 +556,8 @@ bool ponyint_actor_getnoblock()
 return actor_noblock;
 }

-PONY_API pony_actor_t* pony_create(pony_ctx_t* ctx, pony_type_t* type)
+PONY_API pony_actor_t* pony_create(pony_ctx_t* ctx, pony_type_t* type,
+bool orphaned)
 {
 pony_assert(type != NULL);

@@ -576,19 +578,23 @@ PONY_API pony_actor_t* pony_create(pony_ctx_t* ctx, pony_type_t* type)
 if(actor_noblock)
 ponyint_actor_setsystem(actor);

-if(ctx->current != NULL)
+if(ctx->current != NULL && !orphaned)
 {
 // actors begin unblocked and referenced by the creating actor
+// Do not set an rc if the actor is orphaned. The compiler determined that
+// there are no references to this actor. By not setting a non-zero RC, we
+// will GC the actor sooner and lower overall memory usage.
 actor->gc.rc = GC_INC_MORE;
 ponyint_gc_createactor(ctx->current, actor);
+
+if(!actor_noblock)
+ponyint_cycle_actor_created(actor);
 } else {
 // no creator, so the actor isn't referenced by anything
 actor->gc.rc = 0;
-}
-
-// tell the cycle detector we exist if block messages are enabled
-if(!actor_noblock)
-ponyint_cycle_actor_created(actor);
+if (orphaned)
+set_flag(actor, FLAG_ORPHAN);
+}

 DTRACE2(ACTOR_ALLOC, (uintptr_t)ctx->scheduler, (uintptr_t)actor);
 return actor;
2 changes: 1 addition & 1 deletion src/libponyrt/actor/actor.h
@@ -51,7 +51,7 @@ enum
 FLAG_PENDINGDESTROY = 1 << 4,
 FLAG_OVERLOADED = 1 << 5,
 FLAG_UNDER_PRESSURE = 1 << 6,
-FLAG_MUTED = 1 << 7,
+FLAG_ORPHAN = 1 << 7,
 };

 bool has_flag(pony_actor_t* actor, uint8_t flag);
12 changes: 7 additions & 5 deletions src/libponyrt/gc/cycle.c
@@ -720,6 +720,11 @@ static void block(detector_t* d, pony_ctx_t* ctx, pony_actor_t* actor,
 // - there's no references to this actor because rc == 0
 // therefore the actor is a zombie and can be reaped.

+// prep to destroy
+ponyint_actor_setpendingdestroy(actor);
+ponyint_actor_final(ctx, actor);
+ponyint_actor_sendrelease(ctx, actor);
+
 view_t* view = get_view(d, actor, false);

 if (view != NULL)
@@ -747,10 +752,7 @@ static void block(detector_t* d, pony_ctx_t* ctx, pony_actor_t* actor,
 ponyint_deltamap_free(map);
 }

-// invoke the actor's finalizer and destroy it
-ponyint_actor_setpendingdestroy(actor);
-ponyint_actor_final(ctx, actor);
-ponyint_actor_sendrelease(ctx, actor);
+// destroy the actor
 ponyint_actor_destroy(actor);

 d->destroyed++;
@@ -1096,7 +1098,7 @@ void ponyint_cycle_create(pony_ctx_t* ctx, uint32_t detect_interval)
 detect_interval = 10;

 cycle_detector = NULL;
-cycle_detector = pony_create(ctx, &cycle_type);
+cycle_detector = pony_create(ctx, &cycle_type, false);
 ponyint_actor_setsystem(cycle_detector);

 detector_t* d = (detector_t*)cycle_detector;
9 changes: 9 additions & 0 deletions src/libponyrt/gc/gc.c
@@ -797,6 +797,8 @@ void ponyint_gc_sendacquire(pony_ctx_t* ctx)

 void ponyint_gc_sendrelease(pony_ctx_t* ctx, gc_t* gc)
 {
+ponyint_objectmap_sweep(&gc->local);
+
 #ifdef USE_MEMTRACK
 size_t objectmap_mem_used_freed = 0;
 size_t objectmap_mem_allocated_freed = 0;
@@ -813,6 +815,13 @@ void ponyint_gc_sendrelease(pony_ctx_t* ctx, gc_t* gc)
 #ifdef USE_MEMTRACK
 gc->foreign_actormap_objectmap_mem_used -= objectmap_mem_used_freed;
 gc->foreign_actormap_objectmap_mem_allocated -= objectmap_mem_allocated_freed;
+
+pony_assert((ponyint_actormap_partial_mem_size(&gc->foreign)
++ gc->foreign_actormap_objectmap_mem_used)
+== ponyint_actormap_total_mem_size(&gc->foreign));
+pony_assert((ponyint_actormap_partial_alloc_size(&gc->foreign)
++ gc->foreign_actormap_objectmap_mem_allocated)
+== ponyint_actormap_total_alloc_size(&gc->foreign));
 #endif
 }

2 changes: 1 addition & 1 deletion src/libponyrt/pony.h
@@ -217,7 +217,7 @@ PONY_API pony_ctx_t* pony_ctx();
 * handles received messages.
 */
 PONY_API ATTRIBUTE_MALLOC pony_actor_t* pony_create(pony_ctx_t* ctx,
-pony_type_t* type);
+pony_type_t* type, bool orphaned);

 /// Allocate a message and set up the header. The index is a POOL_INDEX.
 PONY_API pony_msg_t* pony_alloc_msg(uint32_t index, uint32_t id);
