From c9bc5856c3f2d3df8ccd62d98ea896d8675351a7 Mon Sep 17 00:00:00 2001 From: "Sean T. Allen" Date: Thu, 17 Sep 2020 22:46:24 -0400 Subject: [PATCH] Improvements to garbage collection of short-lived actors We have a few example programs that create a large number of short-lived actors that can exhibit runaway memory growth. The changes in this commit greatly reduce the memory growth or potentially reduce it to be stable. The primary change here is to use some static analysis at compilation time to determine if an actor is "an orphan". That is, upon creation, has no actors (including it's creator) with references to it. In order to support the concept of an "orphan actor", additional changes had to be made to the cycle detector protocol and how other actors interact with it. There are two primary things we can do with an orphan actor to improve the rate of garbage collection when the cycle detector is running. When the cycle detector is running actors are collected by the cycle detector when they have no other actors referring to them and can't receive any new messages or are in a cycle that can't receive additional messages. The examples listed in this PR all had problematic memory usage because, actors were created at a rate that overwhelmed the cycle detector and caused large amount of messages to it to back up and sit in its queue waiting to be processed. This commit addresses the "too many messages" problem by making the following actor GC rule changes for orphan actors. Note, all these rules are when the cycle detector is running. * when an actor finishes being run by a scheduler thread "locally delete" it if: * no other actors hold a reference to the actor * the cycle detector isn't holding a reference to the actor "locally deleting" follows the same message free protocol used when the cycle detector isn't running. * when an actor finishes being run by a scheduler thread, pre-emptively inform the cycle detector that we are blocked if: * no other actors hold a reference to the actor * the cycle detector is holding a reference to the actor by pre-emptively informing the cycle detector that we are blocked, the cycle detector can delete the actor more quickly. This "pre-emptively" notify change was introduced in #3649. In order to support these changes, additional cycle detector protocol changes were made. Previously, every actor upon creation informed the cycle detector of its existence. If we want to allow for local deleting, then this won't work. Every actor was known by the cycle detector. All concept of "actor created" messages have been removed by this change. Instead, actors have a flag FLAG_CD_CONTACTED that is set if the actor has ever sent a message to the cycle detector. Once the flag is set, we know that locally deleting the actor is unsafe and we can fall back on the slower pre-emptively inform strategy. The cycle detector works by periodically sending messages to all actors it knows about and asking them if they are currently blocked as the first step in its "path to actor garbage collection" protocol. As actors no longer inform the cycle detector of their existence on creation, the cycle detector needs a new way to discover actors. The first time an actor becomes logically blocked, it sets itself as blocked and notifies the cycle detector of its existence and that it is blocked. This is done by: * setting the actor as blocked * sending an "i'm blocked" message to the cycle detector * setting FLAG_CD_CONTACTED on the actor The overall effect of these changes is: - The rate of garbage collection for short-lived actors is improved. - Fewer cycle detector related messages are generated. It should be noted that if an actor is incorrectly identified by the compiler as being an orphan when it is in fact not, the end result would be a segfault. During development of this feature, it was discovered that the following code from the Promise select test would segfault: ```pony let pb = Promise[String] .> next[None]({(s) => h.complete_action(s) }) ``` The issue was that .> wasn't being correctly handled by the `is_result_needed` logic in `expr.c`. It is possibly that other logic within the function is faulty and if segfaults are see after this commit that didn't exist prior to it, then incorrectly identifying an actor as an orphan might be the culprit. Prior to these change, examples such as the one from issue #1007 (listed later) would be unable to be collected due to an edge-case in the cycle detector and the runtime garbage collection algorithms. Issue #1007 was opened with the following code having explosive memory growth: ```pony primitive N fun apply(): U64 => 2_000_000_000 actor Test new create(n: U64) => if n == 0 then return end Test(n - 1) actor Main new create(env: Env) => Test(N()) ``` Dipin and I previously worked on #3649 which added "pre-emptively informing" as detailed earlier. The example above from #1007 wasn't addressed by #3649 because the key to #3649 was that when done running its behaviors, an actor can see if it has no additional messages AND no references to itself and can then tell the cycle detector to skip parts of the CD protocol and garbage collect sooner. #1007 requires the "local delete" functionality from this commit whereas #3649 only provided "pre-emptive informing". The addition of "local delete" allows for the cycle detector to keep up with many cases of generating large amounts of "orphaned" actors. The from #1007 above wasn't addressed by #3649 because of an implementation detail in the ORCA garbage collection protocol; at the time that an instance of Test is done running its create behavior, it doesn't have a reference count of 0. It has a reference count of 256. Not because there are 256 references but because, when an actor is created puts a "fake value" in the rc value such that an actor isn't gc'd prematurely. The value will be set to the correct value once the actor that created it is first traced and will be subsequently updated correctly per ORCA going forward. However, at the time that an instance of Test is finished running its create, that information isn't available. It would be incorrect to say "if rc is 256, I'm blocked and you can gc me". 256 is a perfectly reasonable value for an rc to be in normal usage. This isn't a problem with the changes in this PR as the compiler detects that each instance of Test will be an orphan and immediately sets its rc to 0. This allows it to be garbage collected as the instance's message queue is empty so long as it's rc remains 0. Any changes in the future to address lingering issues with creating large numbers of orphaned actors should also be tested with the following examples. Each exercises a different type of pattern that could lead to memory growth or incorrect execution. Example 2 features reasonably stable memory usage that I have seen from time-to-time, increase rapidly. It should be noted that such an increase is rather infrequent but suggests there are additional problems in the cycle-detector. I suspect said problem is a periodic burst of additional messages to the cycle-detector from actors that can be garbage collected, but I haven't investigated further. ```pony actor Main new create(e: Env) => Spawner.run() actor Spawner var _living_children: U64 = 0 new create() => None be run() => _living_children = _living_children + 1 Spawnee(this).run() be collect() => _living_children = _living_children - 1 run() actor Spawnee let _parent: Spawner new create(parent: Spawner) => _parent = parent be run() => _parent.collect() ``` Example 3 has stable memory growth and given that it won't result in any messages being sent to the cycle detector as we have determined at compile-time that the Foo actor instances are orphaned. ```pony actor Main var n: U64 = 2_000_000_000 new create(e: Env) => run() be run() => while(n > 0 ) do Foo(n) n = n - 1 if ((n % 1_000) == 0) then run() break end end actor Foo new create(n: U64) => if ((n % 1_000_000) == 0) then @printf[I32]("%ld\n".cstring(), n) end None ``` Example 4 has the same characteristics as example 3 with the code as of this commit. However, it did exhibit different behavior prior to this commit being fully completed and appears to be a good test candidate for any future changes. ```pony actor Main var n: U64 = 2_000_000_000 new create(e: Env) => run() be run() => while(n > 0 ) do Foo(n) n = n - 1 if ((n % 1_000_000) == 0) then @printf[I32]("%ld\n".cstring(), n) end if ((n % 1_000) == 0) then run() break end end actor Foo new create(n: U64) => None ``` Finally, for anyone who works on improving this in the future, here's an additional test program beyond ones that already exist elsewhere for testing pony programs. This program will create a large number of actors that are orphaned but then send themselves to another actor. This should increase their rc count and keep them from being garbage collected. If the program segfaults, then something has gone wrong and the logic related to orphan actors has been broken. The example currently passes as of this commit. ```pony use "collections" actor Holder let _holding: SetIs[ManyOfMe] = _holding.create() new create() => None be hold(a: ManyOfMe) => _holding.set(a) be ping() => var c: U64 = 1_000 for h in _holding.values() do if c > 0 then h.ping() c = c - 1 end end actor ManyOfMe new create(h: Holder) => h.hold(this) be ping() => None actor Main var n: U64 = 500_000 let holder: Holder new create(env: Env) => holder = Holder run() be run() => while(n > 0 ) do ManyOfMe(holder) n = n - 1 if ((n % 1_000) == 0) then @printf[I32]("%ld\n".cstring(), n) run() holder.ping() break end end ``` The code and ideas in this commit, while attributed to me is the result of a group effort featuring ideas and/or code from myself, Joe Eli McIlvain, Dipin Hora, and Sylvan Clebsch. Closes #1007 --- .release-notes/3653.md | 217 ++++++++++++++++++ .../dtrace/mbox-size-all-actor-messages.d | 1 - examples/dtrace/telemetry.d | 6 - src/libponyc/codegen/codegen.c | 5 +- src/libponyc/codegen/gencall.c | 20 +- src/libponyc/codegen/gencall.h | 4 +- src/libponyc/codegen/genexe.c | 7 +- src/libponyc/codegen/genfun.c | 4 +- src/libponyc/pass/expr.c | 5 + src/libponyrt/actor/actor.c | 99 +++++--- src/libponyrt/actor/actor.h | 3 +- src/libponyrt/gc/cycle.c | 25 +- src/libponyrt/gc/cycle.h | 2 - src/libponyrt/pony.h | 2 +- 14 files changed, 313 insertions(+), 87 deletions(-) create mode 100644 .release-notes/3653.md diff --git a/.release-notes/3653.md b/.release-notes/3653.md new file mode 100644 index 0000000000..4c051a3832 --- /dev/null +++ b/.release-notes/3653.md @@ -0,0 +1,217 @@ +## Improvement to garbage collection for short-lived actors + +We have a few example programs that create a large number of short-lived actors that can exhibit runaway memory growth. This update greatly reduces the memory growth or potentially reduce it to be stable. + +The primary change here is to use some static analysis at compilation time to determine if an actor is "an orphan". That is, upon creation, has no actors (including it's creator) with references to it. + +In order to support the concept of an "orphan actor", additional changes had to be made to the cycle detector protocol and how other actors interact with it. There are two primary things we can do with an orphan actor to improve the rate of garbage collection when the cycle detector is running. + +When the cycle detector is running actors are collected by the cycle detector when they have no other actors referring to them and can't receive any new messages or are in a cycle that can't receive additional messages. + +The examples listed in this PR all had problematic memory usage because, actors were created at a rate that overwhelmed the cycle detector and caused large amount of messages to it to back up and sit in its queue waiting to be processed. + +This update addresses the "too many messages" problem by making the following actor GC rule changes for orphan actors. Note, all these rules are when the cycle detector is running. + +* when an actor finishes being run by a scheduler thread "locally delete" it if: + + * no other actors hold a reference to the actor + * the cycle detector isn't holding a reference to the actor + * its queue is empty + + "locally deleting" follows the same message free protocol used when the cycle detector isn't running. + +* when an actor finishes being run by a scheduler thread, preemptively inform the + cycle detector that we are blocked if: + + * no other actors hold a reference to the actor + * the cycle detector is holding a reference to the actor + * its queue is empty + + by preemptively informing the cycle detector that we are blocked, the cycle detector can delete the actor more quickly. This "preemptively" notify change was introduced in [#3649](https://github.com/ponylang/ponyc/pull/3649). + +In order to support these changes, additional cycle detector protocol changes were made. Previously, every actor upon creation informed the cycle detector of its existence. If we want to allow for local deleting, then this won't work. Every actor was known by the cycle detector. All concept of "actor created" messages have been removed by this change. Instead, actors have a flag FLAG_CD_CONTACTED that is set if the actor has ever sent a message to the cycle detector. Once the flag is set, we know that locally deleting the actor is unsafe and we can fall back on the slower preemptively inform strategy. + +The cycle detector works by periodically sending messages to all actors it knows about and asking them if they are currently blocked as the first step in its "path to actor garbage collection" protocol. As actors no longer inform the cycle detector of their existence on creation, the cycle detector needs a new way to discover actors. + +The first time an actor becomes logically blocked, it sets itself as blocked and notifies the cycle detector of its existence and that it is blocked. This is done by: + +* setting the actor as blocked +* sending an "i'm blocked" message to the cycle detector +* setting FLAG_CD_CONTACTED on the actor + +The overall effect of these changes is: + +- The rate of garbage collection for short-lived actors is improved. +- Fewer cycle detector related messages are generated. + +It should be noted that if an actor is incorrectly identified by the compiler as being an orphan when it is in fact not, the end result would be a segfault. During development of this feature, it was discovered that the following code from the Promise select test would segfault: + +```pony +let pb = Promise[String] .> next[None]({(s) => h.complete_action(s) }) +``` + +The issue was that .> wasn't being correctly handled by the `is_result_needed` logic in `expr.c`. It is possibly that other logic within the function is faulty and if segfaults are see after this change that didn't exist prior to it, then incorrectly identifying an actor as an orphan might be the culprit. + +Prior to these change, examples such as the one from issue [#007](https://github.com/ponylang/ponyc/issues/1007) (listed later) would be unable to be collected due to an edge-case in the cycle detector and the runtime garbage collection algorithms. + +Issue [#007](https://github.com/ponylang/ponyc/issues/1007) was opened with the following code having explosive memory growth: + +```pony +primitive N fun apply(): U64 => 2_000_000_000 + +actor Test + new create(n: U64) => + if n == 0 then return end + Test(n - 1) + +actor Main + new create(env: Env) => + Test(N()) +``` + +A previous PR [#3649](https://github.com/ponylang/ponyc/pull/3649) partially addressed some memory growth issues by adding the "preemptively informing" that detailed earlier. The example above from [#007](https://github.com/ponylang/ponyc/issues/1007) wasn't addressed by [#3649](https://github.com/ponylang/ponyc/pull/3649) because the key to [#3649](https://github.com/ponylang/ponyc/pull/3649) was that when done running its behaviors, an actor can see if it has no additional messages AND no references to itself and can then tell the cycle detector to skip parts of the CD protocol and garbage collect sooner. [#007](https://github.com/ponylang/ponyc/issues/1007) requires the "local delete" functionality from this change whereas [#3649](https://github.com/ponylang/ponyc/pull/3649) only provided "pre-emptive informing". + +The addition of "local delete" allows for the cycle detector to keep up with many cases of generating large amounts of "orphaned" actors. The from [#007](https://github.com/ponylang/ponyc/issues/1007) above wasn't addressed by [#3649](https://github.com/ponylang/ponyc/pull/3649) because of an implementation detail in the ORCA garbage collection protocol; at the time that an instance of Test is done running its create behavior, it doesn't have a reference count of 0. It has a reference count of 256. Not because there are 256 references but because, when an actor is created puts a "fake value" in the rc value such that an actor isn't garbage collected prematurely. The value will be set to the correct value once the actor that created it is first traced and will be subsequently updated correctly per ORCA going forward. + +However, at the time that an instance of Test is finished running its create, that information isn't available. It would be incorrect to say "if rc is 256, I'm blocked and you can gc me". 256 is a perfectly reasonable value for an rc to be in normal usage. + +This isn't a problem with the changes in this PR as the compiler detects that each instance of Test will be an orphan and immediately sets its rc to 0. This allows it to be garbage collected as the instance's message queue is empty so long as it's rc remains 0. + +Any changes in the future to address lingering issues with creating large numbers of orphaned actors should also be tested with the following examples. Each exercises a different type of pattern that could lead to memory growth or incorrect execution. + +Example 2 features reasonably stable memory usage that I have seen from time-to-time, increase rapidly. It should be noted that such an increase is rather infrequent but suggests there are additional problems in the cycle-detector. I suspect said problem is a periodic burst of additional messages to the cycle-detector from actors that can be garbage collected, but I haven't investigated further. + +```pony +actor Main + new create(e: Env) => + Spawner.run() + +actor Spawner + var _living_children: U64 = 0 + + new create() => + None + + be run() => + _living_children = _living_children + 1 + Spawnee(this).run() + + be collect() => + _living_children = _living_children - 1 + run() + +actor Spawnee + let _parent: Spawner + + new create(parent: Spawner) => + _parent = parent + + be run() => + _parent.collect() +``` + +Example 3 has stable memory growth and given that it won't result in any messages being sent to the cycle detector as we have determined at compile-time that the Foo actor instances are orphaned. + +```pony +actor Main + var n: U64 = 2_000_000_000 + + new create(e: Env) => + run() + + be run() => + while(n > 0 ) do + Foo(n) + n = n - 1 + if ((n % 1_000) == 0) then + run() + break + end + end + +actor Foo + new create(n: U64) => + if ((n % 1_000_000) == 0) then + @printf[I32]("%ld\n".cstring(), n) + end + + None +``` + +Example 4 has the same characteristics as example 3 with the code as of this change. However, it did exhibit different behavior prior to this change being fully completed and appears to be a good test candidate for any future changes. + +```pony +actor Main + var n: U64 = 2_000_000_000 + + new create(e: Env) => + run() + + be run() => + while(n > 0 ) do + Foo(n) + n = n - 1 + if ((n % 1_000_000) == 0) then + @printf[I32]("%ld\n".cstring(), n) + end + if ((n % 1_000) == 0) then + run() + break + end + end + +actor Foo + new create(n: U64) => + None +``` + +Finally, for anyone who works on improving this in the future, here's an additional test program beyond ones that already exist elsewhere for testing pony programs. This program will create a large number of actors that are orphaned but then send themselves to another actor. This should increase their rc count and keep them from being garbage collected. If the program segfaults, then something has gone wrong and the logic related to orphan actors has been broken. The example currently passes as of this change. + +```pony +use "collections" + +actor Holder + let _holding: SetIs[ManyOfMe] = _holding.create() + + new create() => + None + + be hold(a: ManyOfMe) => + _holding.set(a) + + be ping() => + var c: U64 = 1_000 + for h in _holding.values() do + if c > 0 then + h.ping() + c = c - 1 + end + end + +actor ManyOfMe + new create(h: Holder) => + h.hold(this) + + be ping() => + None + +actor Main + var n: U64 = 500_000 + let holder: Holder + + new create(env: Env) => + holder = Holder + run() + + be run() => + while(n > 0 ) do + ManyOfMe(holder) + n = n - 1 + if ((n % 1_000) == 0) then + @printf[I32]("%ld\n".cstring(), n) + run() + holder.ping() + break + end + end +``` diff --git a/examples/dtrace/mbox-size-all-actor-messages.d b/examples/dtrace/mbox-size-all-actor-messages.d index 999f674237..06bb5ad25d 100755 --- a/examples/dtrace/mbox-size-all-actor-messages.d +++ b/examples/dtrace/mbox-size-all-actor-messages.d @@ -14,7 +14,6 @@ inline unsigned int UINT32_MAX = 4294967295; inline unsigned int ACTORMSG_APPLICATION_START = (UINT32_MAX - 11); /* -12 */ inline unsigned int ACTORMSG_CHECKBLOCKED = (UINT32_MAX - 10); /* -11 */ inline unsigned int ACTORMSG_DESTROYED = (UINT32_MAX - 9); /* -10 */ -inline unsigned int ACTORMSG_CREATED = (UINT32_MAX - 8); /* -9 */ inline unsigned int ACTORMSG_ISBLOCKED = (UINT32_MAX - 7); /* -8 */ inline unsigned int ACTORMSG_BLOCK = (UINT32_MAX - 6); /* -7 */ inline unsigned int ACTORMSG_UNBLOCK = (UINT32_MAX - 5); /* -6 */ diff --git a/examples/dtrace/telemetry.d b/examples/dtrace/telemetry.d index 06a72d2fef..b29d2b9d97 100755 --- a/examples/dtrace/telemetry.d +++ b/examples/dtrace/telemetry.d @@ -33,12 +33,6 @@ pony$target:::actor-msg-send @counts[arg0, "Destroyed Messages Sent"] = count(); } -pony$target:::actor-msg-send -/ (unsigned int)arg1 == (unsigned int)ACTORMSG_CREATED / -{ - @counts[arg0, "Created Messages Sent"] = count(); -} - pony$target:::actor-msg-send / (unsigned int)arg1 == (unsigned int)ACTORMSG_ISBLOCKED / { diff --git a/src/libponyc/codegen/codegen.c b/src/libponyc/codegen/codegen.c index 6d0224144a..cabb8384aa 100644 --- a/src/libponyc/codegen/codegen.c +++ b/src/libponyc/codegen/codegen.c @@ -290,10 +290,11 @@ static void init_runtime(compile_t* c) LLVMAddAttributeAtIndex(value, LLVMAttributeFunctionIndex, nounwind_attr); LLVMAddAttributeAtIndex(value, LLVMAttributeFunctionIndex, readnone_attr); - // __object* pony_create(i8*, __Desc*) + // __object* pony_create(i8*, __Desc*, i1) params[0] = c->void_ptr; params[1] = c->descriptor_ptr; - type = LLVMFunctionType(c->object_ptr, params, 2, false); + params[2] = c->i1; + type = LLVMFunctionType(c->object_ptr, params, 3, false); value = LLVMAddFunction(c->module, "pony_create", type); LLVMAddAttributeAtIndex(value, LLVMAttributeFunctionIndex, nounwind_attr); diff --git a/src/libponyc/codegen/gencall.c b/src/libponyc/codegen/gencall.c index 19d44868e4..9280531d57 100644 --- a/src/libponyc/codegen/gencall.c +++ b/src/libponyc/codegen/gencall.c @@ -11,6 +11,7 @@ #include "../type/cap.h" #include "../type/subtype.h" #include "../ast/stringtab.h" +#include "../pass/expr.h" #include "../../libponyrt/mem/pool.h" #include "../../libponyrt/mem/heap.h" #include "ponyassert.h" @@ -422,7 +423,7 @@ static LLVMValueRef gen_constructor_receiver(compile_t* c, reach_type_t* t, set_descriptor(c, t, receiver); return receiver; } else { - return gencall_alloc(c, t); + return gencall_alloc(c, t, call); } } @@ -1310,19 +1311,26 @@ LLVMValueRef gencall_runtime(compile_t* c, const char *name, return LLVMBuildCall(c->builder, func, args, count, ret); } -LLVMValueRef gencall_create(compile_t* c, reach_type_t* t) +LLVMValueRef gencall_create(compile_t* c, reach_type_t* t, ast_t* call) { compile_type_t* c_t = (compile_type_t*)t->c_type; - LLVMValueRef args[2]; + // If it's statically known that the calling actor can't possibly capture a + // reference to the new actor, because the result value of the constructor + // call is discarded at the immediate syntax level, we can make certain + // optimizations related to the actor reference count and the cycle detector. + bool no_inc_rc = call && !is_result_needed(call); + + LLVMValueRef args[3]; args[0] = codegen_ctx(c); args[1] = LLVMConstBitCast(c_t->desc, c->descriptor_ptr); + args[2] = LLVMConstInt(c->i1, no_inc_rc ? 1 : 0, false); - LLVMValueRef result = gencall_runtime(c, "pony_create", args, 2, ""); + LLVMValueRef result = gencall_runtime(c, "pony_create", args, 3, ""); return LLVMBuildBitCast(c->builder, result, c_t->use_type, ""); } -LLVMValueRef gencall_alloc(compile_t* c, reach_type_t* t) +LLVMValueRef gencall_alloc(compile_t* c, reach_type_t* t, ast_t* call) { compile_type_t* c_t = (compile_type_t*)t->c_type; @@ -1339,7 +1347,7 @@ LLVMValueRef gencall_alloc(compile_t* c, reach_type_t* t) return c_t->instance; if(t->underlying == TK_ACTOR) - return gencall_create(c, t); + return gencall_create(c, t, call); return gencall_allocstruct(c, t); } diff --git a/src/libponyc/codegen/gencall.h b/src/libponyc/codegen/gencall.h index 9a157aa535..c4a8f7b8d5 100644 --- a/src/libponyc/codegen/gencall.h +++ b/src/libponyc/codegen/gencall.h @@ -21,9 +21,9 @@ LLVMValueRef gen_ffi(compile_t* c, ast_t* ast); LLVMValueRef gencall_runtime(compile_t* c, const char *name, LLVMValueRef* args, int count, const char* ret); -LLVMValueRef gencall_create(compile_t* c, reach_type_t* t); +LLVMValueRef gencall_create(compile_t* c, reach_type_t* t, ast_t* call); -LLVMValueRef gencall_alloc(compile_t* c, reach_type_t* t); +LLVMValueRef gencall_alloc(compile_t* c, reach_type_t* t, ast_t* call); LLVMValueRef gencall_allocstruct(compile_t* c, reach_type_t* t); diff --git a/src/libponyc/codegen/genexe.c b/src/libponyc/codegen/genexe.c index f8060740c7..0178a09de2 100644 --- a/src/libponyc/codegen/genexe.c +++ b/src/libponyc/codegen/genexe.c @@ -23,11 +23,12 @@ static LLVMValueRef create_main(compile_t* c, reach_type_t* t, LLVMValueRef ctx) { // Create the main actor and become it. - LLVMValueRef args[2]; + LLVMValueRef args[3]; args[0] = ctx; args[1] = LLVMConstBitCast(((compile_type_t*)t->c_type)->desc, c->descriptor_ptr); - LLVMValueRef actor = gencall_runtime(c, "pony_create", args, 2, ""); + args[2] = LLVMConstInt(c->i1, 0, false); + LLVMValueRef actor = gencall_runtime(c, "pony_create", args, 3, ""); args[0] = ctx; args[1] = actor; @@ -128,7 +129,7 @@ LLVMValueRef gen_main(compile_t* c, reach_type_t* t_main, reach_type_t* t_env) reach_method_t* m = reach_method(t_env, TK_NONE, c->str__create, NULL); LLVMValueRef env_args[4]; - env_args[0] = gencall_alloc(c, t_env); + env_args[0] = gencall_alloc(c, t_env, NULL); env_args[1] = args[0]; env_args[2] = LLVMBuildBitCast(c->builder, args[1], c->void_ptr, ""); env_args[3] = LLVMBuildBitCast(c->builder, args[2], c->void_ptr, ""); diff --git a/src/libponyc/codegen/genfun.c b/src/libponyc/codegen/genfun.c index 69b995d8e9..ed6276f656 100644 --- a/src/libponyc/codegen/genfun.c +++ b/src/libponyc/codegen/genfun.c @@ -715,12 +715,12 @@ static bool genfun_allocator(compile_t* c, reach_type_t* t) case TK_STRUCT: case TK_CLASS: // Allocate the object or return the global instance. - result = gencall_alloc(c, t); + result = gencall_alloc(c, t, NULL); break; case TK_ACTOR: // Allocate the actor. - result = gencall_create(c, t); + result = gencall_create(c, t, NULL); break; default: diff --git a/src/libponyc/pass/expr.c b/src/libponyc/pass/expr.c index ab5d4b263d..ef62bb634b 100644 --- a/src/libponyc/pass/expr.c +++ b/src/libponyc/pass/expr.c @@ -106,6 +106,11 @@ bool is_result_needed(ast_t* ast) case TK_BECHAIN: case TK_FUNCHAIN: + // Result of the receiver expression is needed if the chain result is + // needed + if(ast_childidx(parent, 0) == ast) + return is_result_needed(parent); + // Result of a chained method isn't needed. return false; diff --git a/src/libponyrt/actor/actor.c b/src/libponyrt/actor/actor.c index 7d38c2727b..fe9156c34d 100644 --- a/src/libponyrt/actor/actor.c +++ b/src/libponyrt/actor/actor.c @@ -226,19 +226,6 @@ static bool handle_message(pony_ctx_t* ctx, pony_actor_t* actor, return false; } - case ACTORMSG_CREATED: - { -#ifdef USE_MEMTRACK_MESSAGES - ctx->mem_used_messages -= sizeof(pony_msgp_t); - ctx->mem_allocated_messages -= POOL_ALLOC_SIZE(pony_msgp_t); -#endif - - pony_assert(ponyint_is_cycle(actor)); - DTRACE3(ACTOR_MSG_RUN, (uintptr_t)ctx->scheduler, (uintptr_t)actor, msg->id); - actor->type->dispatch(ctx, actor, msg); - return false; - } - case ACTORMSG_DESTROYED: { #ifdef USE_MEMTRACK_MESSAGES @@ -420,36 +407,80 @@ bool ponyint_actor_run(pony_ctx_t* ctx, pony_actor_t* actor, bool polling) if(!has_flag(actor, FLAG_BLOCKED | FLAG_SYSTEM | FLAG_BLOCKED_SENT)) { set_flag(actor, FLAG_BLOCKED); + + if(!actor_noblock + && (actor->gc.rc > 0) + && !has_flag(actor, FLAG_CD_CONTACTED) + ) + { + // The cycle detector (CD) doesn't know we exist so it won't try + // and reach out to us even though we're blocked, so send block message //// and set flag that the CD knows we exist now so that when we block + // in the future we will wait for the CD to reach out and ask + // if we're blocked or not. + // But, only if gc.rc > 0 because if gc.rc == 0 we are a zombie. + set_flag(actor, FLAG_BLOCKED_SENT); + set_flag(actor, FLAG_CD_CONTACTED); + ponyint_cycle_block(actor, &actor->gc); + } + } + // If we mark the queue as empty, then it is no longer safe to do any + // operations on this actor that aren't concurrency safe unless, the actor + // has an rc of 0 and the cycle detector isn't aware of the actor's + // existence. + // + // If there are other actors with references to this actor then once its + // queue is marked as empty, it can be rescheduled on another scheduler + // thread. "Other actors" includes the cycle detector if it is running, so... + // beware of any code in this function after marking the queue as empty. + // Getting it wrong will result in hard to track down segfaults. bool empty = ponyint_messageq_markempty(&actor->q); - if (!ponyint_is_cycle(actor) && empty && (actor->gc.rc == 0)) + + if (empty && (actor->gc.rc == 0) && has_flag(actor, FLAG_BLOCKED)) { - // Here, we know that: + // Here, we is what we know to be true: + // // - the actor has no messages in its queue // - there's no references to this actor + // // therefore the actor is a zombie and can be reaped. - if (actor_noblock) + // How reaping can occur depends on whether the cycle detector is running + // and it is, is it holding a reference to the actor. + // + // When 'actor_noblock` is true, the cycle detector isn't running. + // this means actors won't be garbage collected unless we take special + // action. Therefore if `noblock` is on, we should garbage collect the + // actor. + // + // When the cycle detector is running, "deleting locally" is still far + // more efficient and will free up memory sooner than letting the cycle + // detector do it. However, that is only safe to delete locally if the + // cycle detector isn't holding a reference to the actor. If we have never + // sent a reference to the cycle detector (FLAG_CD_CONTACTED), then this + // is perfectly safe to do. + if (actor_noblock || !has_flag(actor, FLAG_CD_CONTACTED)) { - // when 'actor_noblock` is true, the cycle detector isn't running. - // this means actors won't be garbage collected unless we take special - // action. - // therefore if `noblock` is on, we should garbage collect the actor. + // "Locally delete" the actor. ponyint_actor_setpendingdestroy(actor); ponyint_actor_final(ctx, actor); ponyint_actor_destroy(actor); } else { - // tell cycle detector that this actor is a zombie and will not get - // any more messages/work and can be reaped + // Tell cycle detector that this actor is a zombie and will not get + // any more messages/work and can be reaped. // Mark the actor as FLAG_BLOCKED_SENT and send a BLOCKED message // to speed up reaping otherwise waiting for the cycle detector - // to get around to asking if we're blocked would result in a - // huge memory leak - if(has_flag(actor, FLAG_BLOCKED) && !has_flag(actor, FLAG_BLOCKED_SENT)) + // to get around to asking if we're blocked could result in unnecessary + // memory growth. + if(!has_flag(actor, FLAG_BLOCKED_SENT)) { // We're blocked, send block message. + // This is concurrency safe because, only the cycle detector might + // have a reference to this actor (rc is 0) so another actor can not + // send it an application message that results this actor becoming + // unblocked (which would create a race condition). set_flag(actor, FLAG_BLOCKED_SENT); - pony_assert(ctx->current == actor); + set_flag(actor, FLAG_CD_CONTACTED); ponyint_cycle_block(actor, &actor->gc); } } @@ -555,7 +586,8 @@ bool ponyint_actor_getnoblock() return actor_noblock; } -PONY_API pony_actor_t* pony_create(pony_ctx_t* ctx, pony_type_t* type) +PONY_API pony_actor_t* pony_create(pony_ctx_t* ctx, pony_type_t* type, + bool orphaned) { pony_assert(type != NULL); @@ -573,12 +605,11 @@ PONY_API pony_actor_t* pony_create(pony_ctx_t* ctx, pony_type_t* type) ponyint_heap_init(&actor->heap); ponyint_gc_done(&actor->gc); - if(actor_noblock) - ponyint_actor_setsystem(actor); - - if(ctx->current != NULL) + if(ctx->current != NULL && !orphaned) { - // actors begin unblocked and referenced by the creating actor + // Do not set an rc if the actor is orphaned. The compiler determined that + // there are no references to this actor. By not setting a non-zero RC, we + // will GC the actor sooner and lower overall memory usage. actor->gc.rc = GC_INC_MORE; ponyint_gc_createactor(ctx->current, actor); } else { @@ -586,10 +617,6 @@ PONY_API pony_actor_t* pony_create(pony_ctx_t* ctx, pony_type_t* type) actor->gc.rc = 0; } - // tell the cycle detector we exist if block messages are enabled - if(!actor_noblock) - ponyint_cycle_actor_created(actor); - DTRACE2(ACTOR_ALLOC, (uintptr_t)ctx->scheduler, (uintptr_t)actor); return actor; } diff --git a/src/libponyrt/actor/actor.h b/src/libponyrt/actor/actor.h index 4a991eaef1..5fcd87a4c5 100644 --- a/src/libponyrt/actor/actor.h +++ b/src/libponyrt/actor/actor.h @@ -20,7 +20,6 @@ PONY_EXTERN_C_BEGIN #define ACTORMSG_APPLICATION_START (UINT32_MAX - 11) #define ACTORMSG_CHECKBLOCKED (UINT32_MAX - 10) #define ACTORMSG_DESTROYED (UINT32_MAX - 9) -#define ACTORMSG_CREATED (UINT32_MAX - 8) #define ACTORMSG_ISBLOCKED (UINT32_MAX - 7) #define ACTORMSG_BLOCK (UINT32_MAX - 6) #define ACTORMSG_UNBLOCK (UINT32_MAX - 5) @@ -51,7 +50,7 @@ enum FLAG_PENDINGDESTROY = 1 << 4, FLAG_OVERLOADED = 1 << 5, FLAG_UNDER_PRESSURE = 1 << 6, - FLAG_MUTED = 1 << 7, + FLAG_CD_CONTACTED = 1 << 7, }; bool has_flag(pony_actor_t* actor, uint8_t flag); diff --git a/src/libponyrt/gc/cycle.c b/src/libponyrt/gc/cycle.c index b628029fff..cb807732e2 100644 --- a/src/libponyrt/gc/cycle.c +++ b/src/libponyrt/gc/cycle.c @@ -686,12 +686,6 @@ static void check_blocked(pony_ctx_t* ctx, detector_t* d) deferred(ctx, d); } -static void actor_created(detector_t* d, pony_actor_t* actor) -{ - // get view (and add actor to view map at that time) - get_view(d, actor, true); -} - static void actor_destroyed(detector_t* d, pony_actor_t* actor) { // this would only called by a manual destroy of an actor @@ -1013,13 +1007,6 @@ static void cycle_dispatch(pony_ctx_t* ctx, pony_actor_t* self, break; } - case ACTORMSG_CREATED: - { - pony_msgp_t* m = (pony_msgp_t*)msg; - actor_created(d, (pony_actor_t*)m->p); - break; - } - case ACTORMSG_DESTROYED: { pony_msgp_t* m = (pony_msgp_t*)msg; @@ -1096,7 +1083,7 @@ void ponyint_cycle_create(pony_ctx_t* ctx, uint32_t detect_interval) detect_interval = 10; cycle_detector = NULL; - cycle_detector = pony_create(ctx, &cycle_type); + cycle_detector = pony_create(ctx, &cycle_type, false); ponyint_actor_setsystem(cycle_detector); detector_t* d = (detector_t*)cycle_detector; @@ -1131,16 +1118,6 @@ bool ponyint_cycle_check_blocked(uint64_t tsc, uint64_t tsc2) return false; } -void ponyint_cycle_actor_created(pony_actor_t* actor) -{ - // this will only be false during the creation of the cycle detector - // and after the runtime has been shut down - if(cycle_detector) { - pony_ctx_t* ctx = ponyint_sched_get_inject_context(); - pony_sendp(ctx, cycle_detector, ACTORMSG_CREATED, actor); - } -} - void ponyint_cycle_actor_destroyed(pony_actor_t* actor) { // this will only be false during the creation of the cycle detector diff --git a/src/libponyrt/gc/cycle.h b/src/libponyrt/gc/cycle.h index 3b6bea816a..0753fbd91c 100644 --- a/src/libponyrt/gc/cycle.h +++ b/src/libponyrt/gc/cycle.h @@ -13,8 +13,6 @@ void ponyint_cycle_create(pony_ctx_t* ctx, uint32_t detect_interval); bool ponyint_cycle_check_blocked(uint64_t tsc, uint64_t tsc2); -void ponyint_cycle_actor_created(pony_actor_t* actor); - void ponyint_cycle_actor_destroyed(pony_actor_t* actor); void ponyint_cycle_block(pony_actor_t* actor, gc_t* gc); diff --git a/src/libponyrt/pony.h b/src/libponyrt/pony.h index 008d25d567..7ac5619b84 100644 --- a/src/libponyrt/pony.h +++ b/src/libponyrt/pony.h @@ -217,7 +217,7 @@ PONY_API pony_ctx_t* pony_ctx(); * handles received messages. */ PONY_API ATTRIBUTE_MALLOC pony_actor_t* pony_create(pony_ctx_t* ctx, - pony_type_t* type); + pony_type_t* type, bool orphaned); /// Allocate a message and set up the header. The index is a POOL_INDEX. PONY_API pony_msg_t* pony_alloc_msg(uint32_t index, uint32_t id);