Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Actor appears to stall during GC #1994

Closed
patternspandemic opened this issue Jun 30, 2017 · 9 comments
Closed

Actor appears to stall during GC #1994

patternspandemic opened this issue Jun 30, 2017 · 9 comments

Comments

@patternspandemic
Copy link
Contributor

For all details and code see this Gist.
Discussion on the User list.

  • The example essentially makes 10 deferred calls to the Worker actor's apply behavior, 1 call every second.
  • In my case, after 6 successful calls, the behavior is no longer run, however, the remaining messages appear to be sent to the actor.
  • Not reproducible on all machines/configurations, though reproduced on both Linux and OSX.

A profiling of the example program results in the following:

[pattern:~/pony/example] $ sudo perf record --call-graph fp -- ./example
Performing some work in a second..

Sent worker instruction.
Worker processing instruction.
Created _WorkerPerform.
Performing work.
Found some params as Json

Sent worker instruction.
Worker processing instruction.
Created _WorkerPerform.
Performing work.
Found some params as Json

Sent worker instruction.
Worker processing instruction.
Created _WorkerPerform.
Performing work.
Found some params as Json

Sent worker instruction.
Worker processing instruction.
Created _WorkerPerform.
Performing work.
Found some params as Json

Sent worker instruction.
Worker processing instruction.
Created _WorkerPerform.
Performing work.
Found some params as Json

Sent worker instruction.
Worker processing instruction.
Created _WorkerPerform.
Performing work.
Found some params as Json

Sent worker instruction.

Sent worker instruction.

Sent worker instruction.

Sent worker instruction.

(>> 100% core usage, killed process after short while <<)
[ perf record: Woken up 64 times to write data ]
[ perf record: Captured and wrote 16.199 MB perf.data (231369 samples) ]

[pattern:~/pony/example] $ sudo perf report
Samples: 231K of event 'cycles:p', Event count (approx.): 87003176131
  Children      Self  Command     Shared Object        Symbol                                                                                                                                                     
-   98.85%    93.68%  example     example              [.] sweep_small.lto_priv.61                                                                                                                                
   - sweep_small.lto_priv.61                                                                                                                                                                                      
      - 5.07% __irqentry_text_start                                                                                                                                                                               
         - 5.05% smp_apic_timer_interrupt                                                                                                                                                                         
            - 4.93% local_apic_timer_interrupt                                                                                                                                                                    
               - 4.92% hrtimer_interrupt                                                                                                                                                                          
                  - 4.02% __hrtimer_run_queues                                                                                                                                                                    
                     - 3.98% tick_sched_timer                                                                                                                                                                     
                        - 3.66% tick_sched_handle.isra.12                                                                                                                                                         
                           - 3.66% update_process_times                                                                                                                                                           
                              - 3.54% scheduler_tick                                                                                                                                                              
                                 - 1.52% sched_clock_tick                                                                                                                                                         
                                    - 1.31% ktime_get                                                                                                                                                             
                                         read_hpet                                                                                                                                                                
                                   1.21% task_tick_fair                              
(...)

Nearly all time spent in sweep_small from ponyc/src/libponyrt/mem/heap.c

It looks to me like GC has hung up on the actor for some reason. Note that, in the example the HTTPClunk class mirror's HTTPClient's creation of an SSLContext in it's constructor. I've not been able to determine whether the SSL library has anything to do with this issue or not. For all I know it's inclusion simply makes GC of the actor occur faster. I don't have the experience or knowledge of Pony.

Pony is awesome.

@SeanTAllen
Copy link
Member

@patternspandemic are you able to use vtune or instruments to profile only once the stall has happened?

@patternspandemic
Copy link
Contributor Author

Connecting to the process after core usage reaches 100%:

$ sudo perf top -p 1724 --call-graph fp -d 6

Samples: 100K of event 'cycles:p', Event count (approx.): 33523750921
  Children      Self  Shared Object       Symbol                                                       
-   99.98%    97.85%  example          [.] sweep_small.lto_priv.61                                  
     98.58% sweep_small.lto_priv.61                                                                    
+    2.04%     0.02%  [kernel]            [k] __irqentry_text_start                                    
+    2.02%     0.00%  [kernel]            [k] smp_apic_timer_interrupt                                 
+    1.92%     0.00%  [kernel]            [k] local_apic_timer_interrupt                               
+    1.91%     0.01%  [kernel]            [k] hrtimer_interrupt                                        
+    1.48%     0.01%  [kernel]            [k] __hrtimer_run_queues                                     
+    1.46%     0.00%  [kernel]            [k] tick_sched_timer                                         
+    1.19%     0.00%  [kernel]            [k] tick_sched_handle.isra.12                                
+    1.13%     0.01%  [kernel]            [k] scheduler_tick                                           
+    1.12%     1.12%  [kernel]            [k] read_hpet                                                
+    0.82%     0.02%  [kernel]            [k] ktime_get                                                
     0.77%     0.00%  [kernel]            [k] update_process_times                                     
     0.54%     0.00%  [kernel]            [k] sched_clock_tick                                         
     0.38%     0.19%  [kernel]            [k] task_tick_fair                                           
     0.22%     0.01%  [kernel]            [k] clockevents_program_event                                
     0.15%     0.00%  [kernel]            [k] tick_do_update_jiffies64                                 
     0.14%     0.01%  [kernel]            [k] update_wall_time                                         
     0.10%     0.00%  [kernel]            [k] irq_exit                                                 
     0.10%     0.07%  [kernel]            [k] update_cfs_shares                                        
     0.10%     0.04%  [kernel]            [k] sched_clock_local
(...)

Also looked to be the same results connecting to the running thread.

@SeanTAllen
Copy link
Member

I think a breakpoint inside of sweep_small that can be used to inspect what is going on once this starts happening is needed. do you feel up to that exploration @patternspandemic?

@SeanTAllen
Copy link
Member

Also, out of curiousity, does this happen if you compile your program in debug mode ponyc --debug?

@patternspandemic
Copy link
Contributor Author

patternspandemic commented Jun 30, 2017

Also, out of curiousity, does this happen if you compile your program in debug mode ponyc --debug?

The program uses Debug to output it's text, but same results with or without -d.

I think a breakpoint inside of sweep_small that can be used to inspect what is going on once this starts happening is needed.

I was thinking the same. I have very little experience with debugging C/C++. I assume I'll need to compile ponyc with config=debug and compile my example with that pony -d and then run the example with gdb/lldb? I'm always up for learning something new.

@SeanTAllen
Copy link
Member

@patternspandemic to really get a good sense of what is going on you want to build the runtime as debug.

build your ponyc from source using: make install config=debug.

then ideally, yes also use --debug when building your application.

there's an lldb cheatsheet that should help with lldb (and a little for gdb as well) as long as your are familiar with stepwise debugging.

give that a go and let me know if you get stuck and i can help out some more.

@patternspandemic
Copy link
Contributor Author

Alright, I've been able to walk the thread spinning in sweep_small, and determined a couple things:

  • The hang up is actually in final_small_freed, which is inlined into sweep_small
  • Stepping shows endless loop between 109 and 111
  • Noticed this function's been updated since release of 0.14.0, and found Fix a bug in finaliser handling #1908 via blame, and sounds alot like what's going on.

Looks like @sylvanc solved this one short after release of 0.14.0

At least I learned how to use LLDB. Thanks for your help @SeanTAllen

@SeanTAllen
Copy link
Member

Awesome. We'll be releasing 0.15.0 soon.

Can you run with the latest source and verify the problem is fixed for you?

@patternspandemic
Copy link
Contributor Author

Indeed a compilation of the example with master solves the issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants