Skip to content

Commit

Permalink
add probes for GC phases
Browse files Browse the repository at this point in the history
  • Loading branch information
vchuravy committed Jul 29, 2021
1 parent 86ad7c4 commit 66b40ad
Show file tree
Hide file tree
Showing 8 changed files with 299 additions and 1 deletion.
44 changes: 44 additions & 0 deletions contrib/bpftrace/gc_all.bt
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
#!/usr/bin/env bpftrace

BEGIN
{
printf("Tracing Julia GC Times... Hit Ctrl-C to end.\n");
}

usdt:usr/lib/libjulia-internal.so:julia:gc__begin
{
$now = nsecs;
@time[pid] = $now;
@start[pid] = $now;
}

usdt:usr/lib/libjulia-internal.so:julia:gc__stop_the_world
/@start[pid]/
{
$now = nsecs;
@stop_the_world_usecs[pid] = hist(($now - @time[pid]) / 1000);
@time[pid] = $now;
}

usdt:usr/lib/libjulia-internal.so:julia:gc__end
/@start[pid]/
{
$now = nsecs;
@gc_total_usecs[pid] = hist(($now - @start[pid]) / 1000);
@gc_phase_usecs[pid] = hist(($now - @time[pid]) / 1000);
@time[pid] = $now;
delete(@start[pid]);
}

usdt:usr/lib/libjulia-internal.so:julia:gc__finalizer
/@time[pid]/
{
@finalizer[pid] = hist((nsecs - @time[pid]) / 1000);
delete(@time[pid]);
}

END
{
clear(@start);
clear(@time);
}
23 changes: 23 additions & 0 deletions contrib/bpftrace/gc_simple.bt
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
#!/usr/bin/env bpftrace

BEGIN
{
printf("Tracing Julia GC Times... Hit Ctrl-C to end.\n");
}

usdt:usr/lib/libjulia-internal.so:julia:gc__begin
{
@start[pid] = nsecs;
}

usdt:usr/lib/libjulia-internal.so:julia:gc__end
/@start[pid]/
{
@usecs[pid] = hist((nsecs - @start[pid]) / 1000);
delete(@start[pid]);
}

END
{
clear(@start);
}
23 changes: 23 additions & 0 deletions contrib/bpftrace/gc_stop_the_world_latency.bt
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
#!/usr/bin/env bpftrace

BEGIN
{
printf("Tracing Julia GC Stop-The-World Latency... Hit Ctrl-C to end.\n");
}

usdt:usr/lib/libjulia-internal.so:julia:gc__begin
{
@start[pid] = nsecs;
}

usdt:usr/lib/libjulia-internal.so:julia:gc__stop_the_world
/@start[pid]/
{
@usecs[pid] = hist((nsecs - @start[pid]) / 1000);
delete(@start[pid]);
}

END
{
clear(@start);
}
1 change: 1 addition & 0 deletions doc/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,7 @@ DevDocs = [
"devdocs/debuggingtips.md",
"devdocs/valgrind.md",
"devdocs/sanitizers.md",
"devdocs/probes.md"
]
]

Expand Down
168 changes: 168 additions & 0 deletions doc/src/devdocs/probes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,168 @@
# Instrumenting Julia with DTrace, and bpftrace

DTrace and bpftrace are tools that enable lightweight instrumentation of processes.
You can turn the instrumentation on and off while the process is running,
and with instrumentation off the overhead is minimal.

!!! compat "Julia 1.8"
Support for probes was added in Julia 1.8

!!! note
This documentation has been written from a Linux perspective, most of this
should hold on Mac OS/Darwin and FreeBSD.

## Enabling support

On Linux install the `systemtap` package that has a version of `dtrace`.

```
WITH_DTRACE=1
```

### Verifying

```
> readelf -n usr/lib/libjulia-internal.so.1
Displaying notes found in: .note.gnu.build-id
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 57161002f35548772a87418d2385c284ceb3ead8
Displaying notes found in: .note.stapsdt
Owner Data size Description
stapsdt 0x00000029 NT_STAPSDT (SystemTap probe descriptors)
Provider: julia
Name: gc__begin
Location: 0x000000000013213e, Base: 0x00000000002bb4da, Semaphore: 0x0000000000346cac
Arguments:
stapsdt 0x00000032 NT_STAPSDT (SystemTap probe descriptors)
Provider: julia
Name: gc__stop_the_world
Location: 0x0000000000132144, Base: 0x00000000002bb4da, Semaphore: 0x0000000000346cae
Arguments:
stapsdt 0x00000027 NT_STAPSDT (SystemTap probe descriptors)
Provider: julia
Name: gc__end
Location: 0x000000000013214a, Base: 0x00000000002bb4da, Semaphore: 0x0000000000346cb0
Arguments:
stapsdt 0x0000002d NT_STAPSDT (SystemTap probe descriptors)
Provider: julia
Name: gc__finalizer
Location: 0x0000000000132150, Base: 0x00000000002bb4da, Semaphore: 0x0000000000346cb2
Arguments:
```

## Adding probes in libjulia

Probes are declared in dtraces format in the file `src/uprobes.d`. The generated
header file is included in `src/julia_internal.h` and if you add probes you should
provide a noop implementation there.

The header will contain a semaphore `*_ENABLED` and the actual call to the probe.
If the probe arguments are expensive to compute you should first check if the
probe is enabled and then compute the arguments and call the probe.

```c
if (JL_PROBE_{PROBE}_ENABLED())
auto expensive_arg = ...;
JL_PROBE_{PROBE}(expensive_arg);
```
If your probe has no arguments it is preferred to not include the semaphore check.
With USDT probes enabled the cost of a semaphore is a memory load, irrespective of
the fact that the probe is enabled or not.
```c
#define JL_PROBE_GC_BEGIN_ENABLED() __builtin_expect (julia_gc__begin_semaphore, 0)
__extension__ extern unsigned short julia_gc__begin_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));
```

Whereas the probe itself is a noop sled that will be patched to a trampoline to
the probe handler.

## Available probes

### GC probes

1. `julia:gc__begin`: GC begins running on one thread and triggers stop-the-world.
2. `julia:gc__stop_the_world`: All threads have reached a safepoint and GC runs.
3. `julia:gc__mark__begin`: Beginning the mark phase
4. `julia:gc__mark_end(scanned_bytes, perm_scanned)`: Mark phase ended
5. `julia:gc__sweep_begin(full)`: Starting sweep
6. `julia:gc__sweep_end()`: Sweep phase finished
7. `julia:gc__end`: GC is finished, other threads continue work
8. `julia:gc__finalizer`: Initial GC thread has finished running finalizers

#### GC stop-the-world latency

An example `bpftrace` script is given in `contrib/gc_stop_the_world_latency.bt`
and it creates a histogram of the latency for all threads to reach a safepoint.

Running this Julia code, with `julia -t 2`

```
using Base.Threads
fib(x) = x <= 1 ? 1 : fib(x-1) + fib(x-2)
beaver = @spawn begin
while true
fib(30)
# This safepoint is necessary until #41616, since otherwise this
# loop will never yield to GC.
GC.safepoint()
end
end
allocator = @spawn begin
while true
zeros(1024)
end
end
wait(allocator)
```

and in a second terminal

```
> sudo contrib/bpftrace/gc_stop_the_world_latency.bt
Attaching 4 probes...
Tracing Julia GC Stop-The-World Latency... Hit Ctrl-C to end.
^C
@usecs[1743412]:
[4, 8) 971 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8, 16) 837 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[16, 32) 129 |@@@@@@ |
[32, 64) 10 | |
[64, 128) 1 | |
```

We can see the latency distribution of the stop-the-world phase in the executed Julia process.

## Notes on using `bpftrace`

An example probe in the bpftrace format looks like:

```
usdt:usr/lib/libjulia-internal.so:julia:gc__begin
{
@start[pid] = nsecs;
}
```

The probe declaration takes the kind `usdt`, then either the
path to the library or the PID, the provider name `julia`
and the probe name `gc__begin`. Note that I am using a
relative path to the `libjulia-internal.so`, but this might
need to be an absolute path on a production system.

## Useful references:

- [Julia Evans blog on Linux tracing systems](https://jvns.ca/blog/2017/07/05/linux-tracing-systems)
- [LWN article on USDT and BPF](https://lwn.net/Articles/753601/)
- [GDB support for probes](https://sourceware.org/gdb/onlinedocs/gdb/Static-Probe-Points.html)
- [Brendan Gregg -- Linux Performance](https://www.brendangregg.com/linuxperf.html)
11 changes: 11 additions & 0 deletions src/gc.c
Original file line number Diff line number Diff line change
Expand Up @@ -3004,6 +3004,7 @@ static int _jl_gc_collect(jl_ptls_t ptls, jl_gc_collection_t collection)

uint64_t t0 = jl_hrtime();
int64_t last_perm_scanned_bytes = perm_scanned_bytes;
JL_PROBE_GC_MARK_BEGIN();

// 1. fix GC bits of objects in the remset.
for (int t_i = 0; t_i < jl_n_threads; t_i++)
Expand All @@ -3030,6 +3031,7 @@ static int _jl_gc_collect(jl_ptls_t ptls, jl_gc_collection_t collection)
gc_mark_loop(ptls, sp);
gc_mark_sp_init(gc_cache, &sp);
gc_num.since_sweep += gc_num.allocd;
JL_PROBE_GC_MARK_END(scanned_bytes, perm_scanned_bytes);
gc_settime_premark_end();
gc_time_mark_pause(t0, scanned_bytes, perm_scanned_bytes);
int64_t actual_allocd = gc_num.since_sweep;
Expand Down Expand Up @@ -3135,6 +3137,7 @@ static int _jl_gc_collect(jl_ptls_t ptls, jl_gc_collection_t collection)
perm_scanned_bytes = 0;
scanned_bytes = 0;
// 5. start sweeping
JL_PROBE_GC_SWEEP_BEGIN(sweep_full);
sweep_weak_refs();
sweep_stack_pools();
gc_sweep_foreign_objs();
Expand All @@ -3144,6 +3147,7 @@ static int _jl_gc_collect(jl_ptls_t ptls, jl_gc_collection_t collection)
gc_sweep_pool(sweep_full);
if (sweep_full)
gc_sweep_perm_alloc();
JL_PROBE_GC_SWEEP_END();
// sweeping is over
// 6. if it is a quick sweep, put back the remembered objects in queued state
// so that we don't trigger the barrier again on them.
Expand Down Expand Up @@ -3201,6 +3205,8 @@ static int _jl_gc_collect(jl_ptls_t ptls, jl_gc_collection_t collection)

JL_DLLEXPORT void jl_gc_collect(jl_gc_collection_t collection)
{
JL_PROBE_GC_BEGIN(collection);

jl_task_t *ct = jl_current_task;
jl_ptls_t ptls = ct->ptls;
if (jl_gc_disable_counter) {
Expand Down Expand Up @@ -3230,6 +3236,8 @@ JL_DLLEXPORT void jl_gc_collect(jl_gc_collection_t collection)
// TODO (concurrently queue objects)
// no-op for non-threading
jl_gc_wait_for_the_world();
JL_PROBE_GC_STOP_THE_WORLD();

gc_invoke_callbacks(jl_gc_cb_pre_gc_t,
gc_cblist_pre_gc, (collection));

Expand All @@ -3247,6 +3255,7 @@ JL_DLLEXPORT void jl_gc_collect(jl_gc_collection_t collection)
// no-op for non-threading
jl_safepoint_end_gc();
jl_gc_state_set(ptls, old_state, JL_GC_STATE_WAITING);
JL_PROBE_GC_END();

// Only disable finalizers on current thread
// Doing this on all threads is racy (it's impossible to check
Expand All @@ -3257,6 +3266,8 @@ JL_DLLEXPORT void jl_gc_collect(jl_gc_collection_t collection)
run_finalizers(ct);
ptls->in_finalizer = was_in_finalizer;
}
JL_PROBE_GC_FINALIZER();

gc_invoke_callbacks(jl_gc_cb_post_gc_t,
gc_cblist_post_gc, (collection));
#ifdef _OS_WINDOWS_
Expand Down
22 changes: 21 additions & 1 deletion src/julia_internal.h
Original file line number Diff line number Diff line change
Expand Up @@ -1457,10 +1457,30 @@ uint16_t __gnu_f2h_ieee(float param) JL_NOTSAFEPOINT;
//
// If the arguments to `JL_PROBE_{PROBE}` are expensive to compute, the call to
// these functions must be guarded by a JL_PROBE_{PROBE}_ENABLED() check, to
// minimize performance impact when probing is off.
// minimize performance impact when probing is off. As an example:
//
// if (JL_PROBE_GC_STOP_THE_WORLD_ENABLED())
// JL_PROBE_GC_STOP_THE_WORLD();

#else
// define a dummy version of the probe functions
#define JL_PROBE_GC_BEGIN(collection) do ; while (0)
#define JL_PROBE_GC_STOP_THE_WORLD() do ; while (0)
#define JL_PROBE_GC_MARK_BEGIN() do ; while (0)
#define JL_PROBE_GC_MARK_END(scanned_bytes, perm_scanned_bytes) do ; while (0)
#define JL_PROBE_GC_SWEEP_BEGIN(full) do ; while (0)
#define JL_PROBE_GC_SWEEP_END() do ; while (0)
#define JL_PROBE_GC_END() do ; while (0)
#define JL_PROBE_GC_FINALIZER() do ; while (0)

#define JL_PROBE_GC_BEGIN_ENABLED() (0)
#define JL_PROBE_GC_STOP_THE_WORLD_ENABLED() (0)
#define JL_PROBE_GC_MARK_BEGIN_ENABLED() (0)
#define JL_PROBE_GC_MARK_END_ENABLED() (0)
#define JL_PROBE_GC_SWEEP_BEGIN_ENABLED() (0)
#define JL_PROBE_GC_SWEEP_END_ENABLED() (0)
#define JL_PROBE_GC_END_ENABLED() (0)
#define JL_PROBE_GC_FINALIZER_ENABLED() (0)
#endif

#endif
8 changes: 8 additions & 0 deletions src/uprobes.d
Original file line number Diff line number Diff line change
@@ -1,6 +1,14 @@
/* Julia DTrace provider */

provider julia {
probe gc__begin(int collection)
probe gc__stop_the_world()
probe gc__mark__begin()
probe gc__mark__end(int64 scanned_bytes, int64 perm_scanned_bytes)
probe gc__sweep__begin(int full)
probe gc__sweep__end()
probe gc__end()
probe gc__finalizer()
}

#pragma D attributes Evolving/Evolving/Common provider julia provider
Expand Down

0 comments on commit 66b40ad

Please sign in to comment.