Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uftrace for go language #557

Open
honggyukim opened this issue Oct 20, 2018 · 11 comments
Open

uftrace for go language #557

honggyukim opened this issue Oct 20, 2018 · 11 comments

Comments

@honggyukim
Copy link
Collaborator

I've been testing uftrace for go language after talking to @taeguk last week.

1. Test program
Here is a simple abc.go program.

$ cat abc.go
package main

import "fmt"
import "os"

func c() int { return os.Getpid() }
func b() int { return c() + 10 }
func a() int { return b() +  1 }

func main() {
    pid := a()
    fmt.Println(pid)
}

2. Build and run
After building it with -pg option using gccgo, it runs same as normal execution.

$ gccgo -pg abc.go
$ ./a.out
108150

3. Run it with uftrace
Now I executed it with uftrace and found that it shows many error messages. However, it also shows the trace output which looks as expected from main.main().

a.out: /home/honggyu/work/uftrace/libmcount/plthook.c:825: plthook_exit: Assertion `mtdp != NULL' failed.
Aborted

runtime_sighandler
        ../../../src/libgo/runtime/go-signal.c:223
sig_tramp_info
        ../../../src/libgo/runtime/go-signal.c:413

        :0

        :0

        :0

        :0

        :0
plthook_exit
        /home/honggyu/work/uftrace/libmcount/plthook.c:825

        /home/honggyu/work/uftrace/arch/x86_64/plthook.S:85

        /home/honggyu/work/uftrace/arch/x86_64/plthook.S:68

        :0

goroutine 17 [syscall]:
        goroutine in C code; stack unavailable
created by runtime_main
        ../../../src/libgo/runtime/proc.c:598

goroutine 18 [finalizer wait]:
runtime_mcall
        ../../../src/libgo/runtime/proc.c:295
runfinq
        ../../../src/libgo/runtime/mgc0.c:2512
kickoff
        ../../../src/libgo/runtime/proc.c:235
created by runtime_createfing
        ../../../src/libgo/runtime/mgc0.c:2577

goroutine 19 [runnable]:
runtime_mcall
        ../../../src/libgo/runtime/proc.c:295
bgsweep
        ../../../src/libgo/runtime/mgc0.c:1833
kickoff
        ../../../src/libgo/runtime/proc.c:235
created by mgc
        ../../../src/libgo/runtime/mgc0.c:2215
# DURATION     TID     FUNCTION
   3.040 us [108202] | mmap();
   1.820 us [108202] | mprotect();
   1.350 us [108202] | getpagesize();
   2.200 us [108202] | munmap();
   1.087 us [108202] | sigemptyset();
   1.017 us [108202] | sigfillset();
   1.051 us [108202] | sigdelset();
   0.150 us [108202] | sigdelset();
  15.954 us [108202] | runtime_check();
   3.950 us [108202] | runtime_args();
  19.017 us [108202] | runtime_osinit();
            [108202] | runtime_schedinit() {
   0.877 us [108202] |   getpagesize();
   2.843 us [108202] |   getenv();
   3.013 us [108202] |   mmap();
 117.324 ms [108202] | } /* runtime_schedinit */
            [108202] | __go_go() {
   3.204 us [108202] |   mmap();
  40.581 us [108202] | } /* __go_go */
   1.093 us [108202] | runtime_m();
            [108202] | runtime_mstart() {
   3.687 us [108202] |   mmap();
   3.280 us [108202] |   mmap();
            [108202] |   __go_init_main() {
  12.207 us [108202] |     math..import();
   1.634 us [108202] |     runtime..import();
 116.052 us [108202] |     strconv..import();
   4.054 us [108202] |     sync..import();
   3.310 us [108202] |     io..import();
  62.922 us [108202] |     reflect..import();
  17.433 us [108202] |     syscall..import();
 150.791 us [108202] |     time..import();
            [108202] |     os..import() {
   4.400 us [108202] |       mmap();
  41.808 us [108202] |     } /* os..import */
  19.316 us [108202] |     fmt..import();
 434.940 us [108202] |   } /* __go_init_main */
            [108202] |   main.main() {
            [108202] |     main.a() {
            [108202] |       main.b() {
            [108202] |         main.c() {
   3.150 us [108202] |           os.Getpid();
   3.983 us [108202] |         } /* main.c */
   4.560 us [108202] |       } /* main.b */
   4.997 us [108202] |     } /* main.a */
            [108202] |     __go_new() {
   2.277 us [108202] |       mmap();
   2.710 us [108202] |       mmap();
   4.235 ms [108202] |       /* linux:schedule */
            [108202] |       /* linux:task-exit */

uftrace stopped tracing with remaining functions
================================================
task: 108202
[2] __go_new
[1] main.main
[0] runtime_mstart

4. Problem
I found that mtdp got null from get_thread_data() in multi-threaded mode.

#ifdef SINGLE_THREAD
# define TLS
# define get_thread_data()  &mtd
# define check_thread_data(mtdp)  (mtdp->rstack == NULL)
#else
# define TLS  __thread
# define get_thread_data()  pthread_getspecific(mtd_key)
# define check_thread_data(mtdp)  (mtdp == NULL)
#endif

5. Bypassing the problem
There is an option to force to use libmcount-single.so that is implemented as a single threaded mode, so I used it. As shown in the below trace output, uftrace doesn't complain with --libmcount-single now and the trace output looks fine.

$ uftrace --libmcount-single a.out
108400
# DURATION     TID     FUNCTION
   2.547 us [108389] | mmap();
   1.924 us [108389] | mprotect();
   1.383 us [108389] | getpagesize();
   2.157 us [108389] | munmap();
   1.217 us [108389] | sigemptyset();
   0.974 us [108389] | sigfillset();
   1.104 us [108389] | sigdelset();
   0.157 us [108389] | sigdelset();
  31.521 us [108389] | runtime_check();
   3.820 us [108389] | runtime_args();
  13.130 us [108389] | runtime_osinit();
            [108389] | runtime_schedinit() {
   0.677 us [108389] |   getpagesize();
   2.007 us [108389] |   getenv();
   2.574 us [108389] |   mmap();
 109.723 ms [108389] | } /* runtime_schedinit */
            [108389] | __go_go() {
   3.086 us [108389] |   mmap();
  24.324 us [108389] | } /* __go_go */
   1.013 us [108389] | runtime_m();
            [108389] | runtime_mstart() {
   2.370 us [108389] |   mmap();
   3.343 us [108389] |   mmap();
            [108389] |   __go_init_main() {
  10.720 us [108389] |     math..import();
   1.223 us [108389] |     runtime..import();
 108.929 us [108389] |     strconv..import();
   8.073 us [108389] |     sync..import();
   8.370 us [108389] |     io..import();
  40.111 us [108389] |     reflect..import();
  15.961 us [108389] |     syscall..import();
   1.713 ms [108389] |     time..import();
            [108389] |     os..import() {
   3.307 us [108389] |       mmap();
  46.491 us [108389] |     } /* os..import */
  15.957 us [108389] |     fmt..import();
   1.973 ms [108389] |   } /* __go_init_main */
            [108389] |   main.main() {
            [108389] |     main.a() {
            [108389] |       main.b() {
            [108389] |         main.c() {
   2.850 us [108389] |           os.Getpid();
   3.667 us [108389] |         } /* main.c */
   4.133 us [108389] |       } /* main.b */
   4.610 us [108389] |     } /* main.a */
            [108389] |     __go_new() {
   2.310 us [108389] |       mmap();
   2.644 us [108389] |       mmap();
            [108389] |       /* linux:sched-out */
  43.081 us [108389] |     } /* __go_new */
   1.687 us [108389] |     __go_new();
 124.456 us [108389] |     fmt.Println();
 990.341 us [108389] |   } /* main.main */
   1.250 ms [108389] |   /* linux:sched-in */
            [108389] |   /* linux:task-exit */

uftrace stopped tracing with remaining functions
================================================
task: 108389
[0] runtime_mstart

6. Simplified trace output only for main
Since go program calls many library calls before main function, I used --no-libcall this time.

$ uftrace --libmcount-single --no-libcall a.out
108438
# DURATION     TID     FUNCTION
 409.674 us [108427] | __go_init_main();
            [108427] | main.main() {
            [108427] |   main.a() {
            [108427] |     main.b() {
   2.756 us [108427] |       main.c();
   3.767 us [108427] |     } /* main.b */
   4.357 us [108427] |   } /* main.a */
            [108427] |   /* linux:sched-out */
 117.486 us [108427] | } /* main.main */

The output almost looks what I expected from uftrace.

Thanks @taeguk for giving us the idea!

@honggyukim
Copy link
Collaborator Author

honggyukim commented Oct 20, 2018

It can be filtered with main.main as well and I found that --auto-args works fine if -g option is given when compiling.

$ gccgo -pg -g -o abc abc.go

$ uftrace --libmcount-single -F main.main -a abc
108851
# DURATION     TID     FUNCTION
            [108840] | main.main() {
            [108840] |   main.a() {
            [108840] |     main.b() {
            [108840] |       main.c() {
  10.510 us [108840] |         os.Getpid();
  13.463 us [108840] |       } = 0x1a928; /* main.c */
  18.733 us [108840] |     } = 0x1a932; /* main.b */
  19.961 us [108840] |   } = 0x1a933; /* main.a */
            [108840] |   __go_new() {
   4.007 us [108840] |     mmap(0, 0x401000, PROT_WRITE|PROT_READ, MAP_ANON|MAP_PRIVATE, -1, 0) = 0x7f5284310000;
   3.117 us [108840] |     mmap(0, 36864, PROT_WRITE|PROT_READ, MAP_ANON|MAP_PRIVATE, -1, 0) = 0x7f5286fd5000;
            [108840] |     /* linux:sched-out */
   4.440 us [108840] |   } /* __go_new */
   3.843 us [108840] |   __go_new();
 188.786 us [108840] |   fmt.Println();
   1.106 ms [108840] | } /* main.main */

@honggyukim
Copy link
Collaborator Author

Here is more verbose output.

$ uftrace --libmcount-single -a abc
109929
# DURATION     TID     FUNCTION
   4.387 us [109918] | mmap(0, 0, PROT_READ, MAP_ANON, -1, 0) = 0xffffffffffffffff;
   2.316 us [109918] | mprotect(0, 0, PROT_NONE) = 0;
   1.436 us [109918] | getpagesize();
   2.430 us [109918] | munmap(0, 4096) = 0;
   1.406 us [109918] | sigemptyset(0x7f89ccd13848) = 0;
   1.180 us [109918] | sigfillset(0x605220) = 0;
   1.337 us [109918] | sigdelset(0x605220, SIGRTMIN) = 0;
   0.280 us [109918] | sigdelset(0x605220, SIGRTMIN|SIGHUP) = 0;
   3.967 us [109918] | runtime_args();
   7.520 us [109918] | runtime_osinit();
            [109918] | runtime_schedinit() {
   0.727 us [109918] |   getpagesize();
 462.787 us [109918] |   getenv("SPLIT_STACK_GUARD") = "NULL";
   4.347 us [109918] |   mmap(0, 36864, PROT_WRITE|PROT_READ, MAP_ANON|MAP_PRIVATE, -1, 0) = 0x7f89c92f6000;
 110.338 ms [109918] | } /* runtime_schedinit */
            [109918] | __go_go() {
   3.757 us [109918] |   mmap(0, 0x401000, PROT_WRITE|PROT_READ, MAP_ANON|MAP_PRIVATE, -1, 0) = 0x7f89c7b19000;
  26.757 us [109918] | } /* __go_go */
   1.100 us [109918] | runtime_m();
            [109918] | runtime_mstart() {
   2.790 us [109918] |   mmap(0, 36864, PROT_WRITE|PROT_READ, MAP_ANON|MAP_PRIVATE, -1, 0) = 0x7f89c92ed000;
   3.741 us [109918] |   mmap(0, 0x401000, PROT_WRITE|PROT_READ, MAP_ANON|MAP_PRIVATE, -1, 0) = 0x7f89c6f17000;
            [109918] |   __go_init_main() {
  10.844 us [109918] |     math..import();
   1.343 us [109918] |     runtime..import();
 108.169 us [109918] |     strconv..import();
   7.693 us [109918] |     sync..import();
   2.444 us [109918] |     io..import();
  39.790 us [109918] |     reflect..import();
  15.847 us [109918] |     syscall..import();
 130.329 us [109918] |     time..import();
            [109918] |     os..import() {
   4.214 us [109918] |       mmap(0, 0x401000, PROT_WRITE|PROT_READ, MAP_ANON|MAP_PRIVATE, -1, 0) = 0x7f89c6b16000;
  48.111 us [109918] |     } /* os..import */
  16.047 us [109918] |     fmt..import();
 388.699 us [109918] |   } /* __go_init_main */
            [109918] |   main.main() {
            [109918] |     main.a() {
            [109918] |       main.b() {
            [109918] |         main.c() {
   2.746 us [109918] |           os.Getpid();
   3.917 us [109918] |         } = 0x1ad5e; /* main.c */
   8.437 us [109918] |       } = 0x1ad68; /* main.b */
   9.236 us [109918] |     } = 0x1ad69; /* main.a */
            [109918] |     __go_new() {
   2.667 us [109918] |       mmap(0, 0x401000, PROT_WRITE|PROT_READ, MAP_ANON|MAP_PRIVATE, -1, 0) = 0x7f89c6615000;
   2.697 us [109918] |       mmap(0, 36864, PROT_WRITE|PROT_READ, MAP_ANON|MAP_PRIVATE, -1, 0) = 0x7f89c92da000;
 870.057 us [109918] |     } /* __go_new */
            [109918] |     /* linux:sched-out */
   1.823 us [109918] |     __go_new();
 126.585 us [109918] |     fmt.Println();
 130.595 us [109918] |   } /* main.main */
   5.595 ms [109918] |   /* linux:sched-in */
            [109918] |   /* linux:task-exit */

uftrace stopped tracing with remaining functions
================================================
task: 109918
[0] runtime_mstart

@ParkHanbum
Copy link
Contributor

@honggyukim GOOOOOOOD!!

@taeguk
Copy link
Contributor

taeguk commented Oct 20, 2018

@honggyukim Great!!!

@namhyung
Copy link
Owner

Nice, I'll take a look later.

@honggyukim
Copy link
Collaborator Author

honggyukim commented Oct 25, 2018

It has to support multi-threaded go programs so it must work without --libmcount-single option anyway. It needs to be observed more later.

By the way, I tested it with other go examples that include go routines and channels, and found that they both work fine.

@namhyung
Copy link
Owner

I suspect depth in goroutine will be mixed(?) and confusing. Could you please show me an example?

@honggyukim
Copy link
Collaborator Author

$ cat go-routines.go
package main
import "fmt"

func f(from string) {
    fmt.Println(from)
}

func main() {
    f("direct")
    go f("goroutine")
    fmt.Scanln()
    fmt.Println("done")
}
$ gccgo -pg go-routines.go

$ uftrace -F main.main --libmcount-single a.out
direct
goroutine
done
# DURATION     TID     FUNCTION
            [ 84355] | main.main() {
            [ 84355] |   main.f() {
            [ 84355] |     __go_new() {
   2.860 us [ 84355] |       mmap();
   2.383 us [ 84355] |       mmap();
            [ 84355] |       /* linux:sched-out */
  84.293 us [ 84355] |     } /* __go_new */
   3.134 us [ 84355] |     __go_new();
 198.483 us [ 84355] |     fmt.Println();
   1.151 ms [ 84355] |   } /* main.f */
   0.194 us [ 84355] |   __go_new();
            [ 84355] |   __go_go() {
   7.900 us [ 84355] |     mmap();
  23.427 us [ 84355] |   } /* __go_go */
            [ 84355] |   fmt.Scanln() {
   7.470 us [ 84355] |     mmap();
            [ 84355] |     main.$thunk0() {
            [ 84355] |       main.f() {
   1.310 us [ 84355] |         __go_new();
   0.346 us [ 84355] |         __go_new();
            [ 84355] |         fmt.Println() {
   4.857 us [ 84355] |           mmap();
 243.930 us [ 84355] |         } /* fmt.Println */
 249.130 us [ 84355] |       } /* main.f */
 250.207 us [ 84355] |     } /* main.$thunk0 */
  46.966 ms [ 84355] |   } /* fmt.Scanln */
   1.734 us [ 84355] |   __go_new();
   0.296 us [ 84355] |   __go_new();
  16.064 us [ 84355] |   fmt.Println();
  48.164 ms [ 84355] | } /* main.main */

It shows that main.f is inside main.$thunk0. I know that its depth is inherited from its parent, but I seems to be okay because we can correctly see who its parent is IMHO.

@honggyukim
Copy link
Collaborator Author

honggyukim commented Oct 25, 2018

Here is an example for using channels.

$ cat channels.go
package main
import "fmt"

func main() {
    messages := make(chan string)

    go func() {
        messages <- "ping"
    }()

    msg := <-messages
    fmt.Println(msg)
}
$ gccgo -pg channels.go

$ uftrace -F main.main --libmcount-single a.out
ping
# DURATION     TID     FUNCTION
            [ 84601] | main.main() {
            [ 84601] |   __go_new() {
   2.300 us [ 84601] |     mmap();
   2.260 us [ 84601] |     mmap();
            [ 84601] |     /* linux:sched-out */
  11.820 us [ 84601] |   } /* __go_new */
  11.716 us [ 84601] |   __go_new_channel();
   0.783 us [ 84601] |   __go_new();
  22.857 us [ 84601] |   __go_new();
            [ 84601] |   __go_go() {
   8.030 us [ 84601] |     mmap();
  32.617 us [ 84601] |   } /* __go_go */
            [ 84601] |   __go_receive() {
            [ 84601] |     main.$thunk0() {
            [ 84601] |       main.$nested0() {
   3.410 us [ 84601] |         __go_send_big();
   4.460 us [ 84601] |       } /* main.$nested0 */
   5.150 us [ 84601] |     } /* main.$thunk0 */
  20.870 us [ 84601] |   } /* __go_receive */
   0.530 us [ 84601] |   __go_new();
   0.380 us [ 84601] |   __go_new();
            [ 84601] |   fmt.Println() {
   7.177 us [ 84601] |     mmap();
 521.878 us [ 84601] |   } /* fmt.Println */
   1.527 ms [ 84601] | } /* main.main */

It shows that __go_new_channel, __go_receive and __go_send_big are related to go channel execution. It also shows main.$thunk0 for running a go routine.

@honggyukim
Copy link
Collaborator Author

honggyukim commented Oct 25, 2018

Hmm.. I tested with more complicated goroutine example from https://gobyexample.com/stateful-goroutines and found that it's too deeply nested because of multiple go routine instances.

$ gccgo -pg -g stateful-goroutines.go

$ uftrace -a --libmcount-single a.out
WARN: too deeply nested calls: 1024
readOps: 56147
writeOps: 6902
unexpected fault address 0xfffffea085894907
fatal error: fault
[signal 0xb code=0x1 addr=0xfffffea085894907]

goroutine 16 [running]:
runtime_dopanic
        ../../../src/libgo/runtime/panic.c:131
runtime_throw
        ../../../src/libgo/runtime/panic.c:193
sig_panic_info_handler
        ../../../src/libgo/runtime/go-signal.c:294
        ...
            [ 86422] |   main.main() {
            [ 86422] |     __go_new() {
   2.733 us [ 86422] |       mmap(0, 0x401000, PROT_WRITE|PROT_READ, MAP_ANON|MAP_PRIVATE, -1, 0) = 0x7fea1e337000;
   3.117 us [ 86422] |       mmap(0, 36864, PROT_WRITE|PROT_READ, MAP_ANON|MAP_PRIVATE, -1, 0) = 0x7fea20ff6000;
            [ 86422] |       /* linux:sched-out */
  50.394 us [ 86422] |     } /* __go_new */
        ...
            [ 86422] |     __go_new() {
            [ 86422] |       main.$thunk0(0xc208032018) {
            [ 86422] |         main.$nested0() {
   9.000 us [ 86422] |           __go_new_map();
  46.697 us [ 86422] |           runtime.newselect();
   3.210 us [ 86422] |           runtime.selectrecv();
   0.260 us [ 86422] |           runtime.selectrecv();
            [ 86422] |           runtime.selectgo() {
            [ 86422] |             main.$thunk1(0xc208032028) {
            [ 86422] |               main.$nested1() {
  11.616 us [ 86422] |                 math_rand.Intn();
   1.043 us [ 86422] |                 __go_new_channel();
   0.583 us [ 86422] |                 __go_new();
   3.853 us [ 86422] |                 __go_send_small();
            [ 86422] |                 __go_receive() {
            [ 86422] |                   main.$thunk1(0xc208032038) {
            [ 86422] |                     main.$nested1() {
   0.663 us [ 86422] |                       math_rand.Intn();
   0.450 us [ 86422] |                       __go_new_channel();
   0.426 us [ 86422] |                       __go_new();
            [ 86422] |                       __go_send_small() {
            [ 86422] |                         main.$thunk1(0xc208032040) {
            [ 86422] |                           main.$nested1() {
   0.330 us [ 86422] |                             math_rand.Intn();
   0.410 us [ 86422] |                             __go_new_channel();
   0.327 us [ 86422] |                             __go_new();
            [ 86422] |                             __go_send_small() {
            [ 86422] |                               main.$thunk1(0xc208032048) {
            [ 86422] |                                 main.$nested1() {
   0.293 us [ 86422] |                                   math_rand.Intn();
        ...

It seems that we need to handle go routines somehow differently.

@namhyung
Copy link
Owner

Yes that what I worry about. We somehow need to figure out goroutine contexts and treat them as different threads.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants