Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange mapping at 0 on Mach-O #409

Open
Timmmm opened this issue Aug 14, 2018 · 14 comments
Open

Strange mapping at 0 on Mach-O #409

Timmmm opened this issue Aug 14, 2018 · 14 comments
Labels
Priority: p3 Buganizer priority - P3 type: bug Buganizer type - Bug

Comments

@Timmmm
Copy link
Contributor

Timmmm commented Aug 14, 2018

This is with master as of now on OSX. If you create a test as described here. The code will go to symbolizer.go newMapping(). It will first call obj.Open("myexe", 0x100000000, ...) which is correct because the __TEXT segment is mapped there. However it then also tries to call obj.Open("myexe", 0, ... which fails because the code assumes you are trying to symbolise things in the __TEXT segment. As far as I can see the only thing mapped to 0 is the __PAGEZERO segment.

I'm not really sure what is going on here, but something isn't right. It also doesn't seem to cause any obvious issues but maybe I am overlooking something.

@aalexand
Copy link
Collaborator

Maybe that call is from

if file, err := o.Obj.Open(arg0, 0, ^uint64(0), 0); err == nil {
where it sniffs the first argument to decide whether it's a binary or a profile?

@aalexand
Copy link
Collaborator

If so, perhaps it might make sense to have a separate method to call like o.Obj.Sniff that would cheaply return a boolean whether the file looks like a supported binary instead of actually trying to open it.

@Timmmm
Copy link
Contributor Author

Timmmm commented Aug 14, 2018

It definitely isn't that - it comes from newMapping(). The error is:

Local symbolization failed for resnet: error reading Mach-O file /path/to/./exe: could not identify base for /path/to/./exe: __TEXT segment address (0x100000000) > mapping start address (0x0)

That message is only generated there. Also I added extra printf debugging and it is definitely an entry in prof.Mapping.

I guess the cli call also generates this error but it doesn't print it, so I think that's fine. And yeah I agree it should use magic numbers rather than trying to completely open the file.

@aalexand
Copy link
Collaborator

My other guess would be that it could be from

if len(p.Mapping) == 0 {
.

@Timmmm
Copy link
Contributor Author

Timmmm commented Aug 15, 2018

I added a printf there to check and it doesn't seem to be. Makes sense because there is another mapping in that file. I guess this could be a problem with gperftools itself?

@aalexand
Copy link
Collaborator

Could be. The profiles generated by gperftools should be in text format, can you take a look at what the mapping record section looks like in the input file?

@Timmmm
Copy link
Contributor Author

Timmmm commented Aug 15, 2018

My profile is in binary format. :-/

@aalexand
Copy link
Collaborator

aalexand commented Aug 16, 2018

What do the mappings look like in pprof -raw <path-to-profile-file>?

@Timmmm
Copy link
Contributor Author

Timmmm commented Aug 17, 2018

Aha!

Mappings
1: 0x7fffa519d000/0x7fffac3ac000/0x200b9000 /usr/lib/libc++abi.dylib  
2: 0x104855000/0x1048e2000/0x0 /path/to/a.dylib  [FN][FL][IN]
3: 0x0/0xffffffffffffffff/0x0   
4: 0x10496a000/0x104972000/0x0 /path/to/a.dylib  [FN][FL][IN]
5: 0x10001e000/0x100091000/0x0 /path/to/a.dylib  [FN][FL][IN]
6: 0x100000000/0x10000d000/0x0 /path/to/myprogram [FN][FL][IN]
7: 0x7fff97c66000/0x7fff97c6c000/0x0 /usr/lib/system/libdyld.dylib  
8: 0x0/0x100000000/0x0 /path/to/myprogram 
9: 0x100120000/0x102c12000/0x0 /path/to/a.dylib  [FN][FL][IN]
10: 0x7fff97e78000/0x7fff97e81000/0x0 /usr/lib/system/libsystem_platform.dylib  
11: 0x104a39000/0x104a49000/0x0 /path/to/a.dylib  [FN][FL][IN]
12: 0x7fff97deb000/0x7fff97e0a000/0x0 /usr/lib/system/libsystem_malloc.dylib  
13: 0x7fff96853000/0x7fff9687d000/0x0 /usr/lib/libc++abi.dylib  
14: 0x1047c9000/0x104800000/0x0 /path/to/a.dylib  [FN][FL][IN]
15: 0x10497a000/0x1049e4000/0x0 /path/to/a.dylib  [FN][FL][IN]
16: 0x10482a000/0x10483b000/0x0 /path/to/a.dylib  [FN][FL][IN]
17: 0x104a5f000/0x104a83000/0x0 /path/to/a.dylib  [FN][FL][IN]
18: 0x7fff97e81000/0x7fff97e8c000/0x0 /usr/lib/system/libsystem_pthread.dylib  
19: 0x7fff97d80000/0x7fff97da3000/0x0 /usr/lib/system/libsystem_kernel.dylib  
20: 0x104acb000/0x104af0000/0x12000 /path/to/a.dylib  [FN][FL][IN]
21: 0x104801000/0x10482a000/0x38000 /path/to/a.dylib  [FL][IN]
22: 0x100113000/0x100118000/0x8000 /usr/local/lib/libprofiler.0.dylib  [FL][IN]
23: 0x7fff97ca0000/0x7fff97d2e000/0x0 /usr/lib/system/libsystem_c.dylib  
24: 0x7fff967fc000/0x7fff96853000/0x0 /usr/lib/libc++.1.dylib  
25: 0x10010b000/0x100113000/0x0 /usr/local/lib/libprofiler.0.dylib  [FN][FL][IN]
26: 0x104ab9000/0x104acb000/0x0 /path/to/a.dylib  [FN][FL][IN]
27: 0x7fffa0b7c000/0x7fffa0b7e1ce/0x1fa98000 /usr/lib/system/libsystem_malloc.dylib  
28: 0x1048e5000/0x10496a000/0x90000 /path/to/a.dylib  [FN][FL][IN]
29: 0x100096000/0x10010b000/0x78000 /path/to/a.dylib  [FN][FL][IN]
30: 0x102f51000/0x1047c9000/0x2d57000 /path/to/a.dylib  [FN][FL][IN]
31: 0x104b1c000/0x104b21000/0x16000 /path/to/a.1.dylib  [FN][FL][IN]
32: 0x7fffa0b78000/0x7fffa0b7a950/0x1fa94000 /usr/lib/system/libsystem_kernel.dylib  
33: 0x102c12000/0x102f51000/0x2af2000 /path/to/a.dylib  [FN][FL][IN]

(Had to redact the names sorry; all the a.dylibs are different internal libraries that myprogram links to.)

Anyway the problem must be number 8. What do those [FN][FL][IN] things mean? Perhaps the fact that 8 doesn't have them means it shouldn't be loaded or something? 3 also looks suspicious.

@Timmmm
Copy link
Contributor Author

Timmmm commented Aug 17, 2018

Ah apparently [FN] means hasFunctions, [FL] is hasFilenames and [IN] is hasInlineFrames.

@aalexand
Copy link
Collaborator

@Timmmm Which tool did you use to collect this profile? Is it from gperftools?

@Timmmm
Copy link
Contributor Author

Timmmm commented Sep 3, 2018

Yeah gperftools.

@ayushkarnawat
Copy link

ayushkarnawat commented Sep 14, 2021

I am still getting this same "error" using a relatively new version of pprof (commit b9804c9) on macOS Big Sur (darwin 11.4). Edit: Same result on most recent commit as well. I am also collecting profiling data from gperftools (version 2.8) and symbolizing using llvm-symbolizer (llvm11). Testing on the file @Timmmm provided above, and also on other c/cpp files. Full output:

$ clang++ -Wall -g -O0 -fno-pie -c main.cpp -o main.o
$ clang++ -o main main.o -lprofiler -fno-pie
$ CPUPROFILE=main.prof ./main
k = 1:   2   3   5   7  11  13  17  19  23  29
k = 2:   4   6   9  10  14  15  21  22  25  26
k = 3:   8  12  18  20  27  28  30  42  44  45
k = 4:  16  24  36  40  54  56  60  81  84  88
k = 5:  32  48  72  80 108 112 120 162 168 176
k = 6:  64  96 144 160 216 224 240 324 336 352
k = 7: 128 192 288 320 432 448 480 648 672 704
k = 8: 256 384 576 640 864 896 960129613441408
k = 9: 512 76811521280172817921920259226882816
k = 10:1024153623042560345635843840518453765632
k = 11:2048307246085120691271687680103681075211264
k = 12:40966144921610240138241433615360207362150422528
k = 13:8192122881843220480276482867230720414724300845056
k = 14:16384245763686440960552965734461440829448601690112
k = 15:32768491527372881920110592114688122880165888172032180224
k = 16:6553698304147456163840221184229376245760331776344064360448
k = 17:131072196608294912327680442368458752491520663552688128720896
k = 18:262144393216589824655360884736917504983040132710413762561441792
k = 19:52428878643211796481310720176947218350081966080265420827525122883584
k = 20:1048576157286423592962621440353894436700163932160530841655050245767168
PROFILE: interrupts/evictions/bytes = 812/334/22864
$ PPROF_BINARY_PATH=./main
$ pprof -text -trim=false main.prof
Local symbolization failed for libdyld.dylib: stat /usr/lib/system/libdyld.dylib: no such file or directory
Local symbolization failed for main: error reading Mach-O file /<path/to/root/folder>/./main: could not identify base for /<path/to/root/folder>/./main: __TEXT segment address (0x100000000) > mapping start address (0x0)
Some binary filenames not available. Symbolization may be incomplete.
Try setting PPROF_BINARY_PATH to the search path for local binaries.
File: libc++.1.dylib
Type: cpu
Showing nodes accounting for 8.12s, 100% of 8.12s total
      flat  flat%   sum%        cum   cum%
     7.99s 98.40% 98.40%      7.99s 98.40%  k_prime
     0.03s  0.37% 98.77%      7.14s 87.93%  primes
     0.03s  0.37% 99.14%      0.03s  0.37%  std::__1::__compressed_pair_elem::__get
     0.02s  0.25% 99.38%      0.02s  0.25%  [libsystem_kernel.dylib]
     0.02s  0.25% 99.63%      0.05s  0.62%  std::__1::__list_imp::__sz
     0.02s  0.25% 99.88%      0.05s  0.62%  std::__1::list::size
     0.01s  0.12%   100%      0.01s  0.12%  std::__1::locale::use_facet
         0     0%   100%      7.15s 88.05%  <unknown>
         0     0%   100%      0.03s  0.37%  ProfileHandler::SignalHandler
         0     0%   100%      7.15s 88.05%  [libdyld.dylib]
         0     0%   100%      5.47s 67.36%  [libsystem_platform.dylib]
         0     0%   100%      0.22s  2.71%  [main]
         0     0%   100%      7.15s 88.05%  main
         0     0%   100%      0.03s  0.37%  std::__1::__compressed_pair::first
         0     0%   100%      0.01s  0.12%  std::__1::basic_ostream::operator<<

We can ignore the warning for setting PPROF_BINARY_PATH (note this was already done for the local executable). This warning is just reiterating that we do not have symbol information for some binaries (e.g libdyld.dylib). This occurs in Big Sur because Apple changed the filesystem such that *.dylib files can no longer be searched as files within the system. Regardless, symbolizing those libraries won't really matter as we will likely not be optimizing them anyways.

@Timmmm @aalexand Have you guys figured out if the problem is, in fact, with gperftools, and not pprof? If so, are there other profiling tools that work with pprof that do not result in weird symbolization errors. If not, then I might try attempting to see if this persists using other profiling tools.

@ayushkarnawat
Copy link

ayushkarnawat commented Sep 15, 2021

Ok, by compiling with PIE and ASLR enabled (as suggested in #311 (comment)), I no longer get the warning that symbolization failed on the local binary ./main. This work on both latest HEAD or commit b9804c9 of pprof. However, I am still seeing <unknown>'s in the pprof output:

Local symbolization failed for libdyld.dylib: stat /usr/lib/system/libdyld.dylib: no such file or directory
Local symbolization failed for libsystem_c.dylib: stat /usr/lib/system/libsystem_c.dylib: no such file or directory
Some binary filenames not available. Symbolization may be incomplete.
Try setting PPROF_BINARY_PATH to the search path for local binaries.
File: libc++.1.dylib
Type: cpu
Showing nodes accounting for 9.13s, 100% of 9.13s total
      flat  flat%   sum%        cum   cum%
     9.01s 98.69% 98.69%      9.01s 98.69%  k_prime
     0.04s  0.44% 99.12%      7.47s 81.82%  primes
     0.03s  0.33% 99.45%      0.03s  0.33%  std::__1::locale::use_facet
     0.02s  0.22% 99.67%      0.04s  0.44%  std::__1::list::size
     0.01s  0.11% 99.78%      0.01s  0.11%  [libsystem_kernel.dylib]
     0.01s  0.11% 99.89%      0.01s  0.11%  std::__1::__compressed_pair::first
     0.01s  0.11%   100%      0.01s  0.11%  std::__1::__compressed_pair_elem::__get
         0     0%   100%      7.50s 82.15%  <unknown>
         0     0%   100%      7.50s 82.15%  [libdyld.dylib]
         0     0%   100%      0.01s  0.11%  [libsystem_c.dylib]
         0     0%   100%      3.65s 39.98%  [libsystem_platform.dylib]
         0     0%   100%      7.50s 82.15%  main
         0     0%   100%      0.02s  0.22%  std::__1::__list_imp::__sz
         0     0%   100%      0.01s  0.11%  std::__1::__stdoutbuf::overflow
         0     0%   100%      0.04s  0.44%  std::__1::basic_ostream::operator<<
         0     0%   100%      0.01s  0.11%  std::__1::basic_ostream::put
         0     0%   100%      0.01s  0.11%  std::__1::endl

Are these <unknown> part of the *.dylib's or something else? How can one check? Since these shared libraries are no longer present as files in MacOS, could it be that llvm-symbolizer/addr2line cannot recognize them and therefore returns <unknown>.

@Louis-Ye Louis-Ye added type: bug Buganizer type - Bug Priority: p3 Buganizer priority - P3 labels May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: p3 Buganizer priority - P3 type: bug Buganizer type - Bug
Projects
None yet
Development

No branches or pull requests

4 participants