
Jit profiling support broken when instances created with malloc #1017

Closed
jlb6740 opened this issue Feb 27, 2020 · 13 comments

@jlb6740
Contributor

jlb6740 commented Feb 27, 2020

@yurydelendik @alexcrichton

Hi ... an initial patch adding support for perf's jitdump specification merged recently (#360), and another JIT profiling patch based on ittapi is in review (#819). I recently noticed, though, that the jitdump output suddenly no longer resolves properly when doing a perf report. I've traced the issue to a patch that merged right before the perf jitdump patch and that my last manual end-to-end tests apparently weren't rebased against: #948, which does away with mmap in favor of a mechanism that uses the alloc crate (which I believe ends up calling malloc) when creating instances in memory. There are no Rust errors and perf report -v shows no errors either, but I think this is fatal to perf JIT support (and maybe any JIT support in wasmtime), because perf tries to mmap that jitted memory region and, as far as I know, you can't mmap a malloc'd region. This theory may be incomplete, but doing a "git revert -n b15b5cd" certainly resolves the issue. Not sure of the best way forward here.

This also brings up a gap in testing: I am not sure how to automatically test for breakage of external tools such as the one seen here.
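
For anyone following along, the relevant detail from the jitdump spec is that the profiled process is expected to mmap the jit-<pid>.dump file it writes so that perf record captures an MMAP event naming that file; perf inject later scans perf.data for exactly that marker. A minimal, hypothetical Rust sketch of that marker (assuming the libc crate; this is not wasmtime's actual agent code):

use std::fs::OpenOptions;
use std::os::unix::io::AsRawFd;

// Hypothetical sketch, not wasmtime's code: mmap one page of the jitdump
// file so that `perf record` logs an MMAP event naming it. That MMAP record
// is the "jit marker" `perf inject -j` looks for; the jitted code itself
// does not need to live in this mapping.
fn map_jitdump_marker(path: &str) -> std::io::Result<*mut libc::c_void> {
    let file = OpenOptions::new().read(true).write(true).create(true).open(path)?;
    // A real agent writes the jitdump file header before mapping.
    let page = unsafe { libc::sysconf(libc::_SC_PAGESIZE) } as usize;
    let ptr = unsafe {
        libc::mmap(
            std::ptr::null_mut(),
            page,
            libc::PROT_READ | libc::PROT_EXEC,
            libc::MAP_PRIVATE,
            file.as_raw_fd(),
            0,
        )
    };
    if ptr == libc::MAP_FAILED {
        Err(std::io::Error::last_os_error())
    } else {
        Ok(ptr)
    }
}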

@alexcrichton
Member

Oh dear, sorry about that! I'm curious to dig in a bit more into what's going wrong, though; the switch from mmap to malloc was only for the Instance type, which doesn't actually hold any JIT code itself.

Is there a way to poke around at this locally? I'm curious whether something else is wrong by accident, but I honestly know too little about jitdump to know where to begin here.

@jlb6740
Contributor Author

jlb6740 commented Feb 28, 2020

@alexcrichton, Hi .. yes. I am not confident that I completely understand why this change breaks perf, or that the problem can't be quickly mitigated with a change in the jitdump code. I ran perf with the verbose option and saw no difference in the output. Let me dig a little more.

@jlb6740
Contributor Author

jlb6740 commented Mar 9, 2020

@alexcrichton Hi, my understanding of what may be going on here is evolving. It seems clear that using the Rust allocator (which eventually calls malloc) instead of mmap breaks jitdump support, but I haven't pinpointed a confirmation that would suggest mitigations short of reverting to mmap. Here https://lore.kernel.org/patchwork/cover/622238/ and here https://lore.kernel.org/patchwork/patch/622240/ it appears to be implied that perf uses mmap records in the perf.data file to help determine the injection points for the jitted code captured in the jitdump file. That makes sense, but when printing the recorded mmaps with something like "sudo perf script --input perf.data --show-mmap-events | grep MMAP", the data files from before and after the allocator change appear consistent. I am convinced, though, that the issue lies with perf inject not being able to figure out how to inject the jitted code. I'll do more digging.

@yurydelendik @alexcrichton In the meantime, note that the ittapi-based VTune support has been rebased and is not hampered by the allocator change, so #819 is unaffected.
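
For reference while digging, this is roughly what perf inject expects to find in the dump file, per my reading of the jitdump specification (struct names here are illustrative, not wasmtime's types):

// Field layout per the perf jitdump specification
// (tools/perf/Documentation/jitdump-specification.txt); names are illustrative.
#[repr(C)]
struct FileHeader {
    magic: u32,      // 0x4A695444 ("JiTD")
    version: u32,    // 1
    total_size: u32, // size of this header in bytes
    elf_mach: u32,   // ELF machine, e.g. EM_X86_64 = 62
    pad1: u32,
    pid: u32,
    timestamp: u64,  // must use the same clock as `perf record -k 1`
    flags: u64,
}

#[repr(C)]
struct RecordHeader {
    id: u32,         // 0 = JIT_CODE_LOAD, 2 = JIT_CODE_DEBUG_INFO, 3 = JIT_CODE_CLOSE
    total_size: u32, // header + body, including the name string and code bytes
    timestamp: u64,
}

// JIT_CODE_LOAD body; a null-terminated function name and the raw machine
// code bytes follow these fields immediately.
#[repr(C)]
struct CodeLoad {
    pid: u32,
    tid: u32,
    vma: u64,
    code_addr: u64,  // address where the jitted code actually lives
    code_size: u64,
    code_index: u64, // monotonically increasing per emitted function
}

As far as I understand it, perf inject matches each sample against a CODE_LOAD whose code_addr/code_size range covers the sample address, so consistent addresses and timestamps matter more than how the instance memory itself was allocated.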

@jlb6740
Contributor Author

jlb6740 commented Mar 9, 2020

Note also, to reproduce:

git checkout -b test_with_mmap c8ab1e2
git revert -n b15b5cd
git cherry-pick f322555
... compile with the jitdump feature flag and follow the instructions in jitdump.rs

vs

git checkout -b test_with_malloc c8ab1e2
git cherry-pick f322555
... compile with the jitdump feature flag and follow the instructions in jitdump.rs

@alexcrichton
Member

Ah, sorry for the delay. I'm trying to reproduce this locally but seem to be having difficulty.

Using this script (and a custom-built perf from the latest kernel branch):

set -ex
cargo build --release --features jitdump
rustc fib.rs --target wasm32-wasi -Cdebuginfo=0
$HOME/code/linux/tools/perf/perf record -k 1 -e instructions:u target/release/wasmtime -g --jitdump fib.wasm
$HOME/code/linux/tools/perf/perf inject -v -j -i perf.data -o perf.jit.data
exec $HOME/code/linux/tools/perf/perf report -i perf.jit.data -F+period,srcline

and fib.rs looking like:

fn main() {
    let n = 40;
    println!("fib({}) = {}", n, fib(n));
}

fn fib(n: u32) -> u32 {
    if n <= 2 {
        1
    } else {
        fib(n - 1) + fib(n - 2)
    }
}

I get:

 $ sh run.sh
+ cargo build --release --features jitdump
    Finished release [optimized] target(s) in 0.09s
+ rustc fib.rs --target wasm32-wasi -Cdebuginfo=0
+ /home/alex/code/linux/tools/perf/perf record -k 1 -e instructions:u target/release/wasmtime -g --jitdump fib.wasm
fib(1) = 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (381 samples) ]
+ /home/alex/code/linux/tools/perf/perf inject -v -j -i perf.data -o perf.jit.data
build id event received for [kernel.kallsyms]: 8081b1e9482d2e82a84b1fdbadd94e2f03abc94e
build id event received for /home/alex/code/wasmtime/target/release/wasmtime: b1d159cff0963d77ad19d66a2d17318c17b9db7e
build id event received for /lib/x86_64-linux-gnu/ld-2.27.so: 64df1b961228382fe18684249ed800ab1dceaad4
build id event received for [vdso]: d94bf3cbb1070beb0ee71a51bbb2b4b3a26d353e
build id event received for /lib/x86_64-linux-gnu/libpthread-2.27.so: 28c6aade70b2d40d1f0f3d0a1a0cad1ab816448f
build id event received for /lib/x86_64-linux-gnu/libgcc_s.so.1: b6a35ea137b91d88bc9f0a2dbf04000b0db8c874
build id event received for /lib/x86_64-linux-gnu/libc-2.27.so: b417c0ba7cc5cf06d1d1bed6652cedb9253c60d0
jit marker found: /home/alex/code/wasmtime/jit-21004.dump
injecting: /home/alex/code/wasmtime/jit-21004.dump

Even after I revert the malloc change I still get the same error, so I'm not sure what it is I'm doing wrong?

@jlb6740
Contributor Author

jlb6740 commented Mar 12, 2020

@alexcrichton Hi, thanks for trying this! Did you try using sudo? Without sudo, perf and related tools appear to be limited. There is a setting, perf_event_paranoid, that I've read should help, but I just always use sudo for any perf command. Also ... what exact error are you seeing? It looks like it may have gotten cut off.

@jlb6740
Contributor Author

jlb6740 commented Mar 12, 2020

@alexcrichton Also, separately, is your HEAD set to c8ab1e2? I ask because when I compile the fib.rs file and try to run it, it hangs if I use the -g option ... even when not dumping a jitdump file.

@alexcrichton
Member

Ah yeah, it's true I didn't use that same HEAD; I used yesterday's master of wasmtime. I believe the hang was resolved by #1228, so I wanted a build that included that.

I do seem to be getting further with sudo, thanks for the tip! I thought that paranoid thing was already set and meant I didn't need sudo, but alas!

With that, though, I'm still getting confusing results; nothing really looks like it's symbolicated. Going off both the master branch and the HEAD you're using, I get confusingly verbose perf reports before and after the revert of the malloc/mmap commit locally.

Can you gist what you expect to see from perf?

@jlb6740
Contributor Author

jlb6740 commented Mar 12, 2020

Hi @alexcrichton, the difference manifests in perf report, perf annotate, and also perf script on the injected file: when things work correctly you can see the jitted .so file correctly associated with instruction samples. The screenshots below show the difference; the left is when malloc is used to create the instance, the right is when mmap is used. I'm not comfortable not understanding why perf needs this region to be mmap'd, though notes from the posted conversations about the patches imply that is the case. During perf inject you can see perf having trouble accessing a map file in /tmp that should be associated with the pid of the process, while it has no trouble on the mmap'd version, but things aren't crystal clear. I'm going through the perf kernel code now to see if I can understand it better.
perf report (left is using the malloc patch, right is with it removed)
[screenshot]

perf report with drill-down into annotation
[screenshot]

@alexcrichton
Member

Ok, I've been poking around at this a lot more and I've managed to get some degree of success. I think there's a lot going on under the hood and a huge number of places where things could go wrong, and I don't think malloc/mmap of an instance is one of them.

My main discovery so far is that if you remove the dwarf info from the input wasm module, it appears to work. In my fib example, perf report yields:

Samples: 1K of event 'instructions:u', Event count (approx.): 4226064351
Overhead  Command   Shared Object       Symbol
  99.89%  wasmtime  jitted-30178-0.so   [.] wasm_module::_ZN3fib3fib17hedcc9d2af68c6e00E
   0.01%  wasmtime  libc-2.27.so        [.] __memmove_avx_unaligned_erms
   0.01%  wasmtime  ld-2.27.so          [.] _dl_relocate_object
   0.01%  wasmtime  wasmtime            [.] <std::collections::hash::map::DefaultHasher as core::hash::Hasher>::write

which is basically what I would want to see (modulo some demangling ideally). If I annotate the fib function I get:

Samples: 1K of event 'instructions:u', 4000 Hz, Event count (approx.): 4226064351
wasm_module::_ZN3fib3fib17hedcc9d2af68c6e00E  /home/alex/code/wasmtime/wut/jitted-30178-0.so [Percent: local period]
Percent│
       │
       │
       │    Disassembly of section .text:
       │
       │    0000000000000040 <wasm_module::_ZN3fib3fib17hedcc9d2af68c6e00E>:
       │    wasm_module::_ZN3fib3fib17hedcc9d2af68c6e00E():
  5.17 │      rex    push %rbp
  1.50 │      mov    %rsp,%rbp
  4.63 │      push   %r12
  2.24 │      push   %r13
  2.06 │      push   %r14
  3.28 │      push   %r15
  2.52 │      sub    $0x20,%rsp
  2.42 │      mov    %rdi,0x10(%rsp)
  5.88 │      rex    mov %edx,0x1c(%rsp)
  3.07 │      rex    mov $0x1,%eax
  1.51 │      rex    mov %eax,0x18(%rsp)
  2.47 │      mov    %edx,%r15d
  2.37 │      cmp    $0x3,%r15d
       │      rex    jb e4 <wasm_module::_ZN3fib3fib17hedcc9d2af68c6e00E+0xa4>
  1.73 │      movabs $0x7fe11e9b0000,%rax
  1.85 │      mov    %rax,0x8(%rsp)
  3.56 │      mov    0x1c(%rsp),%r15d
  2.49 │      add    $0xffffffff,%r15d
  3.07 │      mov    0x10(%rsp),%r14
  1.12 │      mov    0x8(%rsp),%r13
  3.19 │      mov    %r14,%r12
  1.81 │      mov    %r12,%rdi
  2.21 │      mov    %r14,%rsi
  1.46 │      mov    %r15d,%edx
  2.69 │    → callq  *%r13
  2.75 │      mov    0x18(%rsp),%r15d
  3.11 │      add    %r15d,%eax
  1.95 │      rex    mov %eax,0x18(%rsp)
  3.40 │      mov    0x1c(%rsp),%r15d
  1.75 │      add    $0xfffffffe,%r15d
  2.34 │      mov    %r15d,0x1c(%rsp)
  1.32 │      cmp    $0x2,%r15d
       │      rex    ja 8b <wasm_module::_ZN3fib3fib17hedcc9d2af68c6e00E+0x4b>
...

basically things actually look pretty accurate here.

That was generated on the master branch with a few modifications:

  • I manually removed dwarf info from the input file
  • I manually removed emission of dwarf-related entries into the jitdump file

I suspect there are probably bugs in the handling of dwarf information? Either that, or the issue is that a normal wasm file from rustc contains tons of dwarf related to libstd even though most of libstd isn't actually present in the wasm file itself (and maybe dwarf is lacking for the functions generated in the file?).

After reviewing the code a bit I think we may want to refactor this quite a bit? One improvement I can think of is that the --jitdump flag shouldn't require -g, and without -g we should still emit one code load entry per function instead of one per module. We've got the metadata to know where all the functions are, and we at least have the name section. I think the code should be structured so that we always emit one code load entry per function in a wasm module and then, optionally, if available, emit extra debuginfo entries as well (rough sketch below).
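
Something like this rough sketch is what I have in mind (hypothetical types and names, not an actual patch):

// Rough sketch only: emit one CODE_LOAD record per compiled function up
// front, and treat debuginfo purely as optional augmentation.
struct FinishedFunction<'a> {
    name: &'a str,   // e.g. taken from the wasm name section
    addr: *const u8, // start of the jitted code for this function
    len: usize,
}

trait ProfilingAgent {
    fn code_load(&mut self, name: &str, addr: *const u8, len: usize);
    fn debug_info(&mut self, addr: *const u8, lines: &[(u64, u32)]); // (address, source line)
}

fn module_load(
    agent: &mut dyn ProfilingAgent,
    functions: &[FinishedFunction<'_>],
    dwarf_lines: Option<&[(u64, u32)]>,
) {
    // Always emitted, with or without -g: this alone is enough for
    // `perf report` to attribute samples to the right wasm function.
    for f in functions {
        agent.code_load(f.name, f.addr, f.len);
    }
    // Debuginfo only augments records that already exist; it never decides
    // whether a record gets emitted at all.
    if let Some(lines) = dwarf_lines {
        for f in functions {
            let (start, end) = (f.addr as u64, f.addr as u64 + f.len as u64);
            let in_range: Vec<(u64, u32)> = lines
                .iter()
                .copied()
                .filter(|(addr, _)| *addr >= start && *addr < end)
                .collect();
            if !in_range.is_empty() {
                agent.debug_info(f.addr, &in_range);
            }
        }
    }
}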

I haven't really looked too much into the debuginfo processing itself, but I think the general output of the jitdump module will be improved if debuginfo only augments existing records rather than driving the emission of records. My current suspicion is that the debuginfo augmentation is perhaps buggy right now, which leads perf inject to run amok when parsing the dump file, producing corrupted annotations and misinformation.

I'm not really sure why it worked before the mmap/malloc change but not after. I read over the patches you linked and I don't think this has anything to do with the change we made: we changed how an Instance is allocated, but that doesn't really have anything to do with JIT code or anything like that. The best I can surmise is that an mmap of the dump file is required at runtime simply so that perf inject knows how to find the dump file at all. It didn't look, though, like any mmap business was required for the code itself.

@jlb6740
Contributor Author

jlb6740 commented Mar 12, 2020

This is very interesting! Thanks for poking around here. Note, I am not using -g in the experiments above, so I'd like to reproduce what you've done since that shouldn't be an issue for me either. In general, -g is not required; it is only an option to take advantage of dwarf included in the wasm so that line information shows up in the annotation. If dwarf isn't included, the jitdump should still be generated without issue, but perhaps there is something more to look at even when the flag isn't used.

What steps can I use to reproduce your findings? BTW .. I have a habit of using C .. my fib wasm file is actually compiled from C using Clang and includes dwarf debug info, but I should also try a version that doesn't. I should also try your Rust version ... but that didn't run on my HEAD due to the bug you pointed out.

"The best I could surmise is that mmap is just required for the dump file to happen at runtime, so perf inject knows how to find the dump file at all. It didn't even look, though, like mmap-business was required for the code itself."

What are you thinking here? The jitdump file needs to be mmap'd at runtime so that it can be injected into the original perf file. Ok, I think I see. Yes, I think you are correct. Perhaps mmapping the jitdump file itself is the only extra mmap requirement, and the problem is indeed something else. This is good. Let me try your steps to reproduce. Was your experiment done at HEAD on a Rust-generated wasm file with no debug information?

alexcrichton added a commit to alexcrichton/wasmtime that referenced this issue Mar 12, 2020
This is the result of some of the investigation I was doing for bytecodealliance#1017. This
isn't in a final state yet since the profiling still isn't working, but
it's a base which I think we'll want to work from. Some of the
refactorings here are:

* Define `ProfilingStrategy` in the `wasmtime` crate to have everything
  locally-defined
* Pass around `Arc<dyn ProfilingAgent>` instead of
  `Option<Arc<Mutex<Box<dyn ProfilingAgent>>>>`
* Split out windows/unix files in `jitdump.rs` to avoid having lots of
  `#[cfg]`.
* Make dependencies optional that are only used for `jitdump`.
* Move initialization up-front to `JitDumpAgent::new()` instead of
  deferring it to the first module.
* Invoke the agent's `module_load` method during compilation, not
  afterwards, so it's all baked into one call.
* Pass in a list of finished functions instead of simply a range to
  ensure that we're emitting jit dump data for a specific module rather
  than a whole `CodeMemory` which may have other modules.

I think there's still some refactoring work to do in handling debuginfo
and such, but I'm hoping that this is a base at least to work from!
@alexcrichton
Member

Oh sure yeah, lemme write some stuff down. So as I was reading all this I started to do a bit of the refactoring I was mentioning, and I've made a commit but it doesn't work at all with debuginfo. Without debuginfo it should work though.

In any case, I've been using -g this whole time because of this comment but actually wasmtime from master works for me without -g. Using the master branch I see: 99.90% wasmtime jitted-25162-0.so [.] wasm_module. I think the only issue is that the entire wasm module is lumped into one function instead of being split apart (but my commit fixes that).

In that sense I don't actually need any changes to get perf working without -g, and only with -g do things start going awry.

alexcrichton added a commit that referenced this issue Mar 20, 2020
* Enable jitdump profiling support by default

This is the result of some of the investigation I was doing for #1017. I've
done a number of refactorings here which culminated in a number of changes
that, I think, allow jitdump support to be enabled by default:

* Pass in a list of finished functions instead of just a range to
  ensure that we're emitting jit dump data for a specific module rather
  than a whole `CodeMemory` which may have other modules.
* Define `ProfilingStrategy` in the `wasmtime` crate to have everything
  locally-defined
* Add support to the C API to enable profiling
* Documentation added for profiling with jitdump to the book
* Split out supported/unsupported files in `jitdump.rs` to avoid having
  lots of `#[cfg]`.
* Make dependencies optional that are only used for `jitdump`.
* Move initialization up-front to `JitDumpAgent::new()` instead of
  deferring it to the first module.
* Pass around `Arc<dyn ProfilingAgent>` instead of
  `Option<Arc<Mutex<Box<dyn ProfilingAgent>>>>`

The `jitdump` Cargo feature is now enabled by default which means that
our published binaries, C API artifacts, and crates will support
profiling at runtime by default. I don't think the support is fully fleshed
out and working yet, but it's probably in a good enough spot that we can get
users playing around with it!
@jlb6740
Contributor Author

jlb6740 commented Mar 20, 2020

I think I can close this. Based on the discussion here, the code has been refactored (#1310) and is working for most examples (though not 100%). It also appears that #948 simply highlighted an existing bug, and profiling support should work fine with that change.

jlb6740 closed this as completed Mar 20, 2020