Jit profiling support broken when instances created with malloc #1017
Comments
Oh dear, sorry about that! I'm curious though to dig in a bit more with what's going wrong, the switch there from mmap to malloc was only for the Is there a way to poke around at this locally? I'm curious if something else is wrong by accident but I honestly know very little about jitdump to know where to begin here. |
@alexcrichton, Hi .. yes. I am not confident that I completely understand why this change breaks perf, or that the problem can't be quickly mitigated with a change in the jitdump code. I did run perf with the verbose option and saw no difference in the output. Let me dig a little more. |
@alexcrichton Hi, my understanding of what may be going on here is evolving. It seems clear that using the Rust allocator (which eventually calls malloc) instead of mmap breaks jitdump support, but I have not pinpointed a way to confirm this or to find mitigations that don't require reverting to mmap. Here https://lore.kernel.org/patchwork/cover/622238/ and here https://lore.kernel.org/patchwork/patch/622240/ it appears to be implied that perf uses mmap records in the perf.data file to help determine the injection points for the jitted code captured in the jitdump file. That makes sense, but when doing something like: "sudo perf script --input perf.data --show-mmap-events | grep MMAP" to print the recorded mmaps, the data files both before and after the allocator change appear consistent. I am convinced, though, that the issue lies with perf inject not being able to figure out how to inject the jitted code. I'll do more digging. @yurydelendik @alexcrichton In the meantime, note that ittapi for vtune support has been rebased and is not hampered by the allocator change, so #819 is unaffected. |
Note also, to reproduce: git checkout -b test_with_mmap c8ab1e2 vs git checkout -b test_with_malloc c8ab1e2 |
Ah sorry for the delay, but I'm trying to reproduce this locally but I seem to be having difficulty. Using this script (and a custom-built
set -ex
cargo build --release --features jitdump
rustc fib.rs --target wasm32-wasi -Cdebuginfo=0
$HOME/code/linux/tools/perf/perf record -k 1 -e instructions:u target/release/wasmtime -g --jitdump fib.wasm
$HOME/code/linux/tools/perf/perf inject -v -j -i perf.data -o perf.jit.data
exec $HOME/code/linux/tools/perf/perf report -i perf.jit.data -F+period,srcline
and
fn main() {
    let n = 40;
    println!("fib({}) = {}", n, fib(n));
}
fn fib(n: u32) -> u32 {
    if n <= 2 {
        1
    } else {
        fib(n - 1) + fib(n - 2)
    }
}
I get:
Even after I revert the malloc change I still get the same error, so I'm not sure what it is I'm doing wrong? |
@alexcrichton Hi, thanks for trying this! Did you try using sudo? Without sudo, perf and related tools appear to be limited. There is a setting, perf_event_paranoid, that I read should help, but I just always use sudo when running any perf command. Also ... what exact error are you seeing? I think it may have been cut off. |
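For anyone who wants to check the perf_event_paranoid setting mentioned above before reaching for sudo, here is a minimal sketch. It assumes a Linux system that exposes the value at /proc/sys/kernel/perf_event_paranoid, and the level descriptions are my own paraphrase of the kernel's sysctl documentation. The helper names `describe_paranoid` and `read_paranoid` are made up for this illustration and are not part of perf or wasmtime.

```rust
use std::fs;

/// Paraphrased summary of what each `perf_event_paranoid` level restricts for
/// unprivileged users. Hypothetical helper, not part of perf or wasmtime.
fn describe_paranoid(level: i32) -> &'static str {
    match level {
        l if l < 0 => "no restrictions for unprivileged users",
        0 => "raw tracepoint access is restricted",
        1 => "CPU-wide event access is also restricted",
        // Level 2 and anything stricter that a distribution may add.
        _ => "kernel profiling is also restricted; user-space events only",
    }
}

/// Read the current level, returning `None` when the file is missing or
/// unparsable (for example on non-Linux systems).
fn read_paranoid() -> Option<i32> {
    fs::read_to_string("/proc/sys/kernel/perf_event_paranoid")
        .ok()?
        .trim()
        .parse()
        .ok()
}

fn main() {
    match read_paranoid() {
        Some(level) => println!(
            "perf_event_paranoid = {}: {}",
            level,
            describe_paranoid(level)
        ),
        None => println!("could not read perf_event_paranoid on this system"),
    }
}
```

Since the script above records `instructions:u` (user-space only), a stricter level may still be enough for that particular event, but I have not verified this on every kernel.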
@alexcrichton Also, separately, is your HEAD set to c8ab1e2? I ask because when I compile the fib.rs file and try to run it, it hangs if I use the -g option ... even when not dumping a jitdump file. |
Ah yeah it's true I didn't use that same HEAD, I used master of wasmtime yesterday. I believe the hang was resolved by #1228 so I wanted a build that included that. I do seem to be getting further with With that though I'm still getting confusing results, nothing really looks like it's symbolicated. Going off the master branch, and the HEAD you're using, both have confusingly verbose perf reports before/after the revert of the malloc/mmap commit for me locally. Can you gist what it's expected to see from |
Hi @alexcrichton, the difference manifests in perf report, perf annotate, and also perf script on the injected file, where you can see that the jitted.so file is correctly associated with instruction samples when things are working. In the screenshots below you can see the difference. The left is when malloc is used to create the instance versus the right when mmap is used. I don't feel comfortable not understanding why perf needs this region to be mmapped, but notes from the posted conversations about the patches imply that is the case. During perf inject you can see perf have trouble accessing a map file in temp that should be associated with the pid of the process, and it does not have trouble on the mmapped version, but things aren't crystal clear. I am going through the perf kernel code now to see if I can understand better. |
Ok I've been poking around this a lot more and I've managed to get some degree of success. I think there's a lot going on under the hood and there's a huge number of places things could go wrong, and I don't think malloc/mmap of an instance is one of them. My main discovery I've found so far is that if you remove the dwarf info from the wasm module input it appears to work. In my fib example
which is basically what I would want to see (modulo some demangling ideally). If I annotate the
basically things actually look pretty accurate here. That was generated on the master branch with a few modifications:
I suspect that there's probably bugs in the handling of dwarf information? Either that or I suspect there's bugs in how there's tons of dwarf in a normal wasm file from rustc which is related to libstd, but most of libstd isn't present in the wasm file itself. (and maybe there's dwarf lacking for the functions generated in the file?) After reviewing the code a bit I think that we may want to refactor this quite a bit? One improvement I can think of is that the I haven't really looked too too much into the debuginfo processing itself, but I think that the general output of the jitdump module will be improved if debuginfo is only augmenting existing records rather than driving the emission of records. My current suspicion is that the debuginfo augmentation is perhaps buggy right now which leads I'm not really sure why it worked before or after the mmap/malloc change. I read over the patches you linked and I don't think that it has anything to do with the mmap/malloc change that we made. We made a change of how an |
This is very interesting! Thanks for poking around here. Note that I am not using -g in the experiments above, so I'd like to reproduce what you've done since that shouldn't be an issue for me either. In general, -g is not required. It is only an option to take advantage of dwarf included in the wasm to include line information in the annotation. If dwarf isn't included the jitdump should still be generated without issue, but perhaps there is something more there to look at even when the flag isn't used. What steps can I use to reproduce your findings? BTW .. I have a habit of using C .. my fib wasm file is actually compiled from C using Clang and includes dwarf debug info, but I should also try the version that doesn't. I should also try your Rust version ... but that didn't run on my HEAD due to the bug you pointed out. "The best I could surmise is that mmap is just required for the dump file to happen at runtime, so perf inject knows how to find the dump file at all. It didn't even look, though, like mmap-business was required for the code itself." What are you thinking here? The jitdump file needs to be mmap'd while running so that it can be injected into the original perf file. Ok, I think I see. Yes, I think you are correct. Perhaps using mmap to create this jitdump file is the only extra mmap requirement and the problem indeed is something else. This is good. Let me try your steps to reproduce. Was your experiment done with HEAD on a Rust-generated wasm file with no debug information? |
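To make the "mmap is only needed so perf inject can find the dump file" idea concrete, here is a rough sketch of the kind of check involved. It assumes, based on my reading of perf's jitdump handling, that perf looks for an mmap record whose file name has the shape `jit-<pid>.dump` with a pid matching the profiled process. The functions `jitdump_marker_pid` and `is_jit_marker` are hypothetical names for this illustration and only approximate whatever perf actually does.

```rust
use std::path::Path;

/// Extract the pid from a path whose file name looks like `jit-<pid>.dump`.
/// Returns `None` for any other shape. Hypothetical helper for illustration.
fn jitdump_marker_pid(mmap_path: &str) -> Option<u32> {
    let name = Path::new(mmap_path).file_name()?.to_str()?;
    let digits = name.strip_prefix("jit-")?.strip_suffix(".dump")?;
    // Only plain decimal digits count as a pid.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok()
}

/// An mmap record is treated as the jitdump marker only when the pid in the
/// file name matches the pid of the process that produced the record.
fn is_jit_marker(mmap_path: &str, recorded_pid: u32) -> bool {
    jitdump_marker_pid(mmap_path) == Some(recorded_pid)
}

fn main() {
    // Made-up mmap records for a process with pid 4242.
    let recorded_pid = 4242;
    let mmap_paths = ["/usr/lib/libc.so.6", "/tmp/jit-4242.dump", "/tmp/jit-17.dump"];
    for path in mmap_paths.iter() {
        println!("{} -> marker: {}", path, is_jit_marker(path, recorded_pid));
    }
}
```

If this reading is right, the JIT code region itself would not need to come from mmap at all; only the dump file has to show up as an mmap record.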
This is the result of some of the investigation I was doing for bytecodealliance#1017. This isn't in a final state yet since the profiling still isn't working, but it's a base which I think we'll want to work from. Some various refactorings here are:
* Define `ProfilingStrategy` in the `wasmtime` crate to have everything locally-defined
* Pass around `Arc<dyn ProfilingAgent>` instead of `Option<Arc<Mutex<Box<dyn ProfilingAgent>>>>`
* Split out windows/unix files in `jitdump.rs` to avoid having lots of `#[cfg]`.
* Make dependencies optional that are only used for `jitdump`.
* Move initialization up-front to `JitDumpAgent::new()` instead of deferring it to the first module.
* Invoke the agent's `module_load` method during compilation, not afterwards, so it's all baked into one call.
* Pass in a list of finished functions instead of simply a range to ensure that we're emitting jit dump data for a specific module rather than a whole `CodeMemory` which may have other modules.
I think there's still some refactoring work to do in handling debuginfo and such, but I'm hoping that this is a base at least to work from!
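As a rough illustration of the `Arc<dyn ProfilingAgent>` bullet above, here is a simplified sketch. It is not the actual wasmtime code: the trait signature, `NullProfilerAgent`, and `RecordingAgent` are assumptions made up for this example. The idea it tries to show is that a no-op agent can replace the outer `Option`, and that any locking can live inside the agent that needs it instead of in every caller.

```rust
use std::sync::{Arc, Mutex};

/// Simplified, hypothetical version of the trait; the real signature differs.
trait ProfilingAgent: Send + Sync {
    fn module_load(&self, module_name: &str, functions: &[(&str, usize)]);
}

/// A no-op agent stands in for "profiling disabled", so callers never need
/// an `Option` around the agent.
struct NullProfilerAgent;

impl ProfilingAgent for NullProfilerAgent {
    fn module_load(&self, _module_name: &str, _functions: &[(&str, usize)]) {}
}

/// An agent with mutable state keeps its own `Mutex`, so the shared handle
/// can be a plain `Arc<dyn ProfilingAgent>`.
struct RecordingAgent {
    records: Mutex<Vec<String>>,
}

impl RecordingAgent {
    /// All setup happens up front here, not lazily on the first module.
    fn new() -> Self {
        RecordingAgent { records: Mutex::new(Vec::new()) }
    }

    fn recorded(&self) -> Vec<String> {
        self.records.lock().unwrap().clone()
    }
}

impl ProfilingAgent for RecordingAgent {
    fn module_load(&self, module_name: &str, functions: &[(&str, usize)]) {
        let mut records = self.records.lock().unwrap();
        for (name, len) in functions {
            records.push(format!("{}::{} ({} bytes)", module_name, name, len));
        }
    }
}

fn main() {
    let recorder = Arc::new(RecordingAgent::new());
    // The same call site works with either agent behind the trait object.
    let agents: Vec<Arc<dyn ProfilingAgent>> = vec![Arc::new(NullProfilerAgent), recorder.clone()];
    for agent in agents.iter() {
        agent.module_load("fib", &[("main", 16), ("fib", 64)]);
    }
    println!("{:?}", recorder.recorded());
}
```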
Oh sure yeah, lemme write some stuff down. So as I was reading all this I started to do a bit of the refactoring I was mentioning, and I've made a commit but it doesn't work at all with debuginfo. Without debuginfo it should work though. In any case, I've been using In that sense I don't actually need any changes to get |
This is the result of some of the investigation I was doing for bytecodealliance#1017. I've done a number of refactorings here which culminated in a number of changes that all amount to what I think should result in jitdump support being enabled by default:
* Pass in a list of finished functions instead of just a range to ensure that we're emitting jit dump data for a specific module rather than a whole `CodeMemory` which may have other modules.
* Define `ProfilingStrategy` in the `wasmtime` crate to have everything locally-defined
* Add support to the C API to enable profiling
* Documentation added for profiling with jitdump to the book
* Split out supported/unsupported files in `jitdump.rs` to avoid having lots of `#[cfg]`.
* Make dependencies optional that are only used for `jitdump`.
* Move initialization up-front to `JitDumpAgent::new()` instead of deferring it to the first module.
* Pass around `Arc<dyn ProfilingAgent>` instead of `Option<Arc<Mutex<Box<dyn ProfilingAgent>>>>`
The `jitdump` Cargo feature is now enabled by default which means that our published binaries, C API artifacts, and crates will support profiling at runtime by default. The support I don't think is fully fleshed out and working but I think it's probably in a good enough spot we can get users playing around with it!
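The first bullet above (a list of finished functions instead of a range) can be sketched roughly as follows. This is a toy model under my own assumptions, not the wasmtime implementation: `FinishedFunction`, `select_by_range`, and `load_records` are hypothetical names, and the addresses are made up. It only shows why a range over shared code memory can pick up functions from another module while an explicit list cannot.

```rust
/// Hypothetical stand-in for one compiled function in executable memory.
#[derive(Debug, Clone, PartialEq)]
struct FinishedFunction {
    module: &'static str,
    name: &'static str,
    addr: usize,
    len: usize,
}

/// Range-based selection: everything that lies inside [start, end),
/// regardless of which module it belongs to.
fn select_by_range(code_memory: &[FinishedFunction], start: usize, end: usize) -> Vec<&FinishedFunction> {
    code_memory
        .iter()
        .filter(|f| f.addr >= start && f.addr + f.len <= end)
        .collect()
}

/// List-based emission: the caller passes exactly the functions it just
/// finished compiling, so one record per function of that module is produced.
fn load_records(finished: &[FinishedFunction]) -> Vec<String> {
    finished
        .iter()
        .map(|f| format!("{:#x}+{} {}::{}", f.addr, f.len, f.module, f.name))
        .collect()
}

fn main() {
    // One shared code memory holding functions from two modules.
    let code_memory = vec![
        FinishedFunction { module: "other", name: "helper", addr: 0x1000, len: 32 },
        FinishedFunction { module: "fib", name: "main", addr: 0x1020, len: 16 },
        FinishedFunction { module: "fib", name: "fib", addr: 0x1030, len: 64 },
    ];

    // A range covering the whole memory also picks up `other::helper`.
    let by_range = select_by_range(&code_memory, 0x1000, 0x1070);
    println!("range selects {} functions", by_range.len());

    // Passing only the functions of the module being loaded avoids that.
    let fib_only: Vec<FinishedFunction> = code_memory
        .iter()
        .filter(|f| f.module == "fib")
        .cloned()
        .collect();
    for record in load_records(&fib_only) {
        println!("{}", record);
    }
}
```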
* Enable jitdump profiling support by default
This is the result of some of the investigation I was doing for #1017. I've done a number of refactorings here which culminated in a number of changes that all amount to what I think should result in jitdump support being enabled by default:
* Pass in a list of finished functions instead of just a range to ensure that we're emitting jit dump data for a specific module rather than a whole `CodeMemory` which may have other modules.
* Define `ProfilingStrategy` in the `wasmtime` crate to have everything locally-defined
* Add support to the C API to enable profiling
* Documentation added for profiling with jitdump to the book
* Split out supported/unsupported files in `jitdump.rs` to avoid having lots of `#[cfg]`.
* Make dependencies optional that are only used for `jitdump`.
* Move initialization up-front to `JitDumpAgent::new()` instead of deferring it to the first module.
* Pass around `Arc<dyn ProfilingAgent>` instead of `Option<Arc<Mutex<Box<dyn ProfilingAgent>>>>`
The `jitdump` Cargo feature is now enabled by default which means that our published binaries, C API artifacts, and crates will support profiling at runtime by default. The support I don't think is fully fleshed out and working but I think it's probably in a good enough spot we can get users playing around with it!
@yurydelendik @alexcrichton
Hi ... an initial patch that provides support for perf's jitdump specification merged recently (#360) and another JIT-supporting patch based on ittapi is in review (#819). I recently noticed, though, that the jitdump is suddenly no longer resolving properly when doing a perf report. I've traced the issue to a patch submitted right before the perf jitdump patch merged; apparently my last manual end-to-end tests weren't rebased against it. It is patch #948, which does away with mmap in favor of a mechanism that uses the alloc crate, which I believe uses malloc when creating instances in memory. There are no Rust errors and perf report -v does not show any errors, but I think this is fatal to perf jit support (and maybe any jit support of wasmtime) because perf is trying to mmap that jitted memory region and afaik you can't mmap a malloc'd region. This theory may be incomplete, but certainly doing a "git revert -n b15b5cd" resolves the issue. Not sure of the best way forward here.
Also this brings up a gap in testing where I am not sure how to automatically test the breakage of external tools such as the issue here.