On macOS, the recommended way to implement a JIT system is by creating the memory map with PROT_WRITE | PROT_EXEC and the MAP_JIT flag, then using pthread_jit_write_protect_np to switch between writing and executing the buffer.
(This is kinda weird, because the W^X behavior is tracked on a per-thread basis, rather than per-region; I found it easiest to only enable W right before copying into the region, then disable it afterwards)
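For reference, here is a minimal sketch of that "enable W, copy, disable W" pattern. It is not the linked implementation: `JitRegion` and `set_thread_jit_writable` are hypothetical names, the constants are hard-coded from `<sys/mman.h>` instead of coming from a bindings crate, and `PROT_READ` is added on top of the flags mentioned above so the bytes can be read back. The non-macOS branch is a plain read/write mapping that exists only so the sketch builds elsewhere.

```rust
use std::ffi::c_void;
use std::io;
use std::os::raw::c_int;

#[cfg(target_os = "macos")]
const JIT_PROT: c_int = 0x1 | 0x2 | 0x4; // PROT_READ | PROT_WRITE | PROT_EXEC
#[cfg(target_os = "macos")]
const JIT_FLAGS: c_int = 0x0002 | 0x1000 | 0x0800; // MAP_PRIVATE | MAP_ANON | MAP_JIT
// Assumption: Linux values, used only as a non-executable fallback.
#[cfg(not(target_os = "macos"))]
const JIT_PROT: c_int = 0x1 | 0x2; // PROT_READ | PROT_WRITE
#[cfg(not(target_os = "macos"))]
const JIT_FLAGS: c_int = 0x02 | 0x20; // MAP_PRIVATE | MAP_ANONYMOUS

// `unsafe extern` needs Rust 1.82+; on older toolchains drop the `unsafe`.
// `off` is declared as i64, which assumes a 64-bit target.
unsafe extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int, fd: c_int, off: i64) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
    #[cfg(target_os = "macos")]
    fn pthread_jit_write_protect_np(enabled: c_int);
}

/// Flip the *calling thread's* view of MAP_JIT memory.
/// Passing 0 to pthread_jit_write_protect_np makes it writable, 1 makes it executable.
#[cfg(target_os = "macos")]
fn set_thread_jit_writable(writable: bool) {
    unsafe { pthread_jit_write_protect_np(if writable { 0 } else { 1 }) }
}

/// Fallback: the mapping is always writable here, so there is nothing to toggle.
#[cfg(not(target_os = "macos"))]
fn set_thread_jit_writable(_writable: bool) {}

/// Hypothetical owner of one anonymous JIT mapping.
pub struct JitRegion {
    ptr: *mut u8,
    len: usize,
}

impl JitRegion {
    pub fn new(len: usize) -> io::Result<Self> {
        let ptr = unsafe { mmap(std::ptr::null_mut(), len, JIT_PROT, JIT_FLAGS, -1, 0) };
        if ptr as isize == -1 {
            // mmap returned MAP_FAILED
            return Err(io::Error::last_os_error());
        }
        Ok(Self { ptr: ptr as *mut u8, len })
    }

    /// Enable W right before the copy, then disable it again afterwards.
    pub fn write(&mut self, offset: usize, code: &[u8]) {
        let in_bounds = offset.checked_add(code.len()).map_or(false, |end| end <= self.len);
        assert!(in_bounds, "write out of bounds");
        set_thread_jit_writable(true);
        unsafe { std::ptr::copy_nonoverlapping(code.as_ptr(), self.ptr.add(offset), code.len()) };
        set_thread_jit_writable(false);
    }

    pub fn as_slice(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Drop for JitRegion {
    fn drop(&mut self) {
        unsafe { munmap(self.ptr as *mut c_void, self.len) };
    }
}
```

Usage would be something like `JitRegion::new(4096)?.write(0, &bytes)`. Before actually jumping into the region on Apple Silicon, the instruction cache for the written range also has to be invalidated (e.g. with `sys_icache_invalidate`); that step is left out here.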
Anyways, it turns out that this is much faster than using mprotect to swap a regular mmap region from PROT_WRITE to PROT_EXEC!
Here's a flamegraph using mmap
(note the calls to mprotect and memmove taking up a good chunk of time)
Here's what it looks like with pthread_jit_write_protect_np
(those calls are gone, and pthread_jit_write_protect_np doesn't even show up)
I see one benchmark go from 112 ms down to 62 ms, roughly a 45% reduction in run time!
(My benchmarks are admittedly weird, in that they compile a lot of very small functions 😆)
This requires ditching / forking the memmap2 crate, which doesn't support this behavior. Here's how I did it.
Right now, it's easy for users to do this on their own: I'm using a VecAssembler then copying into this custom struct Mmap, which works fine. Still, this would be a decent optimization for the stock Assembler.
As always, dynasm-rs is great, and I really appreciate the work that went into it!
Interesting. It seems like Apple probably also ran into perf bottlenecks with memory protection swapping, since they needed to JIT x64 code, so they added a thread-based switch that alters the behaviour of page table permission checks without needing to edit the page tables (and without the fairly expensive cache flushes that editing them tends to cause).
I have no Apple hardware myself (I'm just a student who wrote this initially for a fun side project), so it's hard to validate these things. Then again, this is the reason dynasmrt exports basically all the components needed to construct your own custom assemblers: there are too many different but valid ways that you might want to manage your JIT memory. It's therefore probably not needed for this support to be in tree, but let me know if you're missing any components that could be reused outside of the internals of dynasmrt and aren't exported.