Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move dr_simd_t to separate allocation to support dynamic vector sizing and reduce stack usage #7212

Open
derekbruening opened this issue Jan 23, 2025 · 0 comments

Comments

@derekbruening
Copy link
Contributor

Xref discussion at #3544 (comment) on splitting out dr_simd_t from the mcontext to help reduce offset sizes for RISC-V.

The primary cross-arch benefit of separating out dr_simd_t and making it dynamically allocated would be to support any vector size. Today both RISC-V and AArch64 support large SIMD vectors, but DR has a hardcoded cap (256-bit for RISC-V and 512-bit for AArch64) in order to keep the mcontext small and stack-allocatable.

The size of the mcontext is a problem on all platforms since DR uses small thread stacks. Allocating several mcontexts across a call chain can overflow the stack and with today's large SIMD fields inside mcontext we have to be careful declaring them on the stack. See comments in signal.c like these:

    /* It's safe to allocate since we do not send signals that interrupt DR.
     * With priv_mcontext_t x2 that's a little big for stack alloc.
     */
    si.mcontext = heap_alloc(dcontext, sizeof(*si.mcontext) HEAPACCT(ACCT_OTHER));
    si.raw_mcontext = heap_alloc(dcontext, sizeof(*si.raw_mcontext) HEAPACCT(ACCT_OTHER));
/* Helper that takes over the current thread signaled via SUSPEND_SIGNAL.  Kept
 * separate mostly to keep the priv_mcontext_t allocation out of
 * main_signal_handler_C.

However, there are signal handler paths where we need to declare and use new mcontexts and allocating heap is problematic. An indirected dr_simd_t could still be placed on the stack right after the mcontext: but only if there's stack space. We would have to create a new signal-safe heap pool for dr_simd_t, which has to be provisioned ahead of time and so has per-thread overhead to be considered (we run applications with thousands of threads).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant