-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reduce stack requirements for floating-point formatting #41509
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
src/libcore/fmt/float.rs
Outdated
fmt.pad_formatted_parts(&formatted) | ||
} | ||
|
||
fn float_to_exponential_common_shortest<T>(fmt: &mut Formatter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also be a #[inline(never)]
, even though MAX_SIG_DIGITS
is only 17
. This is because when this gets inlined into float_to_exponential_common
and formatting eventually calls float_to_exponential_common_exact
, it results in combined stack usage equal to the sum of stack usage of float_to_exponential_common_shortest
and float_to_exponential_common_exact
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the call to float_to_exponential_common_exact
is a tail call, so (at least on some platforms?--true on x86-64 at least), there's no worries about combined stack space usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x86-64 is also the last platform to worry about stack usage when formatting floats. LLVM does not support tail calls everywhere sadly AFAIK.
src/libcore/fmt/float.rs
Outdated
upper: bool) -> Result | ||
where T: flt2dec::DecodableFloat | ||
{ | ||
let mut buf: [u8; 1024] = unsafe { mem::uninitialized() }; // enough for f32 and f64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole function body should be unsafe
rather than just the calls to mem::uninitialized
. Even though its only mem::uninitialized
, that’s really unsafe, it "infects" the value in such a way that you must also use it correctly.
src/libcore/fmt/float.rs
Outdated
where T: flt2dec::DecodableFloat | ||
{ | ||
let mut buf: [u8; flt2dec::MAX_SIG_DIGITS] = unsafe { mem::uninitialized() }; // enough for f32 and f64 | ||
let mut parts: [flt2dec::Part; 7] = unsafe { mem::uninitialized() }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto here.
Are you sure the improvement comes from the fact that stack is allocated on-demand, rather than the fact that initialization is simply skipped? |
The improvement I measured on my machine was from reducing the amount of initialization needed for the |
To be clear: I'm happy to just drop the |
I’m fine r+-ing the |
Additionally, the lines tidy complains about have to be fixed. |
Could you also rebase & squash some commits while you’re at it? For a PR of this size 2-3 commits would be the appropriate high-watermark. |
Sure, was just trying to follow @rust-highfive's advice. |
We have benchmarks for the floating-point formatting algorithms themselves, but not for the surrounding machinery like Formatter and translating to the flt2dec::Part slices.
For the two major entry points for float formatting, we split the exact case and the shortest cases into separate functions. We mark the separate functions as #[inline(never) so the exact cases won't bloat stack space in their callers unnecessarily. The shortest cases are marked so for similar reasons. Fixes rust-lang#41234.
The comments for flt2dec::to_shortest_str says that we only need a slice of length 5 for the parts array. Initializing a 16-part array is just wasted effort and wasted stack space. Other functions in the flt2dec module have similar comments, so we adjust the parts arrays passed to those functions accordingly.
61c5ab5
to
fbf9cee
Compare
@bors r+ |
📌 Commit fbf9cee has been approved by |
Oh I just noticed you made the functions themselves A distinct difference between
and
is that in the first case there’s something caller needs to handle for it to be safe, whereas in the 2nd case no matter how caller calls the function, it is safe. When I said “The whole function body should be unsafe rather than just the calls to mem::uninitialized.” I meant it should be the 2nd case not the 1st. Could you change it so unsafes look like the 2nd code block? @bors r- |
Ah, OK, I misread what you meant earlier. I can do that, sure. Do you really want the body of the |
Indent as appropriate, I didn’t do so because it is hard to do so with web text field :) |
Spending time to initialize these is just wasted work, as we'll overwrite them soon anyway. Fixes rust-lang#41259.
fbf9cee
to
b2c3102
Compare
:) Fair enough. |
@bors r+ |
📌 Commit b2c3102 has been approved by |
⌛ Testing commit b2c3102 with merge 8fd2f87... |
@bors retry prioritizing rollup which includes this |
…gisa reduce stack requirements for floating-point formatting Doing this speeds up float formatting by ~10% or so, and also makes the formatting code more suitable for embedded environments where stack space is at a premium.
…gisa reduce stack requirements for floating-point formatting Doing this speeds up float formatting by ~10% or so, and also makes the formatting code more suitable for embedded environments where stack space is at a premium.
…gisa reduce stack requirements for floating-point formatting Doing this speeds up float formatting by ~10% or so, and also makes the formatting code more suitable for embedded environments where stack space is at a premium.
Doing this speeds up float formatting by ~10% or so, and also makes the formatting code more suitable for embedded environments where stack space is at a premium.