Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce stack requirements for floating-point formatting #41509

Merged
merged 4 commits into from
Apr 30, 2017

Conversation

froydnj
Copy link
Contributor

@froydnj froydnj commented Apr 24, 2017

Doing this speeds up float formatting by ~10% or so, and also makes the formatting code more suitable for embedded environments where stack space is at a premium.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

fmt.pad_formatted_parts(&formatted)
}

fn float_to_exponential_common_shortest<T>(fmt: &mut Formatter,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be a #[inline(never)], even though MAX_SIG_DIGITS is only 17. This is because when this gets inlined into float_to_exponential_common and formatting eventually calls float_to_exponential_common_exact, it results in combined stack usage equal to the sum of stack usage of float_to_exponential_common_shortest and float_to_exponential_common_exact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the call to float_to_exponential_common_exact is a tail call, so (at least on some platforms?--true on x86-64 at least), there's no worries about combined stack space usage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x86-64 is also the last platform to worry about stack usage when formatting floats. LLVM does not support tail calls everywhere sadly AFAIK.

upper: bool) -> Result
where T: flt2dec::DecodableFloat
{
let mut buf: [u8; 1024] = unsafe { mem::uninitialized() }; // enough for f32 and f64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole function body should be unsafe rather than just the calls to mem::uninitialized. Even though its only mem::uninitialized, that’s really unsafe, it "infects" the value in such a way that you must also use it correctly.

where T: flt2dec::DecodableFloat
{
let mut buf: [u8; flt2dec::MAX_SIG_DIGITS] = unsafe { mem::uninitialized() }; // enough for f32 and f64
let mut parts: [flt2dec::Part; 7] = unsafe { mem::uninitialized() };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here.

@nagisa
Copy link
Member

nagisa commented Apr 24, 2017

Doing this speeds up float formatting by ~10% or so

Are you sure the improvement comes from the fact that stack is allocated on-demand, rather than the fact that initialization is simply skipped?

@froydnj
Copy link
Contributor Author

froydnj commented Apr 24, 2017

Are you sure the improvement comes from the fact that stack is allocated on-demand, rather than the fact that initialization is simply skipped?

The improvement I measured on my machine was from reducing the amount of initialization needed for the to_shortest case. The win for skipping the initialization entirely wasn't measurable for the to_shortest case, but I didn't try very hard to measure it for the to_exact case.

@aturon aturon assigned nagisa and unassigned aturon Apr 25, 2017
@froydnj
Copy link
Contributor Author

froydnj commented Apr 25, 2017

To be clear: I'm happy to just drop the mem::uninitialized patch if that's desirable; I know @eddyb had made comments suggesting that it'd be better to use a safer abstraction there than just blindly dealing with uninitialized memory, but I think that'd be a separate PR.

@nagisa
Copy link
Member

nagisa commented Apr 25, 2017

I’m fine r+-ing the mem::uninitialized for now, as long as the whole function body is appropriately annotated as unsafe, thus correctly tainting all the involved code.

@nagisa
Copy link
Member

nagisa commented Apr 25, 2017

Additionally, the lines tidy complains about have to be fixed.

@eddyb
Copy link
Member

eddyb commented Apr 25, 2017

cc #41234 @bzbarsky

@aidanhs aidanhs added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Apr 26, 2017
@nagisa
Copy link
Member

nagisa commented Apr 28, 2017

Could you also rebase & squash some commits while you’re at it? For a PR of this size 2-3 commits would be the appropriate high-watermark.

@froydnj
Copy link
Contributor Author

froydnj commented Apr 28, 2017

Could you also rebase & squash some commits while you’re at it? For a PR of this size 2-3 commits would be the appropriate high-watermark.

Sure, was just trying to follow @rust-highfive's advice.

froydnj added 3 commits April 28, 2017 15:24
We have benchmarks for the floating-point formatting algorithms
themselves, but not for the surrounding machinery like Formatter and
translating to the flt2dec::Part slices.
For the two major entry points for float formatting, we split the exact
case and the shortest cases into separate functions.  We mark the
separate functions as #[inline(never) so the exact cases won't bloat
stack space in their callers unnecessarily.  The shortest cases are
marked so for similar reasons.

Fixes rust-lang#41234.
The comments for flt2dec::to_shortest_str says that we only need a slice
of length 5 for the parts array.  Initializing a 16-part array is just
wasted effort and wasted stack space.  Other functions in the flt2dec
module have similar comments, so we adjust the parts arrays passed to
those functions accordingly.
@froydnj froydnj force-pushed the float-stack-reduction branch from 61c5ab5 to fbf9cee Compare April 28, 2017 19:47
@nagisa
Copy link
Member

nagisa commented Apr 28, 2017

@bors r+

@bors
Copy link
Contributor

bors commented Apr 28, 2017

📌 Commit fbf9cee has been approved by nagisa

@nagisa
Copy link
Member

nagisa commented Apr 28, 2017

Oh I just noticed you made the functions themselves unsafe rather than wrapped their whole body in unsafe.

A distinct difference between

unsafe fn float_to_exponential_common_exact<T>(fmt: &mut Formatter, num: &T,
                                               sign: flt2dec::Sign, precision: usize,
                                               upper: bool) -> Result
    where T: flt2dec::DecodableFloat
{
    let mut buf: [u8; 1024] = mem::uninitialized(); // enough for f32 and f64
    let mut parts: [flt2dec::Part; 7] = mem::uninitialized();
    let formatted = flt2dec::to_exact_exp_str(flt2dec::strategy::grisu::format_exact,
                                              *num, sign, precision,
                                              upper, &mut buf, &mut parts);
    fmt.pad_formatted_parts(&formatted)
}

and

fn float_to_exponential_common_exact<T>(fmt: &mut Formatter, num: &T,
                                               sign: flt2dec::Sign, precision: usize,
                                               upper: bool) -> Result
    where T: flt2dec::DecodableFloat
{
    unsafe {
    let mut buf: [u8; 1024] = mem::uninitialized(); // enough for f32 and f64
    let mut parts: [flt2dec::Part; 7] = mem::uninitialized();
    let formatted = flt2dec::to_exact_exp_str(flt2dec::strategy::grisu::format_exact,
                                              *num, sign, precision,
                                              upper, &mut buf, &mut parts);
    fmt.pad_formatted_parts(&formatted)
    }
}

is that in the first case there’s something caller needs to handle for it to be safe, whereas in the 2nd case no matter how caller calls the function, it is safe. When I said “The whole function body should be unsafe rather than just the calls to mem::uninitialized.” I meant it should be the 2nd case not the 1st. Could you change it so unsafes look like the 2nd code block?

@bors r-

@froydnj
Copy link
Contributor Author

froydnj commented Apr 28, 2017

Ah, OK, I misread what you meant earlier. I can do that, sure. Do you really want the body of the unsafe block unindented like that?

@nagisa
Copy link
Member

nagisa commented Apr 28, 2017

Indent as appropriate, I didn’t do so because it is hard to do so with web text field :)

Spending time to initialize these is just wasted work, as we'll
overwrite them soon anyway.

Fixes rust-lang#41259.
@froydnj froydnj force-pushed the float-stack-reduction branch from fbf9cee to b2c3102 Compare April 28, 2017 20:09
@froydnj
Copy link
Contributor Author

froydnj commented Apr 28, 2017

:) Fair enough.

@nagisa
Copy link
Member

nagisa commented Apr 28, 2017

@bors r+

@bors
Copy link
Contributor

bors commented Apr 28, 2017

📌 Commit b2c3102 has been approved by nagisa

@bors
Copy link
Contributor

bors commented Apr 28, 2017

⌛ Testing commit b2c3102 with merge 8fd2f87...

@frewsxcv
Copy link
Member

@bors retry prioritizing rollup which includes this

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Apr 29, 2017
…gisa

reduce stack requirements for floating-point formatting

Doing this speeds up float formatting by ~10% or so, and also makes the formatting code more suitable for embedded environments where stack space is at a premium.
frewsxcv added a commit to frewsxcv/rust that referenced this pull request Apr 30, 2017
…gisa

reduce stack requirements for floating-point formatting

Doing this speeds up float formatting by ~10% or so, and also makes the formatting code more suitable for embedded environments where stack space is at a premium.
frewsxcv added a commit to frewsxcv/rust that referenced this pull request Apr 30, 2017
…gisa

reduce stack requirements for floating-point formatting

Doing this speeds up float formatting by ~10% or so, and also makes the formatting code more suitable for embedded environments where stack space is at a premium.
bors added a commit that referenced this pull request Apr 30, 2017
Rollup of 6 pull requests

- Successful merges: #41449, #41509, #41608, #41613, #41636, #41637
- Failed merges:
@bors bors merged commit b2c3102 into rust-lang:master Apr 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants