Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the formatter_len_hint method to Show and use it in to_string #16544

Closed
wants to merge 1 commit into from

Conversation

pczarn
Copy link
Contributor

@pczarn pczarn commented Aug 17, 2014

I'm trying to balance potential code bloat and precision of the upper bound.

  • Can size_hint default to 0 instead of None?
  • How costly is occasional inaccuracy of the upper bound?
  • How to implement size_hint for floats? I've yet to think about it.
  • Is size_hint also that useful for the general case of format!, as implemented in this PR, not only to_string?

UFCS would make this addition prettier.

Examples

18446744073709551615u64.to_string().byte_capacity() // => 32
123u.to_string().byte_capacity() // => 3
12.34f32.to_string().byte_capacity() // => 9
"foobar".to_string().byte_capacity() // => 6

Fixes #16415

@@ -148,113 +171,159 @@ impl<'a> Show for Arguments<'a> {
pub trait Show {
/// Formats the value using the given formatter.
fn fmt(&self, &mut Formatter) -> Result;

/// Returns an upper bound on the size of the formatted string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should actually be a lower bound, or just a loosely defined "conservative estimate" (since a true upper bound can be much larger than the real value, leading to the same overallocation problems). In this case, returning 0 by default makes sense.

@alexcrichton
Copy link
Member

In general, can you provide data to support a change such as this? The issue only claims that it may be an issue, I don't see any concrete numbers one way or the other. I'd like to see some analysis which compares the heap usage before and afterwards of projects like rustc (and perhaps servo) to see if a change such as this reaps the theoretical benefits.

In the past the code size generated by format! has often been the source of many problems in both servo, rustc, and benchmarks. Could you do some analysis to see what the impact is of passing size hints for all invocations of format_args!. This is inflating the amount of code generated for each invocation, as well as stack space used to store the local slice, so I'm a little worried if it's worth the benefits.

@huonw
Copy link
Member

huonw commented Aug 17, 2014

as well as stack space used to store the local slice

(This can be addressed by storing a single pointer to a struct/tuple/vtable containing the two function pointers.)

let log2base = 1 << $log2log2base;

// Get the number of digits in the target base.
let binary_digits = width - (num | log2base).leading_zeros() as uint;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this |?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To adjust for the fact that zero has one digit, so that

  • there are at most width - log2base - 1 leading zeros
  • binary_digits is at least log2base + 1
  • binary_digits / log2base is never zero

@pnkfelix
Copy link
Member

Note also that instead of changing the definition of format!, you could consider just changing the definition of to_string (while leaving the default method in Show itself). Of course, the choice here largely depends on the tradeoff of adding more code to format! (and thus increasing its code size) versus getting the broader benefits of having format! produce a more precisely sized String. I mention this largely as a way to try to semi-address @alexcrichton 's concerns. (Though either way, it would still be best to have concrete numbers presented as well, and logged in the issue or PR.)

@pczarn
Copy link
Contributor Author

pczarn commented Aug 17, 2014

@pnkfelix, I agree.

The change to format! increases the size of compiled librustc from 22.054 to 23.429MB, even though format! is practically never called during a successful compilation by default. I guess this bloat is so drastic due to a large number of debug!s and error messages.

I've just made these measurements.

before
wc -c x86_64-unknown-linux-gnu/stage2/lib/*
   67363 x86_64-unknown-linux-gnu/stage2/lib/libarena-4e7c5e5c.so
  160066 x86_64-unknown-linux-gnu/stage2/lib/libdebug-4e7c5e5c.so
  127453 x86_64-unknown-linux-gnu/stage2/lib/libflate-4e7c5e5c.so
   75365 x86_64-unknown-linux-gnu/stage2/lib/libfmt_macros-4e7c5e5c.so
  181903 x86_64-unknown-linux-gnu/stage2/lib/libgetopts-4e7c5e5c.so
   63877 x86_64-unknown-linux-gnu/stage2/lib/libgraphviz-4e7c5e5c.so
   93812 x86_64-unknown-linux-gnu/stage2/lib/liblog-4e7c5e5c.so
  579422 x86_64-unknown-linux-gnu/stage2/lib/libnative-4e7c5e5c.so
  281492 x86_64-unknown-linux-gnu/stage2/lib/librbml-4e7c5e5c.so
22054464 x86_64-unknown-linux-gnu/stage2/lib/librustc-4e7c5e5c.so
  400373 x86_64-unknown-linux-gnu/stage2/lib/librustc_back-4e7c5e5c.so
25064481 x86_64-unknown-linux-gnu/stage2/lib/librustc_llvm-4e7c5e5c.so
 2217782 x86_64-unknown-linux-gnu/stage2/lib/librustrt-4e7c5e5c.so
  942856 x86_64-unknown-linux-gnu/stage2/lib/libserialize-4e7c5e5c.so
 2742769 x86_64-unknown-linux-gnu/stage2/lib/libstd-4e7c5e5c.so
  868949 x86_64-unknown-linux-gnu/stage2/lib/libsync-4e7c5e5c.so
 7834446 x86_64-unknown-linux-gnu/stage2/lib/libsyntax-4e7c5e5c.so
  403999 x86_64-unknown-linux-gnu/stage2/lib/libterm-4e7c5e5c.so
  130546 x86_64-unknown-linux-gnu/stage2/lib/libtime-4e7c5e5c.so

after
wc -c x86_64-unknown-linux-gnu/copy\ of\ stage2/lib/*
   72573 x86_64-unknown-linux-gnu/copy of stage2/lib/libarena-4e7c5e5c.so
  165192 x86_64-unknown-linux-gnu/copy of stage2/lib/libdebug-4e7c5e5c.so
  132902 x86_64-unknown-linux-gnu/copy of stage2/lib/libflate-4e7c5e5c.so
   76314 x86_64-unknown-linux-gnu/copy of stage2/lib/libfmt_macros-4e7c5e5c.so
  182738 x86_64-unknown-linux-gnu/copy of stage2/lib/libgetopts-4e7c5e5c.so
   63342 x86_64-unknown-linux-gnu/copy of stage2/lib/libgraphviz-4e7c5e5c.so
   94202 x86_64-unknown-linux-gnu/copy of stage2/lib/liblog-4e7c5e5c.so
  598900 x86_64-unknown-linux-gnu/copy of stage2/lib/libnative-4e7c5e5c.so
  299335 x86_64-unknown-linux-gnu/copy of stage2/lib/librbml-4e7c5e5c.so
23429064 x86_64-unknown-linux-gnu/copy of stage2/lib/librustc-4e7c5e5c.so
  415548 x86_64-unknown-linux-gnu/copy of stage2/lib/librustc_back-4e7c5e5c.so
25064674 x86_64-unknown-linux-gnu/copy of stage2/lib/librustc_llvm-4e7c5e5c.so
 2271760 x86_64-unknown-linux-gnu/copy of stage2/lib/librustrt-4e7c5e5c.so
  977342 x86_64-unknown-linux-gnu/copy of stage2/lib/libserialize-4e7c5e5c.so
 2824152 x86_64-unknown-linux-gnu/copy of stage2/lib/libstd-4e7c5e5c.so
  914657 x86_64-unknown-linux-gnu/copy of stage2/lib/libsync-4e7c5e5c.so
 8043605 x86_64-unknown-linux-gnu/copy of stage2/lib/libsyntax-4e7c5e5c.so
  413601 x86_64-unknown-linux-gnu/copy of stage2/lib/libterm-4e7c5e5c.so
  146047 x86_64-unknown-linux-gnu/copy of stage2/lib/libtime-4e7c5e5c.so

@pczarn
Copy link
Contributor Author

pczarn commented Aug 17, 2014

Here's why size_hint is not useful for cases other than to_string: every instance of format! is easily replaceable with a write! to a buffer if it's inside a tight loop and/or causes higher memory usage.

I forgot about it before.

@pczarn pczarn changed the title (WIP) Add the size_hint method to Show as a default method (WIP) Add the size_hint method to Show as a default method and use it in to_string Aug 17, 2014
@pczarn pczarn changed the title (WIP) Add the size_hint method to Show as a default method and use it in to_string Add the size_hint method to Show as a default method and use it in to_string Aug 19, 2014
@lilyball
Copy link
Contributor

size_hint() is an awfully generic name to attach to all Show instances. I'd rather see something like

pub trait Show {
    // ...
    fn size_hint(val: &Self) -> Option<uint> { None }
}

thus turning usage into e.g.

let mut output = io::MemWriter::with_capacity(Show::size_hint(self).unwrap_or(128));

Although I am unsure how UFCS affects static trait methods.

@lilyball
Copy link
Contributor

If UFCS makes the static method behave the same as a regular method (I know it does for regular impls but not sure how that works for traits) then instead you could have a function fmt::size_hint() that does this.

The idea here is that I should be able to implement size_hint() on my own type that also implements Show without causing an ambiguity error when I try and call size_hint().

Edit: What am I thinking, third-party types can't implement fmt::size_hint().

@pczarn
Copy link
Contributor Author

pczarn commented Aug 20, 2014

This method has to work for many types. I agree that a method used once internally shouldn't have a generic name. Well, at least iterators don't implement Show.

How about changing it to fmt_size_hint()?

@eddyb
Copy link
Member

eddyb commented Sep 13, 2014

@pczarn @kballard keep in mind that Show is not in the prelude, and changing it affects much less code than actually would otherwise.

@pczarn
Copy link
Contributor Author

pczarn commented Sep 13, 2014

Renamed to formatter_len_hint.

@pczarn
Copy link
Contributor Author

pczarn commented Sep 22, 2014

Updated with a simplified implementation for floats. Added two tests.

@pczarn
Copy link
Contributor Author

pczarn commented Sep 29, 2014

This is good to go, right? I have a crude memory chart from stage1.

Imgur

Before:magenta, after:blue, the peak is smaller by 4MB.

@pczarn pczarn changed the title Add the size_hint method to Show as a default method and use it in to_string Add the formatter_len_hint method to Show and use it in to_string Sep 30, 2014
Implements `formatter_len_hint` for several simple types.

Length of the formatted string is approximated for floats and integers.
@alexcrichton
Copy link
Member

Note that there has been talk of separating Show and ToString, which may require tweaks to this patch or a whole different strategy altogether.

Committing to this style of API is taking us pretty hard down the road of using Show for transforming into a stream of bytes, which is a strong commitment to consider when Show is such a commonly used trait.

I'm curious, @aturon, do you have an opinion on this? This is something that can always be added backwards-compatibly later on, and I'd almost rather consider the stabilization of Show before considering optimizations like this (for what seems like only a small win?)

@eddyb
Copy link
Member

eddyb commented Oct 1, 2014

I believe the win is bigger, but most of it is already in (using into_string in the interner).

@aturon
Copy link
Member

aturon commented Oct 1, 2014

@pczarn, thanks for taking this on!

I agree with @alexcrichton that, while this seems like a reasonable extension, it seems wise to resolve the basic question about the role of Show and its connection to to_string before adding this functionality. I plan to work on an RFC on this topic very soon and can incorporate this idea as part of the proposal -- maybe best to wait on landing this PR until the community has had a chance to evaluate the overall design space. (And, as @alexcrichton says, depending on the design it may be that a completely different strategy for solving this problem is needed.)

I'll try to get a draft together in the next day or two, and ping you for feedback before posting it.

@alexcrichton
Copy link
Member

Closing to help clear out the queue, @aturon do you have an update on the RFC you were planning to write?

@SimonSapin
Copy link
Contributor

Iterator::size_hint has both a lower-bound (which can default to zero when unknown) and an optional upper bound, rather than a single number. Would it make sense to do something similar here? (Vec::from_iter ignore the upper bound and takes the lower bound as the initial capacity.)

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 18, 2024
…ykril

Add completions to show only traits in trait `impl` statement

This is prerequisite PR for adding the assist mentioned in rust-lang#12500

P.S: If wanted, I will add the implementation of the assist in this PR as well.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

to_string causes crazy memory usage problems
8 participants