Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Split Show into String and Show #504

Merged
merged 5 commits into from
Dec 19, 2014

Conversation

alexcrichton
Copy link
Member

Today's Show trait will be tasked with the purpose of providing the ability to
inspect the representation of implementors of the trait. A new trait, String,
will be introduced to the std::fmt module to in order to represent data that
can essentially be serialized to a string, typically representing the precise
internal state of the implementor.

The String trait will take over the {} format specifier and the Show trait
will move to the now-open {:?} specifier.

Rendered

Today's `Show` trait will be tasked with the purpose of providing the ability to
inspect the representation of implementors of the trait. A new trait, `String`,
will be introduced to the `std::fmt` module to in order to represent data that
can essentially be serialized to a string, typically representing the precise
internal state of the implementor.

The `String` trait will take over the `{}` format specifier and the `Show` trait
will move to the now-open `{:?}` specifier.
@alexcrichton
Copy link
Member Author

cc @nick29581, @wycats, @seanmonstar, @aturon, I believe this is what we discussed during the work week.

@wycats
Copy link
Contributor

wycats commented Dec 8, 2014

Looks reasonable to me. I think we should also add that if a type implements both String and parse-from-string APIs, parsing an emitted string should likely produce a semantically equivalent value.

@alexcrichton
Copy link
Member Author

@wycats Is this what you were thinking?

assert_eq!(foo, from_str(format!("{}", foo).as_slice()).unwrap());

(see here)

@sfackler
Copy link
Member

sfackler commented Dec 8, 2014

This seems pretty reasonable in general.

I think there may be a bit of a distinction that should be made in the intended behavior of String implementations. For types that have reasonable round trips to and from strings (e.g. Duration and the ISO 8601 format) the proposal makes sense. There are other types, however, where you would want a String implementation but making it fully describe the type is undesirable or maybe even impossible.

For example, rust-postgres's DbError type contains a ton of information, but the current implementation of Show (that would become an implementation of String) only prints <severity>: <message>. That format's descriptive enough to generally tell you what's going on, and printing everything would be totally overwhelming, especially when fields like routine and file are only useful if you're trying to dig into some bizarre Postgres bug.

Thoughts? It seems like the intent for String implementations is "something reasonable", which for types with FromStr impls is almost certainly round-trippable output but for other types may be less descriptive.

@alexcrichton
Copy link
Member Author

That is a good point @sfackler! I added a drawback at the last second about how the guarantees provided by String may be a bit too strict and I'm worried about it motivating a new trait on the side in the future. I'd be fine wording it like:

An implementation of the String trait is an assertion that the type can be faithfully represented as a UTF-8 string at all times. If the type can be reconstructed from a string, then the following relation must be true:

assert_eq!(foo, from_str(format!("{}", foo).as_slice()).unwrap());

If the type cannot necessarily be reconstructed from a string, then the output may be less descriptive than the type can provide, but it is guaranteed to be human readable for all users.

@wycats
Copy link
Contributor

wycats commented Dec 8, 2014

I didn't mean to suggest that round-tripping would be mandated; just that if a from-string implementation existed, and String was implemented, it should roundtrip.

@huonw
Copy link
Member

huonw commented Dec 8, 2014

I vaguely recall a suggestion that we would implement Show for u8 etc as 1u8; requiring round-tripping would mean that from_str needs to handle the suffixes, or that value would just be printed as 1.

(Just something to consider.)

@seanmonstar
Copy link
Contributor

I specifically would never mention round-tripping, because it causes confusion. If you want round-tripping, use the serialize crate. I would never suggesting that the fmt::Show output should be used to save an object, and later recreate it.

@huonw yes, fmt::Show should have 3u8 output as 3u8, and the String trait would output 3.

fwiw, the RFC I wrote about this on Friday: https://gist.github.com/seanmonstar/94208ff568d1b9f71914

@alexcrichton
Copy link
Member Author

@huonw yes as @seanmonstar mentioned the String impl for u8 wouldn't append a suffix, whereas the Show impl would. I tried to call this out in the RFC, but do you think I should place more emphasis on it?

@seanmonstar I do think it's useful to have a distinction where if FromStr is implemented then it should be equivalent, otherwise it needs to be an accurate representation as much as possible. I'll add some specific wording about how types should specifically never be constructed from the Show output, which I think is a good idea!

@ftxqxd
Copy link
Contributor

ftxqxd commented Dec 8, 2014

The name String seems like it would be too easily confused with std::string::String. Perhaps something like Fmt would be more appropriate? A trait named String sounds like it represents any string type (&str, String, a rope, etc.) rather than anything that can be represented as a human-friendly string. Also, I think that the required method in either the String or Show trait should perhaps be renamed to avoid name conflicts.

I wonder if it would be useful to enforce an invariant that anything implementing Show should be evaluable as a Rust expression with the same value (perhaps excepting lifetime/ownership issues). That might make output for String a bit too noisy, and things like HashMap don’t have any literal-like construction method (other than perhaps vec![(key, val), ...].collect()), so maybe that’s not practical. (Incidentally, Python recommends that eval(repr(foo)) == foo, which is the same idea.)

@seanmonstar
Copy link
Contributor

@alexcrichton I use FromStr for basic parsing in hyper. For example, from_str::<Mime>("text/html") == Some(Mime(Text, Html, vec![]). I would implement fmt::String to output text/html, and I would use #[deriving(Show), which would mean it would output Mime(Text, Plain, []).

@seanmonstar
Copy link
Contributor

@P1start it's easier to see if you view it as fmt::String. Also, while the Python docs claim that, the Python community all agree that it is a docs bug.

@reem
Copy link

reem commented Dec 8, 2014

I agree with @seanmonstar that we should not use either of these traits for round-trip parsing (or at least we shouldn't grow to expect that behavior) - you should use a more principled system like encodable/decodable.

@alexcrichton
Copy link
Member Author

@P1start RFC 356 has outlined that module prefixes should not be included in type names while also emphasizing that the name of a type or trait should be scoped to the module that it is defined in. For example the IoError type will be renamed to io::Error and it can be locally renamed. This is distinct from the Error trait, however, in the std::error module. For this reason I do not think that string::String and fmt::String will be confused all that often in practice because this will already be a convention in the standard library. Can you elaborate more on the renaming of the methods? The methods align with the current design of the fmt module and its formatting traits.

I also have added some specific text that the Show output can never be used to reconstruct the type. For example a Path can never be constructed from UTF-8 output.


@seanmonstar I'm not sure I understand your example, that sounds like it fits into the design outlined in this RFC?


@reem note that these traits are not necessarily connected to round-trip parsing at all. The Show trait is explicitly stated as not being used for this purpose, and String is only used for this if FromStr is implemented. I believe it would be quite odd if a FromStr impl were unable to parse the default output of the type represented as a string.

@ftxqxd
Copy link
Contributor

ftxqxd commented Dec 8, 2014

@alexcrichton Quite often I do something like this in a Show impl:

struct Foo {
    x: Vec<int>,
    y: Vec<f64>,
}

impl fmt::Show for Foo {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        self.x.fmt(f);
        self.y.fmt(f);
    }
}

With the name fmt being overloaded for both String and Show, I don’t think that would compile, and would require UFCS or some other way of specifying which trait to use. It’s only a mild inconvenience, but it would still be nice for Show to perhaps have its method called fmt_repr or something. I see now that all the traits in std::fmt today use fmt anyway (not just Show), but my concern is that two traits that are both commonly used will have conflicting names. (Show is the only trait in std::fmt that is commonly used today, so the name conflict isn’t a problem.)

Regarding String’s naming, I can’t quite find where I read it, but I recall a guideline that trait names should preferably be verbs, then nouns, and at worst adjectives (or something like that) that describe what the trait provides/requires. String is not only a noun, but also to me doesn’t sound like a trait for ‘something that can be represented as a (user-friendly) string’, but rather a trait for ‘something that is a string’ (much like how the old collections traits were things like Deque for anything that is a deque). Something like Stringify I suppose would be ideal, but it sounds a little silly. Even in context, fmt::String sounds like a formatting trait for strings only (like how fmt::Pointer is only implemented for pointers). I understand your point about confusion with string::String, though—I now realise that such a confusion probably won’t happen (except perhaps in rustdoc, although that problem applies to many things and is a separate issue).

@SimonSapin
Copy link
Contributor

An implementation of the String trait is an assertion that the type can be faithfully represented as a UTF-8 string at all times.

The output format will not change from what it is today (no extra escaping or debugging will occur).

You can’t have both. If &str is not quoted and escaped, it takes all of the output space and there is no way to tell where it ends when it is part of a larger type. For example, ["a", "b"] and ["a, b"] would print to the same, and so &[&str] can not be round-tripped.

I was confused when first reading the RFC: I expected the round-trippable trait to be the debugging one, both separate from the "default" one.

@SimonSapin
Copy link
Contributor

This RFC establishes a convention that Show and String produce valid UTF-8 data, but no static guarantee of this requirement is provided. Statically guaranteeing this invariant would likely involve adding some form of TextWriter which we are currently not willing to stabilize for the 1.0 release.

I find this very unfortunate, since that static guarantee can likely not be added backward-compatibly. Can you expand on why you’re not willing to stabilize TextWriter for 1.0? A minimal variation of it only needs fn write_str(&mut self, data: &str) -> Result<(), ()>, more general-purpose Writer-like functionality can be added later.

@alexcrichton
Copy link
Member Author

@P1start that should compile today and use the Show impl for the member types. Disambiguation will require UFCS, yes. The naming for this trait should be thought of as fmt::String which translates to "format string" which is generally what the trait is performing. In general though, it follows the current design of the std::fmt traits.

@SimonSapin that is correct about string slices, but &[&str] does not implement FromStr for precisely this reason. We can't attach round-tripping to the debugging trait because it's defined for all Rust types as not all types can be represented (examples given in the RFC).

I also still believe that we cannot stabilize TextWriter for 1.0 at this time. We're cutting it real thin stabilizing std::io as is at this time. We can of course add a simple trait with one write_str method, but it's not quite as simple as we need to consider the ramifications of how TextWriter and Writer interact. For example do we now need to add BufferedTextWriter and BufferedWriter? For most I/O objects that are generic over the inner type, do we need to add duplicate types for all of these? Questions like this are why I'm not confident that we can deliver a quality experience with a TextWriter trait.

@seanmonstar
Copy link
Contributor

Ah, I was confused just as Simon was: I misread and thought fmt::Show <=> FromStr.

Still, I think the round-trip promises should be left to the serialize crate. Decodable <=> Encodable.

@seanmonstar
Copy link
Contributor

As for @P1start concerns, if you imported both into the same module, you would need to use UFCS to disambiguate. See http://is.gd/gjCmby

Instead, I'd need to write LowerHex::fmt(self.0, f) or Show::fmt(self.0, f).

@quantheory
Copy link
Contributor

I'm also just slightly against the String name:

  • I'm still not sure I actually agree with the RFC 356 reasoning about prelude names in general. I think that we can get away with io::Result not being too confusing because it actually is a partially specialized result::Result (which I think is also true of all the Results in other std modules?). string::String and fmt::String don't have that close relationship.
  • Even assuming that string::String and fmt::String don't get mixed up too much in the code itself (though I think this will trip up beginners), it opens the door to more confusion in conversation and documentation. As a special case of this, the docs might get a bit stranger in places where both Strings are being mentioned (though maybe the real issue is that rustdoc should not leave off the fmt:: or string:: where there could be ambiguity).

@SimonSapin
Copy link
Contributor

@alexcrichton

I also still believe that we cannot stabilize TextWriter for 1.0 at this time. We're cutting it real thin stabilizing std::io as is at this time.

Did that decision about std::io explicitly include consciously giving up forever on a static UTF-8 guarantee for std::fmt? I don’t think deciding to freeze a module means you can absolutely not change it when new information comes up. I’m willing to do the work to make this happen (though I understand that someone else still need to take time to review it.)

We can of course add a simple trait with one write_str method, but it's not quite as simple as we need to consider the ramifications of how TextWriter and Writer interact. For example do we now need to add BufferedTextWriter and BufferedWriter? For most I/O objects that are generic over the inner type, do we need to add duplicate types for all of these?

All interesting questions, that I think can be resolved backward-compatibly after 1.0.

By the way, from the RFC:

The formatting traits today largely provide clear guidance to what they are intended for. For example the Binary trait is intended for printing the binary representation of a data type.

It’s not clear to me at all what the Binary trait is for. Its documentation is a bit tautological, it only says “Format trait for the b character”. Is it for writing raw, non-UTF-8, binary bytes?

@sfackler
Copy link
Member

sfackler commented Dec 8, 2014

It’s not clear to me at all what the Binary trait is for. Its documentation is a bit tautological, it only says “Format trait for the b character”. Is it for writing raw, non-UTF-8, binary bytes?

No, it prints numbers in binary, just like Hex prints them in hexadecimal.

@wycats
Copy link
Contributor

wycats commented Dec 8, 2014

I actually got my traits mixed up ;)

For what it's worth, what I was trying to get across actually wasn't "round tripping", but rather the idea that unlike fmt::String, which may be lossy (for example, format!("{}", path)), the new Show trait should try to include enough semantically important aspects of the value so that a human can re-create it.

An example of a possible problem for Show would be serializing a Vec as [ foo, bar, baz ], because it would be ambiguous between [ "foo", "bar", "baz" ] and something like [ "foo, bar", "baz" ].

Essentially, what I'm trying to say is that Show should try to let a human understand enough of the representation to understand its public semantics, but not necessarily the internal guts of the value.

@seanmonstar
Copy link
Contributor

That makes sense. fmt::Show for a String perhaps would be String("foo bar"), not String { vec: *mut Vec }

@alexcrichton
Copy link
Member Author

@SimonSapin

Did that decision about std::io explicitly include consciously giving up forever on a static UTF-8 guarantee for std::fmt?

Yes. When @aturon and I were discussing the traits the questions we raised (some I pasted above) were why we avoided attempting to add a TextWriter at this time.

All interesting questions, that I think can be resolved backward-compatibly after 1.0.

Maybe, but I'm not personally convinced that they will have answers in a reasonable time frame (e.g. taking into account possible language features coming down the pike).


@seanmonstar, @wycats I intend for the following impl to exist:

impl fmt::String for String {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { self.as_slice().fmt(f) }
}

impl fmt::String for str {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { f.write(self.as_bytes()) }
}

impl fmt::Show for String {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { self.as_slice().fmt(f) }
}

impl fmt::Show for str {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        try!(write!(f, "\""));
        for c in self.chars().flat_map(|c| c.escape_default()) {
            try!(write!(f, "{}", c));
        }
        write!(f, "\"") 
    }
}

Does that help clear up what String and Show are intended for?

@barosl
Copy link
Contributor

barosl commented Dec 9, 2014

Personally I think Inspect is less confusing than Show for newcomers, but I also agree that the churn required to change the name does not worth it.

So, in a Ruby sense, this proposal suggests Show for .inspect, String for .to_s, but not .to_str because it should be handled through the serialize crate, as Show is not guaranteed to be lossless (or round-trippable). Am I right?

@aturon
Copy link
Member

aturon commented Dec 10, 2014

Note: this may be a good time to reconsider rust-lang/rust#16544

@SimonSapin
Copy link
Contributor

@aturon That sounds familiar… #46 :)

@aturon
Copy link
Member

aturon commented Dec 11, 2014

@SimonSapin Wow, indeed! Looks like you posted that just before I started on Rust :-)

@chris-morgan
Copy link
Member

Demonstrating my Python background (where these things are the magic methods __str__ and__repr__), I’d rather like String and Repr as the names for these things. Show is rather a vague name. That would also to me suggest {:r} for raw rather than {:?} (in Python it’s {!r}, and Python uses ! where we use :), though either will do.

I also feel that the semantics of these traits needs to be tightened.

@alexcrichton
Copy link
Member Author

@aturon That's a good point, I had forgotten about that issue! I think we may want to focus this RFC on the semantic meanings of Show and String rather than the implementation details though, I suspect we can have a separate RFC (if necessary) for size hints.

@chris-morgan could you elaborate on how you think the semantics need to be tightened? Do you feel that the Show and String definitions here are still ambiguous?

@Ericson2314
Copy link
Contributor

+1, Show has similar problems in Haskell, where I first became familiar with those problems. Because Show for debugging purposes was more likely to be derived, I thought of the debugging usage as the "more isomorphic" one. But based on the other comments, and the fact that deriving a String isomorphism is certainly less possible in Rust than it would be in Haskell, I agree that only (de)serializable should make a such a claim.

I like Repr for Show, and I like making Show part of the default bound, though as I mentioned in another thread I think making the list of traits entirely customizable would be neat. I also wonder if the deriving should be opt-out, or even mandatory. My only fear to making Show mandatory is it does subvert parametricity, and evil or misguided souls could exploit that.

@seanmonstar
Copy link
Contributor

I also wonder if the deriving should be opt-out, or even mandatory. My
only fear to making String mandatory is it does subvert parametricity, and
evil or misguided souls could exploit that.

So, String couldn't be mandatory, because not all types have semantic
string values. But Show could. Maybe it can be added to the lint that
notices when a type forgets to optin to Copy?

@Ericson2314
Copy link
Contributor

Ah, I meant Show, edited for posterity. Yeah that names confuses me because "Show" sounds, well theatrical and thus user-facing.

@seanmonstar
Copy link
Contributor

@alexcrichton what do you think of including a lint that checks all things implement Show (or allow the lint), similar to Copy? It'd greatly help deriving Show on new structs.

@sfackler
Copy link
Member

A lint seems like a good idea to me.

@alexcrichton
Copy link
Member Author

I'd be ok with adding a lint so long as it was allow-by-default. I personally also believe that the Copy lint should be allow-by-default as it's too noisy as warn-by-default.

@SimonSapin
Copy link
Contributor

allow-by-default lints are much less useful for discoverability: by definition you have to already know about them to see them.

Would it make sense to have another level between allow and warn, where a given lint will print at most one message per crate? Something like "Some types don’t implement Copy. Add #![warn(missing_copy_impl)] to your crate for more details.". Maybe including the name of the offending types, but kept on one (logical) line.

@sfackler
Copy link
Member

I'm more okay with a Show lint being warn-by-default than the Copy lint, since there is really no reason for any public type to not implement it, while there are valid reasons for a public type to not opt in to Copy.

@alexcrichton
Copy link
Member Author

I've pushed a commit which tones down the wording about "round tripping" through String and FromStr. It is now recommended, but not required for this property to hold.

@aturon
Copy link
Member

aturon commented Dec 19, 2014

This RFC has been accepted.

Thank @alexcrichton for writing this RFC (and @seanmonstar for helping kick it off). It's great to finally resolve this thorny question!

There's broad support for the proposed design here, especially now with the toned-down guideline about roundtripping. We'll leave open the question of size hints for formatters (which we intend to address).

Tracking issue

@aturon aturon merged commit fb4f25c into rust-lang:master Dec 19, 2014
sampsyo added a commit to sampsyo/fake1 that referenced this pull request Jan 13, 2015
@Centril Centril added the A-fmt Proposals relating to std::fmt and formatting macros. label Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-fmt Proposals relating to std::fmt and formatting macros.
Projects
None yet
Development

Successfully merging this pull request may close these issues.