Changes in Ascii #14496


Closed · wants to merge 4 commits

Conversation

@aochagavia (Contributor)

Breaking changes:

  • Rename to_byte to as_byte
  • Rename to_char to as_char
  • Deprecate eq_ignore_case (I don't think we need a special function for this when it is so easy to write a.to_lowercase() == b.to_lowercase()).
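For illustration, the comparison this deprecation relies on can be sketched in today's Rust on plain u8 bytes (the std Ascii type discussed here was later removed; this is a sketch, not the PR's actual code):

```rust
fn main() {
    let (a, b) = (b'A', b'a');

    // The pattern the PR suggests instead of a dedicated method:
    // lowercase both sides, then compare.
    assert_eq!(a.to_ascii_lowercase(), b.to_ascii_lowercase());

    // Today's std ended up keeping a dedicated method after all:
    assert!(a.eq_ignore_ascii_case(&b));
}
```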

Other changes (functions copied from char):

  • Added to_digit function
  • Added is_digit_radix function
  • Added tests for these functions
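A rough sketch of what these two functions do, expressed with today's char API (char::to_digit and char::is_digit are the surviving equivalents; the Ascii versions proposed here did not land):

```rust
fn main() {
    // to_digit: Some(value) when the character is a digit in the
    // given radix, None otherwise.
    assert_eq!('f'.to_digit(16), Some(15));
    assert_eq!('f'.to_digit(10), None);

    // is_digit_radix survives in today's std as char::is_digit.
    assert!('7'.is_digit(10));
    assert!(!'g'.is_digit(16));
}
```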

Questions (I didn't know where to ask, so I'm writing them here):

  • Should we derive Copy for Ascii? It sounds reasonable, since it is just a u8.
  • Should we remove to_ascii (which fails if the character is not ASCII) and replace it with to_ascii_opt?
  • Many of the functions available for Ascii are also available for char. Maybe we should unify them in a trait. At the moment we have the Char trait, but it has some functions specifically meant for UTF-8, like is_XID_start and is_XID_continue. Maybe we should split this trait into two: Char and Utf8Char.
  • I think there is some chaos regarding strings of ASCII characters (there are traits implemented on &[Ascii] slices and on Vec<Ascii> which offer conversion to normal strings or bytes). I would like to implement an AsciiString to bring order to this. Should I just do it and open a PR?

[breaking-change]

@lilyball (Contributor)

Note that I already have an outstanding PR #14436 that makes changes to Ascii.

@lilyball (Contributor)

  • Rename to_byte to as_byte
  • Rename to_char to as_char

.as_* methods are generally used when converting types without changing the in-memory representation. For example, String.as_bytes() gives a &[u8] that refers to the underlying memory of the String, whereas .to_* methods are used for conversions that may allocate and/or copy data.

It's not so obvious what's correct when talking about these small primitive values, as all conversion is cheap, but .to_* is still technically the correct naming for .to_char(), as it does copy data to a new memory representation (even if it doesn't allocate data).

For .to_byte() it's less clear, because Ascii actually is a u8 under the hood, but that's really an implementation detail. There have been proposals to allow integral types of all widths instead of just 8/16/32/64, and if we ever do that, Ascii would be a u7 instead. But ignoring that for a second: arguments can be made for both .to_byte() and .as_byte(), there's no clear winner, and we already have .to_byte() (which is consistent with .to_char()), so my recommendation is to avoid the breaking change and just leave it as .to_byte().
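The as_*/to_* distinction described above can be seen in today's String API (a minimal illustration, not code from this PR):

```rust
fn main() {
    let s = String::from("ascii");

    // as_*: a cheap view of the same memory, no allocation.
    let bytes: &[u8] = s.as_bytes();
    assert_eq!(bytes[0], b'a');

    // to_*: a conversion that copies into a new, owned value.
    let upper: String = s.to_uppercase();
    assert_eq!(upper, "ASCII");
    assert_eq!(s, "ascii"); // the original is untouched
}
```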

@lilyball (Contributor)

  • Added to_digit function
  • Added is_digit_radix function

Why? I can already say a.to_char().to_digit(10).

@aochagavia (Contributor, Author)

@kballard I thought it would be interesting to have to_digit and is_digit_radix as functions of Ascii. However, I see now that there is no real benefit (no performance gain).

What you say about as_byte and as_char is true. The naming is debatable and we don't need to change it.

What do you think about the questions at the end?

@lilyball (Contributor)

  • Should we derive Copy for Ascii? It sounds reasonable since it is just a u8.

We don't have opt-in built-in bounds yet; that's still pending, see #13231. Today Ascii is already Copy.

Many of the functions available for Ascii are also available for char. Maybe we should unify them in a trait.

There apparently already is one: core::char::Char. Ascii should just implement that.

@aochagavia (Contributor, Author)

@kballard Thanks for your feedback!

The problem with the current Char trait is that it has some functions which don't make sense for Ascii. That is why I suggested splitting it into two traits.

@lilyball (Contributor)

I would like to implement an AsciiString to bring order to this.

What do you mean? std::ascii::AsciiStr already exists and is implemented on &[Ascii]. If you have Vec<Ascii> and want to use these methods, you can say v.as_slice().to_lower(). Post-DST, Vec<T> will almost certainly implement Deref<[T]> which means that Vec<Ascii> will then be able to call all of the &[Ascii] methods without .as_slice().
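The Deref prediction did come true: since Rust 1.0, Vec<T> derefs to [T]. A minimal modern illustration, with Vec<u8> standing in for the Vec<Ascii> under discussion:

```rust
fn main() {
    let v: Vec<u8> = b"Hello".to_vec();

    // Vec<u8> derefs to &[u8], so slice methods work directly,
    // with no as_slice() call -- the post-DST behavior the
    // comment anticipates.
    assert!(v.is_ascii());
    assert_eq!(v.to_ascii_lowercase(), b"hello".to_vec());
}
```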

@lilyball (Contributor)

@aochagavia The only reason to use a trait instead of just implementing the same methods is if you want to be generic over the trait. Char exists right now just because you can't impl methods on char directly. Personally, I don't see that there's a need for a trait to unify this, because I don't think it's very likely that you'll need a function that's generic over both char and Ascii using just these functions, but if you do want to use a trait, then we may as well use the one we already have.

Looking at it, the only methods that I'm guessing you don't think make sense are len_utf8_bytes(), encode_utf8(), and encode_utf16() (note that the two escape methods actually do make sense for Ascii). But if you're going to be generic over a trait, being able to encode the value as UTF-8/UTF-16 actually seems like a reasonable thing to want, so I don't see the problem with providing this functionality for Ascii.
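To make the trade-off concrete, here is a sketch of the kind of function that only a shared trait makes possible: generic over both character types. The CharClassify trait and the Ascii newtype below are hypothetical stand-ins, not the std types under discussion:

```rust
// A hypothetical classification trait in the spirit of the thread;
// `Ascii` here is a local newtype, not std::ascii::Ascii.
trait CharClassify {
    fn is_digit_char(&self) -> bool;
}

struct Ascii(u8);

impl CharClassify for char {
    fn is_digit_char(&self) -> bool {
        self.is_ascii_digit()
    }
}

impl CharClassify for Ascii {
    fn is_digit_char(&self) -> bool {
        self.0.is_ascii_digit()
    }
}

// Generic over the trait, so it accepts &[char] and &[Ascii] alike.
fn count_digits<C: CharClassify>(items: &[C]) -> usize {
    items.iter().filter(|c| c.is_digit_char()).count()
}

fn main() {
    assert_eq!(count_digits(&['a', '1', '2']), 2);
    assert_eq!(count_digits(&[Ascii(b'7'), Ascii(b'x')]), 1);
}
```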

@aochagavia (Contributor, Author)

Actually my concern is naming. If you implement the same trait, then you can be sure that naming will remain consistent.

@aochagavia (Contributor, Author)

Also, when I wrote about AsciiString I was actually thinking of a struct (similar to the way in which String is implemented). I think that would be more clear than implementing traits on &[Ascii].

Thanks again for your feedback!

@lilyball (Contributor)

Actually my concern is naming. If you implement the same trait, then you can be sure that naming will remain consistent.

Ok, I suppose that's a good reason. Given that, I guess we can split char::Char into two traits, one for e.g. CharType containing all the is_* methods (minus is_digit_radix, and we should add is_hex which Ascii currently has), and then leave the rest in the current Char.
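A rough sketch of what the proposed split might look like, using hypothetical names from the thread (nothing here is the actual std API): classification methods go in one trait, which an ASCII type could also implement, while UTF-8 concerns stay char-only.

```rust
// Hypothetical split: CharType holds the is_* classification
// methods; Utf8Char keeps the encoding-related ones.
trait CharType {
    fn is_alpha(&self) -> bool;
    fn is_hex(&self) -> bool;
}

trait Utf8Char: CharType {
    fn len_utf8_bytes(&self) -> usize;
}

impl CharType for char {
    fn is_alpha(&self) -> bool { self.is_alphabetic() }
    fn is_hex(&self) -> bool { self.is_ascii_hexdigit() }
}

impl Utf8Char for char {
    fn len_utf8_bytes(&self) -> usize { self.len_utf8() }
}

fn main() {
    assert!('é'.is_alpha());
    assert!('f'.is_hex());
    assert_eq!('é'.len_utf8_bytes(), 2); // U+00E9 takes 2 UTF-8 bytes
}
```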

@lilyball (Contributor)

@aochagavia What would this proposed AsciiString struct do? Reimplement all of String methods on top of Vec<Ascii> instead of Vec<u8>? That seems complicated, hard to keep in sync with String (and no, putting all String methods in a trait to enforce naming is not a good idea), and just largely doesn't seem very useful compared to using Vec<Ascii> directly.
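For context, a minimal sketch of what such an AsciiString newtype would involve (hypothetical; nothing like this landed in std). Every String convenience it wants has to be re-exposed by hand, which is the duplication objected to here:

```rust
// Hypothetical AsciiString newtype over Vec<u8>; the invariant is
// that every stored byte is ASCII.
struct AsciiString(Vec<u8>);

impl AsciiString {
    fn from_bytes(bytes: Vec<u8>) -> Option<AsciiString> {
        if bytes.is_ascii() { Some(AsciiString(bytes)) } else { None }
    }

    // Each method below merely forwards to Vec -- and a real type
    // would need dozens more of these to match String.
    fn len(&self) -> usize { self.0.len() }

    fn push(&mut self, b: u8) {
        assert!(b.is_ascii(), "non-ASCII byte");
        self.0.push(b);
    }
}

fn main() {
    let mut s = AsciiString::from_bytes(b"abc".to_vec()).unwrap();
    s.push(b'!');
    assert_eq!(s.len(), 4);
    assert!(AsciiString::from_bytes(vec![0xFF]).is_none());
}
```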

@lilyball (Contributor)

  • Should we remove to_ascii (which fails if the character is not ascii) and replace it by to_ascii_opt?

We probably should. There's precedent in the various other primitive conversion methods (e.g. ToPrimitive, char::from_u32(), etc.) for only exposing an Option variant.

@lilyball (Contributor)

Because that's not really a character type classification. But I don't really have a compelling reason for omitting it, besides the fact that it doesn't correspond with any of the normal character classes (such as alpha, digit, punct, etc).

@aochagavia (Contributor, Author)

Sorry, I deleted my comment before you answered.

I think I will open a new pull request... Thanks!

@lilyball (Contributor)

@aochagavia Have you looked at my #14436? I'm concerned that submitting a separate PR will create conflicts. Perhaps we should build your proposed changes on top of my PR instead of competing.

@aochagavia (Contributor, Author)

I think it is a good idea, but I have never done it before. Can I just push commits to your pull request?

@aochagavia closed this May 28, 2014
@lilyball (Contributor)

@aochagavia You can't, but you can fetch my branch from my fork, add commits to it, push that to your own fork, and send me a link and I can update my PR.

@aochagavia (Contributor, Author)

I think I will wait until your PR is merged. Alternatively, you can implement the changes I mentioned, if you want.

Thanks again for your help with this!

@lilyball (Contributor)

@aochagavia I'm on board with implementing the changes myself. Just to clarify, this is what you want to do:

  • Split char::Char into two traits, one of them being char::CharType (or maybe CharClassify?)
  • Implement CharType (or CharClassify) on Ascii
  • Deprecate Ascii.eq_ignore_case

Anything else that I'm missing? I think we should probably also go ahead and remove .to_ascii() in favor of .to_ascii_opt(), as you had suggested.

Also, opinion on is_digit_radix()? Does it make sense to put it in CharType, because it's potentially useful, or leave it out because it doesn't correspond to one of the common character classes?

@aochagavia deleted the pr branch May 29, 2014 08:34
@aochagavia (Contributor, Author)

@kballard No problem, go ahead!

Actually, I would like the Char trait to be implemented by both char and Ascii, and a new Utf8Char trait implemented only by char. However, I guess your suggestion of a CharType trait would be good as well. My only concern here is naming, because CharType seems redundant and CharClassify doesn't convince me (those functions are not only used to classify chars).

Additionally, I think it would be good to have is_digit_radix(). If you have it, you can define is_digit() and is_hex() in terms of it. This way you would only need to implement one function, and you would get the other two for free.

Another concern is that is_digit_radix() will fail if the radix is greater than 36. The problem is that returning None could be confusing, because it would look like the character is not a digit, while the real problem is that the radix is invalid. IMO returning a Result would complicate things too much, so I think we should keep failing (I can't see a better alternative). If you can find a better solution for this, please add it to your PR.
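Both points can be illustrated with today's char API, where is_digit_radix survives as char::is_digit and still panics for radices above 36 (a sketch; is_digit and is_hex below are local helpers, not the proposed trait methods):

```rust
use std::panic;

fn main() {
    // Defining the specialised checks in terms of the radix one,
    // as suggested above:
    fn is_digit(c: char) -> bool { c.is_digit(10) }
    fn is_hex(c: char) -> bool { c.is_digit(16) }
    assert!(is_digit('9'));
    assert!(is_hex('f'));
    assert!(!is_digit('f'));

    // A radix above 36 still fails (panics) in today's std,
    // treated as a programmer error rather than an input error.
    panic::set_hook(Box::new(|_| {})); // silence the panic message
    let result = panic::catch_unwind(|| '1'.is_digit(37));
    assert!(result.is_err());
}
```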

@lilyball (Contributor)

Failing when the radix is greater than 36 is deliberate, because that's considered a programmer error, not an input error.

I'm still not convinced that is_digit_radix() really makes sense here, but I suppose I'll add it for convenience.
