Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect culture-specific formatting for Vector64/128/256 types #11604

Closed
jkotas opened this issue Dec 4, 2018 · 12 comments · Fixed by dotnet/coreclr#25343
Closed

Incorrect culture-specific formatting for Vector64/128/256 types #11604

jkotas opened this issue Dec 4, 2018 · 12 comments · Fixed by dotnet/coreclr#25343
Labels
area-System.Runtime bug help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@jkotas
Copy link
Member

jkotas commented Dec 4, 2018

From https://github.com/tannergooding/coreclr/commit/211d963a42c8988770afa4d2edcbe9be0ed0b8a8#r31551940:

@tannergooding I do not think that the use of NumberGroupSeparator is correct. NumberGroupSeparator is the string that is used to separate thousands or similar quantities (like 1,000,000,000 in en-US culture). For example, it is . in number of cultures - that would make the formatting here pretty nonsensical.

I think it is dubious for these vector types to implement IFormattable. I think they should just implement plain vanilla ToString, and do not bother with IFormattable. IFormattable makes sense to types that are meant to be displayed to the user directly. It is not the case here.

@tannergooding
Copy link
Member

tannergooding commented Dec 4, 2018

I do not think that the use of NumberGroupSeparator is correct. NumberGroupSeparator is the string that is used to separate thousands or similar quantities (like 1,000,000,000 in en-US culture). For example, it is . in number of cultures - that would make the formatting here pretty nonsensical.

I only used NumberGroupSeparator as that is what S.N.Vector<T> is currently using (which is the closest existing API that parities the new ones).
Do we have a better option, perhaps hardcoding it to something like ; (semicolon) instead? -- If we fix one, we should consider fixing the other as well.

I think it is dubious for these vector types to implement IFormattable.

This is mirroring the interfaces we have exposed on S.N.Vector<T>. While I do think that the general use-case will be diagnostics/debugging, I think there will also be cases where the user may want to display these otherwise and supporting custom format specifiers is fairly trivial.
That being said, if we did remove IFormattable (which may require a follow up API review, given that this was already reviewed/approved); I would think we would want to change ToString() to always use InvariantCulture and round-trippable output (G17 for double and G9 for float).

@jkotas
Copy link
Member Author

jkotas commented Dec 4, 2018

used NumberGroupSeparator as that is what S.N.Vector<T> is currently using

Ok, I see. That looks broken too.

Do we have a better option, perhaps hardcoding it to something like ; (semicolon) instead?

I think it would be actually fine to hardcode , (comma) since it is the typical program language formatting for list of numbers, assuming we do the other part.

If you hardcode anything and support custom formats at the same time, the combination always leads to odd behaviors. For example:

var v = Vector128<double>.Zero;
v = v.WithElement(0, 1000).WithElement(1, 1000);
Console.WriteLine(v.ToString("N0", null));    

is going to print <1,000, 1,000> today (for en-US culture).

If we fix one, we should consider fixing the other as well.

Fixing the Vector<T> may need an extra care because of it is existing type, but we should not be replicating Vector<T> bugs in new types.

there will also be cases where the user may want to display these

Folks can always do their own formatting when they need to display these. For example, the hardcoded < has a high chance of not working well if you need to display these (to non-programmers).

@jkotas jkotas changed the title Culture-specific formatting for Vector64/128/256 types Incorrect culture-specific formatting for Vector64/128/256 types Dec 4, 2018
@tannergooding
Copy link
Member

So it sounds like the current suggestion is:

  • Remove the IFormattable interface
  • Remove the ToString(string) and ToString(string, IFormatProvider) methods
  • Update ToString() to use CultureInfo.InvariantInfo
  • Update ToString() to use G17 for double and G9 for float (required so that users can differentiate values, since just G won't always allow that today).

Maybe, longer term, we need something which can logically dictate the correct symbols to use when grouping a collection of items and how the elements should be separated?
-- This seems like a decently common scenario and something we don't have a way to handle today. Users trying to do their own thing will hit the scenarios you listed above.

@tannergooding
Copy link
Member

@jkotas, seems we have an existing TextInfo.ListSeparator we can use instead of NumberGroupSeparator.

We don't, however, appear to have anything that can be used for the List Start (<) and List End (>).

@jkotas
Copy link
Member Author

jkotas commented Dec 11, 2018

appear to have anything that can be used for the List Start (<) and List End (>).

That's not that surprising. The programmer specific constructs are generally not localizable, e.g. C# language keywords are not localizable either. (BTW: I believe there was an experiment to localize Excel/VB a long time ago. It looked hilarious - it did not last long.)

longer term, we need something which can logically dictate the correct symbols to use when grouping a collection of items

I do not think so. For example, the human way to format lists in English is start the first one with upper case letter, separate the items with comma, except the last two that are separated by and.

@danmoseley
Copy link
Member

@tannergooding I'm moving to Future as it is not clear this must be fixed for 3.0. If I'm wrong let's move it back and find an owner to fix by July 1st.

@tannergooding
Copy link
Member

@danmosemsft, the fix here is pretty trivial and just involves using TextInfo.ListSeparator instead of NumberGroupSeparator.

We don't have anything for List Start and List End, but that is probably ok.

I'm going to see if someone from the community wants to pick this up; and will put up a PR otherwise.
-- It might also be good to fixup System.Numerics.Vector* to use ListSeparator at the same time, but I'm unsure of back-compat concerns here...

@tannergooding
Copy link
Member

If you hardcode anything and support custom formats at the same time, the combination always leads to odd behaviors.

Just wanting to ensure I address this as well, since it can't be changed post 3.0...

I definitely understand the concern here, but I think that it exists for basically any user-defined struct that represents more than a single value. The best option people have is to either not support globalization at all, which puts a larger burden on the consumer (if, for example, they want to support something like percentages or to display additional digits for debugging purposes), or to say there may be some combinations that render weirdly.

I think the latter ends up being more flexible long term and users always have a workaround for cases that do render weirdly (you can use custom numeric format strings to avoid commas, etc).

@jkotas
Copy link
Member Author

jkotas commented Jun 21, 2019

not support globalization at all

I think that's the right option for types that are not meant to be directly displayed in localized UI. Vector64 falls into this category. Another example of type in this category is Tuple/ValueTuple that is hardcoded to format using (, ) and , .

@tannergooding
Copy link
Member

I'm not sure I agree that these types can't or shouldn't be displayable in localized UX. Realistically they are no different from Vector<T> or Vector2/3/4 other than not exposing operators/methods directly. Users can still use them in normal code and can still use them for a software fallback case by utilizing the indexers. Due to them being the ABI primitives, there are also likely some places they will get used for interop (eventually), perf, or convenience.

That being said, we can always add IFormattable later if enough people ask for it. So I'm fine with removing the interface from the contract for now.

The ToString implementation should then hardcode itself to use <, >, ,, and to use CultureInfo.InvariantCulture when formatting each element of the vector.

@jkotas
Copy link
Member Author

jkotas commented Jun 21, 2019

That being said, we can always add IFormattable later if enough people ask for it. So I'm fine with removing the interface from the contract for now.

The ToString implementation should then hardcode itself to use <, >, ,, and to use CultureInfo.InvariantCulture when formatting each element of the vector.

Sounds great to me. Should we flip the bug back to 3.0 then?

@tannergooding
Copy link
Member

Should we flip the bug back to 3.0 then?

Done. I also found someone from the C# Discord community who is going to look at tackling this. I'll make sure it gets in for P7.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 14, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Runtime bug help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants