Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes parsing bug for floats with non english culture #794

Conversation

jhm-ciberman
Copy link

Summary

This PR fixes a serious bug that caused an incorrect parsing of floats and doubles in machines with non english cultures.

The bug was reproduced with Windows 11 setting the culture to Spanish (Argentina). If the culture is English (United States) the parsing works fine.

This PR fix a bug related to #792. For a full explanation refer to that issue.

Fix

The fix adds YamlFormatter.NumberFormat format provider in Parse methods in ScalarNodeDeserializer.AttemptUnknownTypeDeserialization.

  • Without this fix, the tests work with English (United states) but failed in Spanish (Argentina).
  • With this fix, all the tests work fine with both English (United states) culture and Spanish (Argentina) culture.

Extra comments

Question: Why does this part of the code does not use TryParse? Using Parse and the TryAndSwallow method causes unnecessary allocations for the exceptions. I think it would be an easy change.

Adds `YamlFormatter.NumberFormat` in `Parse` methods in
`ScalarNodeDeserializer.AttemptUnknownTypeDeserialization`.
This caused tests to fail in non english culture (Tested with "es-AR")
@jhm-ciberman jhm-ciberman changed the title Fixes #792 parsing bug for floats with non english culture Fixes parsing bug for floats with non english culture Mar 30, 2023
@EdwardCooke
Copy link
Collaborator

The reason it doesn’t use tryparse is because it is not available in older frameworks like 3.5 which we currently support. I’m not entirely sure when the tryparse methods were added, but there may be other framework versions we compile against that don’t have it either.

I need to do some personal checks with this pr before I can merge it in. My biggest concern is if people are using this library on a culture other than English and they are expecting that to work in the future then we change the culture and suddenly it no longer works with their yaml. This could be a pretty impactful change so I’m being very cautious with merging it in.

@jhm-ciberman
Copy link
Author

jhm-ciberman commented Apr 1, 2023

The reason it doesn’t use tryparse is because it is not available in older frameworks like 3.5 which we currently support. I’m not entirely sure when the tryparse methods were added, but there may be other framework versions we compile against that don’t have it either.

Oh, I understand. That's totally fine. Also there are a lot of allocations in other parts of the parser so a few extra allocations don't make any difference. In a future maybe it's possible to wrap the TryParse with an #if NET > blabla directive.

I need to do some personal checks with this pr before I can merge it in. My biggest concern is if people are using this library on a culture other than English and they are expecting that to work in the future then we change the culture and suddenly it no longer works with their yaml. This could be a pretty impactful change so I’m being very cautious with merging it in.

That's a very good observation. Although I doubt someone is relying in that behaviour because it's super specific to specific cultures that use comma instead of dot for decimal separator, but it's worth it to investigate before merging.

@lahma
Copy link
Contributor

lahma commented Apr 1, 2023

The reason it doesn’t use tryparse is because it is not available in older frameworks like 3.5 which we currently support. I’m not entirely sure when the tryparse methods were added, but there may be other framework versions we compile against that don’t have it either.

A bit out of topic, but having such a wide set like:

<TargetFrameworks>net70;netstandard2.0;netstandard2.1;net35;net40;net45;net47;net60</TargetFrameworks>

Might just cause unnecessary maintenance burden. People left maintaining NET 3.5 era software aren't supposed to do library upgrades anyway trying to keep the program running (just too unsafe). I believe many OSS projects have moved to supporting the oldest full framework that Microsoft supports, which is net462 I believe. Might be easier just to cut the cord and let legacy development use older versions of the library and publish major release dropping old targets.

Just my two cents and thank you all who are maintaining and improving this library!

@jhm-ciberman
Copy link
Author

According to the compatibility table at the bottom of the dotnet documentation, the TryParse was added back in Net Framework 2.0. Altough I'm not sure if the compatibility table says really the truth.

image

@EdwardCooke
Copy link
Collaborator

I modified the targets to support only supported .net versions last year. 3.5 was still on that list and it made me sad. I do agree on wishing it wasn’t and removing it. I’ve thought about cutting ties to that and 4.0 and potentially 4.5. Those last 2 are there for unity support if I recall.

@lahma
Copy link
Contributor

lahma commented Apr 2, 2023

I don't understand much about Unity as I don't develop for it (but Jint does support it, and a lot of people use Jint with Unity), it seems that Unity's .NET profile support basically states that netstandard2.0 and probably even netstandard2.1 is fine nowadays. Jint has <TargetFrameworks>net462;netstandard2.0;netstandard2.1;net6.0</TargetFrameworks>.

@EdwardCooke
Copy link
Collaborator

So I just looked at this, I'm a bit nervous about forcing a specific culture/format in those parse methods instead of using the local machine culture like it currently does. I don't think that it is the best idea to do that.

The fix I would prefer is to set the locale in the test project. Which is actually easy to do.

Add a .runsettings file to the test project directory. The contents specify the environment variables to set the locale.
You then add a property to the csproj.

.runsettings

<?xml version="1.0" encoding="utf-8"?>
<!-- File name extension must be .runsettings -->
<RunSettings>
  <RunConfiguration>
    <EnvironmentVariables>
      <!-- List of environment variables we want to set-->
      <LANG>>en_US.UTF-8</LANG>
      <LANGUAGE>en_US:en</LANGUAGE>
      <LC_ALL>en_US.UTF-8</LC_ALL>
    </EnvironmentVariables>
  </RunConfiguration>
</RunSettings>

csproj property:

<RunSettingsFilePath>$(MSBuildProjectDirectory)\.runsettings</RunSettingsFilePath>

I tested this fix in a docker image set to English/Denmark (which uses a , instead of a . for decimal separator like yours) It was also failing on the same tests that you had. The problem was resolved when specifying the locale in the runsettings file.

Dockerfile:

FROM mcr.microsoft.com/dotnet/sdk:7.0
RUN apt update
RUN apt install -y vim
RUN apt install -y locales
RUN sed -i '/en_DK.UTF-8/s/^# //g' /etc/locale.gen && \
    locale-gen
RUN useradd edward -d /source -m -u 1000 -U
ENV LANG en_DK.UTF-8  
ENV LANGUAGE en_DK:en  
ENV LC_ALL en_DK.UTF-8  

USER edward
WORKDIR /source

@jhm-ciberman
Copy link
Author

jhm-ciberman commented Apr 3, 2023

I understand that you are afraid of breaking existing apps that upgrade. But in the other hand, let me explain my use case:
I'm designing a mod loading system for a game. The mods are defined in yaml and loaded in the machine that is running the game. So I don't really know the locale of the machine before loading the mod and I'm extremely concerned that loading in one machine will produce the correct result with floats and loading in another machine with another culture will break the game. Mods will be authored by people that are not necessarily programmers and can easily make mistakes like using comma instead of dot.

I think the best option would be to follow the yaml specs respect to the parsing of float point numbers and bump the major number in the semver to signify a major breaking change (although it shouldn't be a breaking change if you are following the yaml specs).

It's just my point of view.

Btw: very smart idea to create a custom docker image for that. That didn't cross my mind.

@SchreinerK
Copy link

SchreinerK commented Apr 5, 2023

IMHO a YAML document should be always use the invariant culture.
double.Parse(s, NumberStyles.Any, CultureInfo.InvariantCulture);
number.ToString(CultureInfo.InvariantCulture);
or TryParse, it does not matter.
If different cultures in a document are to be supported, this would have to be communicated to the serializer/deserializer and/or marked in the document.
new DeserializerBuilder().WithCulture(new CultureInfo("de-DE"))
Best regards
Kux

@EdwardCooke
Copy link
Collaborator

I’ve been thinking about this over the past couple of days. A solution that would be pretty good is to be able to specify the culture when creating the serializer and deserializer. Defaulting to the yaml spec format. That way if people need it to be the current machine culture or whatever other one. How does that sound?

@jhm-ciberman
Copy link
Author

@EdwardCooke That looks fine to me. Although I can't imagine a single use case of someone that wants to encode floats in a different encoding than the YAML spec. I wouldn't overcomplicate things and I would only support the YAML spec (culture independent parsing).
But if a setting to configure the culture makes you more confident when releasing a new version of YamlDotNet, then that works for me.

@EdwardCooke
Copy link
Collaborator

Were you going to make the requested change?

@jhm-ciberman
Copy link
Author

Sorry, I was awaiting for your answer before doing anything. How would you like to approach this?
Do you want me to proceed with the changes described in your last comment? I haven't touched the source code for YamlDotNet since April, but I can do it If you need help. :)

@EdwardCooke
Copy link
Collaborator

Follow my last comment, that would be great.

@EdwardCooke
Copy link
Collaborator

Any progress on this? I can try and work on it over the next week or 2.

@jhm-ciberman
Copy link
Author

jhm-ciberman commented Aug 22, 2023 via email

@EdwardCooke
Copy link
Collaborator

No worries, I'll get it fixed. I'm going to close this PR, thanks for your attempt.

@jhm-ciberman
Copy link
Author

Sorry. I really love this project and definitely will make more PRs in the future (maybe I can help with the Wiki or docs). But right now is not the best moment

@EdwardCooke
Copy link
Collaborator

Not a problem. Health always takes priority!

@filzrev
Copy link
Contributor

filzrev commented Jul 16, 2024

It seems .NET 3.5 support is dropped from v15.1.0 release
Is there any reason that Remaining can not use TryParse methods?

@EdwardCooke
Copy link
Collaborator

Nope. Sure isn’t.

@filzrev
Copy link
Contributor

filzrev commented Jul 17, 2024

@EdwardCooke
Thanks for your response.
I thought it's worth to use TryParse and make sure not to throw exceptions.

It's because of following reasons.

  • A lot of OverflowException are recorded internally on Debugging.
  • When using WithAttemptingUnquotedStringTypeDeserialization it runs 10x slower on microbenchmarks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants