make parsing float literals locale-independent #5928

Closed
vtjnash opened this issue Feb 24, 2014 · 19 comments · Fixed by #5988
Labels
bug Indicates an unexpected problem or unintended behavior

@vtjnash
Member

vtjnash commented Feb 24, 2014

In many non-US locales, floating point numbers are written with a comma (e.g. "1,0" instead of "1.0"). strtod can be reconfigured (through setlocale) to accept this alternative input, which breaks parsing of float literals.
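A minimal C sketch of the failure mode described above, for readers who have not hit it. The helper names numeric_decimal_point and parse_in_current_locale are hypothetical (they are not part of Julia or of this issue); they only wrap the standard localeconv and strtod calls to show that the result depends on whatever LC_NUMERIC is active at the time of the call.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: the decimal separator that strtod currently expects.
   Only the first byte is inspected, which assumes a single-byte separator. */
char numeric_decimal_point(void) {
    return localeconv()->decimal_point[0];
}

/* Hypothetical helper: parse `text` under the active LC_NUMERIC setting and
   report how many characters strtod actually consumed. */
double parse_in_current_locale(const char *text, size_t *consumed) {
    assert(text != NULL && consumed != NULL);
    char *end;
    double value = strtod(text, &end);
    *consumed = (size_t)(end - text);
    return value;
}
```

In the default "C" locale the separator is '.', so "1.0" is consumed in full. If some library later calls setlocale with a locale whose separator is ',', the same call would stop after the "1" and silently return a truncated value.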

ref JuliaGraphics/Gtk.jl#69

@JeffBezanson per chat messages, suggestion is to pull this function from FreeBSD and remove the locale bits we don't want

@vtjnash vtjnash added the bug label Feb 24, 2014
@vtjnash vtjnash added this to the 0.3 milestone Feb 24, 2014
@JeffBezanson
Member

We do call setlocale, but libraries can annoyingly override it.
We also have to decide what to do with the float parsing functions in base. I suppose they should all just do the same locale-independent thing.

@vtjnash
Member Author

vtjnash commented Feb 24, 2014

it's harder to say what the functions in base should do.

on the one hand, when parsing code, the author of the code expects to need to use C-style notation "1.0", esp. since "1,0" already means something different

on the other hand, GUI libraries like Gtk are probably right to try to conform to the user's locale preference setting (I don't actually know how strongly users feel about this). the functions in base may fall into this category.

we may need two functions: parse_user_locale & parse_ignore_locale (please choose shorter names). I think the default parseint function should be parse_user_locale, whereas parse uses parse_ignore_locale (although it may call code through macros so it isn't a "safe" function). But then for web-services, we would want to ignore the local locale and try to allow using the user's locale

@tknopp
Contributor

tknopp commented Feb 24, 2014

yes I vote for making these also locale-independent. Maybe provide special versions that are locale-dependent. Or a keyword argument.
This kind of bug is hard to find.

@tknopp
Contributor

tknopp commented Feb 24, 2014

I find it quite annoying when technical GUIs respect the locale settings, but this is because I, as a programmer, have run into issues several times.

But when writing a GUI in Julia and wanting 1,0 to be a floating point number, it should be no problem to use parse_user_locale.

But in general we should look into how others (Java, Python, ...) have solved this (or not solved this...)

@vtjnash
Member Author

vtjnash commented Feb 24, 2014

I was looking up Python just now. I don't think it is the greatest example. They mostly defer the work to a locale module (I think this is good), but calling setlocale in that module calls the C setlocale function and can affect other code in the Python standard library (possibly not a great idea): http://docs.python.org/2/library/locale.html#background-details-hints-tips-and-caveats

Java seems like it has roughly the same behavior as python.

PHP, Javascript seem like they may have roughly the same behavior as C.

If non-US users have become comfortable with either format anyways (and don't find it inconsiderate), then I vote for making numerical parsing consistently locale-independent, unless the user specifically asks for it (probably by directly calling functions from a Locale.jl package)

@JeffBezanson
Member

Agree. There are probably ISO standard formats anyway.

@tknopp
Contributor

tknopp commented Feb 24, 2014

sounds like a plan. And as Julia calls setlocale currently, this would not even be a breaking change. Just no surprises anymore when doing using Gtk.

Good that this is tagged with the 0,3 milestone by the way ;-)

@Keno
Member

Keno commented Feb 24, 2014

I agree, I've always hated systems that switch parsing on me because of my locale settings.

@StefanKarpinski
Member

I'm pretty sure that's a quorum of people from other countries who hate this feature, despite its good intentions. It does seem like pure insanity to me.

@toivoh
Contributor

toivoh commented Feb 24, 2014

Well, I know that Swedish programmers quite generally hate the comma as a
decimal point, at least.

@tknopp
Contributor

tknopp commented Feb 24, 2014

Germans too. I have been running into that trap in C several times when reading/writing ini files.

But although I hate it as a programmer, there are very good reasons to have it in GUIs and websites. So we should be open-minded about the problem of locale formatting.

The big issue with setlocale is that you have a global switch that changes the file handling part (std::cout) but also the GUI part (in Qt and Gtk). IMHO it is much better to handle this explicitly, directly under the surface of the GUI. Quite similarly, internationalization is usually handled by explicit calls to some TR macro.

@tknopp
Contributor

tknopp commented Feb 28, 2014

@JeffBezanson @vtjnash:
I can tackle this but would propose one of the following alternative fixes:

  1. Use strtod_l, which seems to be available on all platforms. It allows passing a locale object, and internally strtod uses strtod_l anyway.
  2. In C++ one can change the locale within an iostream. So as opposed to strtod_l this uses standard C++. However, the interface is a little different, as strtod has a second argument that seems to be used in some Julia code.

I would put this into a file strtod.c/cpp that is part of libsupport and name the platform-independent wrapper function something like strtod_c.
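A rough sketch of what option 1 could look like in C, assuming a platform that provides newlocale and strtod_l (glibc with _GNU_SOURCE, or OS X via xlocale.h). This is not the code from the eventual pull request. The name strtod_c is taken from the comment above; the lazy initialization and the fallback to plain strtod are assumptions made for illustration only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* exposes strtod_l on glibc */
#endif
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#if defined(__APPLE__)
#include <xlocale.h>          /* newlocale / strtod_l live here on OS X */
#endif

/* Cached handle for the "C" locale. Lazy initialization like this is not
   thread-safe; a real implementation would create it once at startup. */
static locale_t c_locale = (locale_t)0;

/* Same interface as strtod (including the end pointer), but always parses
   with '.' as the decimal separator, whatever setlocale was last given. */
double strtod_c(const char *nptr, char **endptr) {
    assert(nptr != NULL);
    if (c_locale == (locale_t)0)
        c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_locale == (locale_t)0)
        return strtod(nptr, endptr);   /* assumed fallback if newlocale fails */
    return strtod_l(nptr, endptr, c_locale);
}
```

Keeping the second argument means existing callers that inspect the end pointer would not need to change, which is the interface concern raised for the iostream variant.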

@tknopp
Contributor

tknopp commented Feb 28, 2014

Ok I think the strtod_l solution is smarter. I currently don't have a Julia environment at hand but it will look something like the following https://gist.github.com/tknopp/9271244

I will prepare a PR later. My environment is OSX. The Windows implementation will have to be tested by someone who has a build environment.

@nalimilan
Member

Living in a country where , is the decimal separator, I can confirm I much prefer parsing to be locale-independent, and have to explicitly ask for using a different decimal separator than . (e.g. via arguments to readcsv()). Else some programs will break unexpectedly when using my locale.

The same applies to formatting decimal numbers: UI printing (e.g. the REPL) should adapt to the user locale but exporting functions should always use ., so that parsing back the data works disregarding the locale.

@JeffBezanson
Member

Great, let's try this. Thanks @tknopp .

@vtjnash
Member Author

vtjnash commented Mar 7, 2014

I tried to do this using strtod_l, but the availability of this function on Linux is unclear and its documentation seems to be poor to nonexistent.

@tknopp
Contributor

tknopp commented Mar 7, 2014

@vtjnash: I have strtod_l working on OSX and Linux in #5988. We only require a workaround for Windows, as the mingw C library has neither strtod_l nor the Windows counterpart _strtod_l. In the end I have ported Python's ascii_strtod function, which replaces the delimiter and uses standard strtod. It almost works, but one test is currently failing. Hopefully this is fixable though. Help is welcome.
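For illustration, the "replace the delimiter and call standard strtod" idea can be sketched as below. This is a simplified stand-in, not the ported Python code and not what is in #5988. The name strtod_ascii is hypothetical, and the sketch assumes the locale's decimal separator is a single byte.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fallback for platforms without strtod_l: rewrite '.' into the
   separator the current locale expects, then let plain strtod do the work. */
double strtod_ascii(const char *nptr, char **endptr) {
    assert(nptr != NULL);
    char sep = localeconv()->decimal_point[0];   /* single-byte assumption */
    if (sep == '.')
        return strtod(nptr, endptr);             /* nothing to translate */

    size_t len = strlen(nptr);
    char *copy = (char *)malloc(len + 1);
    if (copy == NULL) {
        if (endptr != NULL) *endptr = (char *)nptr;
        return 0.0;
    }
    size_t i;
    for (i = 0; i < len; i++) {
        /* The locale separator (e.g. ',') is not valid C notation, so the
           copy is cut there and strtod never sees it. */
        if (nptr[i] == sep) break;
        copy[i] = (nptr[i] == '.') ? sep : nptr[i];
    }
    copy[i] = '\0';

    char *copy_end;
    double value = strtod(copy, &copy_end);
    /* Offsets match between copy and input because every substitution is
       one byte for one byte. */
    if (endptr != NULL) *endptr = (char *)nptr + (copy_end - copy);
    free(copy);
    return value;
}
```

The extra allocation per call is the price of not having a locale-aware strtod variant; it also still depends on the global locale staying unchanged between the localeconv call and the strtod call.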

7 participants