[1.x] Fix incorrect calculation results when the C locale is set to a locale that uses commas as the decimal separator #17177
Conversation
@nickguletskii can you resolve the conflicts so that this PR can be merged? Are you interested in working on the "more principled approach to serialization in MXNet 2.0, e.g. using a binary format for communication between the frontend and the backend. In addition to solving locale-related issues, this would probably result in a smaller invocation overhead."? Part of this may (or may not) already be done via the FFI work led by @hzfan?
+1, we need this fix in MXNet 1.7
@nickguletskii, since we are preparing the 1.7 release, could you also backport this PR to the v1.x branch as @stu1130 mentioned, so that it is included in MXNet 1.7 as well? Thanks!
Force-pushed from 6386b92 to e4f4a01.
Sorry for accidentally adding so many reviewers: this was a side effect of rebasing onto the v1.x branch, which didn't exist when I first created this PR. @leezu Judging by the documentation for the new FFI, it does seem that using it would solve the problem. However, that would require all operators, optimizers and other parts of the codebase to be ported to the new FFI. Therefore, I don't think this patch is necessary on the master branch.
Shall we add a check to the sanity build that throws an error if people use the `std` functions?
@marcoabreu I think it would be better to add a (nightly) CI job that will run the whole test suite with a locale that is vastly different from
@mxnet-bot run ci windows-cpu
Undefined action detected.
@mxnet-bot run ci [windows-gpu]
Jenkins CI successfully triggered : [windows-gpu]
Sure, but a text-based scan is a lot quicker and less costly than running the full suite. The two are not mutually exclusive, but enforcing the use of the dmlc version as part of the sanity check should bring a lot of value for a low investment.
We already run the test suite on two Linux platforms: Ubuntu and CentOS7. I suggest switching the locale for the CentOS7 platform instead of introducing another nightly build. @nickguletskii would you like to help by opening a PR against the master branch? You'd need to change these Dockerfiles: https://github.com/apache/incubator-mxnet/blob/master/ci/docker/Dockerfile.build.centos7_cpu https://github.com/apache/incubator-mxnet/blob/master/ci/docker/Dockerfile.build.centos7_gpu
Adding to 1.7.0 roadmap #16864
Good idea! I've made a pull request that updates the CentOS CI jobs and changes the tests to respect the locale environment variables: #18097
@mxnet-bot run ci [windows-gpu, unix-gpu]
Jenkins CI successfully triggered : [windows-gpu, unix-gpu]
@nickguletskii, could you take a look at the failed CI and fix it? Since the code freeze date has been postponed to April 25 PST, please make sure this PR is included in both the v1.x and v1.7.x branches before the freeze date. Thanks!
@ciyongch seems like the PR is blocked by
@stu1130 these shouldn't be blockers @mxnet-bot run ci [windows-gpu, unix-gpu]
Jenkins CI successfully triggered : [windows-gpu, unix-gpu]
Thanks @stu1130 and @leezu for helping to track this PR.
@ciyongch From what I've seen, the CI failures are caused by the CI timing out. I don't think this PR is causing them, since the CI seems to hang on different tests each time...
@mxnet-bot run ci [all]
Jenkins CI successfully triggered : [edge, sanity, miscellaneous, centos-gpu, website, windows-cpu, windows-gpu, unix-cpu, clang, unix-gpu, centos-cpu]
@nickguletskii thanks for the information. In that case, re-triggering the failed CI would help with merging. Could you also backport this PR to the v1.7.x branch?
Force-pushed from 29591f0 to 0b13736.
@mxnet-bot run ci [unix-gpu]
Jenkins CI successfully triggered : [unix-gpu]
@mxnet-bot run ci [unix-gpu] @nickguletskii the deadlock is not deterministic, so until the root cause is fixed / the respective CI tests are disabled (#18151) to at least have a stable CI, retriggering will still help.
Jenkins CI successfully triggered : [unix-gpu]
@mxnet-bot run ci [unix-gpu]
The backported PR to the v1.7.x branch, #18147, passed CI :)
…when the C locale is set to a locale that uses commas as the decimal separator) (#18147)
* Add a test for floating point parsing locale invariance
* Use locale-invariant dmlc::stod/stof instead of std::stod/stof
* Change the new operator tutorial to use dmlc::stod instead of std::stod
* Rename locale invariance test
* Skip test_scalarop_locale_invariance if the locales aren't available
* Fix linter errors due to incorrect include order
Merging based on the CI results on v1.7.x. Thanks @nickguletskii
Description
Currently, many operators utilize `std::stod` to convert strings into floating point numbers. This causes incorrect calculations (#17140, #16134) when the C locale is set to a locale which uses commas (`,`) as decimal separators.
This pull request replaces calls to `std::stod` and `std::stof` with the locale-invariant `dmlc::stod` and `dmlc::stof` respectively.
The scope of this patch
This patch should fix a large portion of interactions through Python and JVM frontends, since they use locale-invariant serialization in order to pass parameters into MXNet's C API.
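To illustrate what locale-invariant parsing means on the backend side, here is a minimal sketch. It does not use `dmlc::stod` itself (that would require dmlc-core); instead it assumes a hypothetical helper, `parse_double_invariant`, built on a string stream imbued with the classic "C" locale. Rejecting trailing characters is a choice made for this sketch, not a claim about how dmlc-core behaves.

```cpp
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration (not part of MXNet or dmlc-core):
// parses a double with '.' as the decimal separator, regardless of the
// process-wide C locale or the global C++ locale.
double parse_double_invariant(const std::string& text) {
  std::istringstream stream(text);
  stream.imbue(std::locale::classic());  // always parse with the "C" rules
  double value = 0.0;
  stream >> value;
  if (stream.fail()) {
    throw std::invalid_argument("not a floating point number: " + text);
  }
  // Sketch-specific strictness: reject leftovers such as the ",5" in "0,5".
  if (stream.peek() != std::char_traits<char>::eof()) {
    throw std::invalid_argument("unexpected trailing characters: " + text);
  }
  return value;
}
```

By contrast, `std::stod` goes through the C library's number parsing, which follows the current C locale. Under a locale with `,` as the decimal separator, parsing `"0.5"` stops at the `.` and yields `0`, which is the kind of silent miscalculation described above.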
However, frontends which utilize C locale-aware serialization (i.e. call `sprintf` or similar) may break when using locales which don't use `.` as the decimal separator. They will have to be fixed in a separate patch. I also suspect that they are already broken, because operators utilising dmlc-core parameter parsing were already using locale-invariant serialization.
Further steps
STL streams are heavily used within the codebase, both for serialization and for forming user-friendly messages. Fortunately, they don't seem to be affected by the C standard library locale settings. However, if someone sets the STL locale by calling std::locale::global, MXNet's API will be broken. In order to ensure that this doesn't happen, all streams which are used for serialization will have to be imbued with the "C" locale (not the locale set in the C standard library).
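The effect of `std::locale::global` on stream-based serialization can be sketched as follows. Everything here is illustrative: `CommaDecimalPoint`, `ScopedGlobalLocale` and `serialize_tuple` are hypothetical names, and the custom facet stands in for a real comma-decimal locale so that the example does not depend on which locales are installed.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet for illustration only: behaves like the classic locale,
// except that the decimal separator is ','.
struct CommaDecimalPoint : std::numpunct<char> {
  char do_decimal_point() const override { return ','; }
};

// RAII guard that installs a global C++ locale and restores the previous one.
class ScopedGlobalLocale {
 public:
  explicit ScopedGlobalLocale(const std::locale& replacement)
      : previous_(std::locale::global(replacement)) {}
  ~ScopedGlobalLocale() { std::locale::global(previous_); }
  ScopedGlobalLocale(const ScopedGlobalLocale&) = delete;
  ScopedGlobalLocale& operator=(const ScopedGlobalLocale&) = delete;

 private:
  std::locale previous_;
};

// Serializes values as "(a,b,...)". When force_classic is false, the stream
// keeps the global C++ locale it copied at construction time.
std::string serialize_tuple(const std::vector<double>& values,
                            bool force_classic) {
  std::ostringstream stream;
  if (force_classic) {
    stream.imbue(std::locale::classic());  // pin the "C" locale for this stream
  }
  stream << '(';
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i != 0) stream << ',';
    stream << values[i];
  }
  stream << ')';
  return stream.str();
}
```

With `ScopedGlobalLocale guard(std::locale(std::locale::classic(), new CommaDecimalPoint));` in scope, `serialize_tuple({4.4, 3.0}, false)` produces `(4,4,3)`, which can no longer be told apart from a tuple of three integers, while `serialize_tuple({4.4, 3.0}, true)` still produces `(4.4,3)`.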
It would be nice to see a more principled approach to serialization in MXNet 2.0, e.g. using a binary format for communication between the frontend and the backend. In addition to solving locale-related issues, this would probably result in a smaller invocation overhead.
Locale-invariant serialization vs locale-aware serialization
As a side-note, using locale-aware serialization is not an option, simply because using `,` as the decimal separator adds ambiguities to tuple serialization, e.g. `(4,4,3)` can be a tuple of 3 integers, or a tuple of 2 floats.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Replace `std::stod` with `dmlc::stod` and `std::stof` with `dmlc::stof`.
Comments
… `,` as the decimal separator. However, since there was no consistency between the various operators before, it is not unlikely that the code was already broken.