Handle 1.#INF, 1.#IND, 1.#QNAN and 1.#SNAN #1788
Interesting PR. If it causes a slowdown, even of 2%, I would make it optional. Could be even an option
Thanks for the feedback, I've opened an issue
Current coverage is 90.26% (diff: 88.63%)

@@            master   #1788   diff @@
==========================================
  Files           58      59     +1
  Lines        10714   10750    +36
  Methods          0       0
  Messages         0       0
  Branches         0       0
==========================================
  Hits          9704    9704
- Misses        1010    1046    +36
  Partials         0       0
I've modified the code somewhat so it's now optional. To avoid if-tests at a low level in the code, which would slow things down, I've moved some code to a template C file, which is then included into In the R implementation of fread, the user can now specify a flag to indicate which
@j0r1 be sure to squash your commits; also check the Travis log since something is awry. Thanks for the PR!
@MichaelChirico Can I still squash commits inside this PR? Or should I just close this, squash them and open a new PR? The Travis log is ok now, but two other checks are failing: I'm not sure why, but I'm guessing it's because too many changes were detected? To make this feature selectable by a flag in the
Yes, if you squash and I've never seen codecov fail before... not sure what to tell you there
0b25e92 to 681de85
Thanks for the tip! Now it should be squashed into a single commit. Is the codecov failure a fundamental problem?
When using the Visual Studio compiler, text representations 1.#INF, 1.#IND, 1.#QNAN and 1.#SNAN are used instead of 'inf' and 'NaN'. This patch recognizes them if a parameter for fread (R version) is set to TRUE.

In the C code, the functions Strtod and readfile were moved to fread_readfile_template.h. Some functions were renamed, e.g. Strtod was renamed to TEMPLATE_Strtod. By setting defines in fread.c and including fread_readfile_template.h, two versions of the readfile function are created:
- one uses the default code
- the other uses strtod_wrapper and strtold_wrapper, which add extra checks to handle e.g. 1.#INF and 1.#IND

Depending on the flag vs.inf.nan in the R fread function, one of the two C functions is called. This way, the original behaviour is still the default, and runs without any performance penalty. If needed, the user can activate the slightly slower modified code which performs extra checks.
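For readers unfamiliar with the "two versions from one body" idea, here is a minimal single-file sketch. It is not the code from this PR: instead of including a template header twice under different defines, it uses a function-generating macro, which gives the same effect of fixing the parser at compile time. The names parse_vs, DEFINE_READ_LINE, read_line_default, read_line_vs and read_line are hypothetical, and the toy reader only handles one comma-separated line of doubles.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the extra checks; only 1.#INF is handled here. */
static double parse_vs(const char *s, char **end)
{
    if (strncmp(s, "1.#INF", 6) == 0) {
        *end = (char *)s + 6;
        return INFINITY;
    }
    return strtod(s, end);
}

/* Generate one reader per parser. The parser is chosen at compile time,
   so the per-field loop contains no test of the user's flag.
   Returns the number of fields read, or -1 if the line is not all numeric
   (or has more than max fields). */
#define DEFINE_READ_LINE(NAME, PARSE)                              \
    static int NAME(const char *line, double *out, int max)        \
    {                                                              \
        int n = 0;                                                 \
        while (n < max) {                                          \
            char *end;                                             \
            out[n] = PARSE(line, &end);                            \
            if (end == line) return -1;                            \
            n++;                                                   \
            if (*end != ',') return *end == '\0' ? n : -1;         \
            line = end + 1;                                        \
        }                                                          \
        return -1;                                                 \
    }

DEFINE_READ_LINE(read_line_default, strtod)
DEFINE_READ_LINE(read_line_vs, parse_vs)

/* The flag is tested once per call, not once per field. */
static int read_line(const char *line, double *out, int max, int vs_inf_nan)
{
    return vs_inf_nan ? read_line_vs(line, out, max)
                      : read_line_default(line, out, max);
}
```

With this sketch, read_line("1.5,1.#INF", v, 4, 0) reports a non-numeric line because plain strtod stops at the '#', while the same call with the flag set to 1 reads two fields.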
681de85 to bde8d95
Thanks for the PR and really sorry for not keeping up at the time, a year ago now.
When a CSV file was created using a program compiled with the Visual Studio compiler, instead of inf and nan, strings like 1.#INF, 1.#IND, 1.#QNAN and 1.#SNAN will be written. This patch is intended to be able to handle these strings as well, otherwise entire columns will be interpreted as text instead of numbers.

Since extra checks are done for each double, the modified code is slightly slower. Using a very large CSV file of roughly 2GB, containing only floating point numbers, the new code read it in 35.1 seconds, whereas the unmodified version got 34.3 seconds. This indicates a slowdown of 2.2%.
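To illustrate what "extra checks for each double" could look like, here is a rough sketch of a wrapper around the standard strtod. The name strtod_wrapper is borrowed from the commit message, but the body below is an assumption, not the PR's implementation; in particular, mapping 1.#IND, 1.#QNAN and 1.#SNAN all to a plain NaN, accepting a leading sign, and not skipping leading whitespace before the special spellings are choices made for this example.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: recognise the Visual Studio spellings, otherwise defer
   to the standard strtod. */
static double strtod_wrapper(const char *s, char **end)
{
    static const struct { const char *text; int is_inf; } special[] = {
        { "1.#INF",  1 },
        { "1.#IND",  0 },
        { "1.#QNAN", 0 },
        { "1.#SNAN", 0 },
    };
    const char *p = s;
    int negative = 0;

    if (*p == '+' || *p == '-') {
        negative = (*p == '-');
        p++;
    }

    /* Cheap prefix test so ordinary numbers pay for a few comparisons only. */
    if (p[0] == '1' && p[1] == '.' && p[2] == '#') {
        for (size_t i = 0; i < sizeof special / sizeof special[0]; i++) {
            size_t n = strlen(special[i].text);
            if (strncmp(p, special[i].text, n) == 0) {
                if (end) *end = (char *)(p + n);
                if (!special[i].is_inf) return NAN;
                return negative ? -INFINITY : INFINITY;
            }
        }
    }
    return strtod(s, end);
}
```

The prefix test is the part that runs for every field, which is consistent with a small but measurable slowdown on a file that contains only ordinary numbers.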