Floating point precision in DataFrame.to_csv #2069

wesm · 2012-10-14T19:23:06Z

http://stackoverflow.com/questions/12877189/float64-with-pandas-to-csv

What does R (or others) do?

pmorissette · 2012-10-17T19:28:07Z

Hey all,

I just started using Pandas a few days ago and ran into a related issue.

Basically I am reading in data from a .csv file. I have been writing some unit tests and was getting some errors because my expected values were different from the ones I calculated in Excel. At first, I assumed it was due to rounding but when I inspected my data frame, I realized that I was getting errors because of floating point issues. Basically, an input price of 7.34 was now 7.3399999999999999 (I am working with stock prices).
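For illustration, here is a minimal sketch of why 7.34 can show up as 7.3399999999999999. It uses only the standard library, not pandas, and the variable names are made up for this example. The stored double is the same in both cases; only the number of digits printed differs.

```python
from decimal import Decimal

price = 7.34

# Shortest string that parses back to the same double (Python 2.7+ and 3.x).
short_text = repr(price)

# The same double printed with 17 significant digits, which exposes the
# binary approximation behind the decimal literal.
long_text = format(price, ".17g")

# Both spellings parse back to the identical double, so nothing was lost.
same_value = float(short_text) == float(long_text) == price

# The exact binary value is close to, but not equal to, decimal 7.34.
is_exact = Decimal(price) == Decimal("7.34")

print(short_text, long_text, same_value, is_exact)
```

If exact decimal values matter (as they often do with prices), two common options are comparing with a tolerance in the unit tests or keeping the values as `decimal.Decimal`. Which one fits depends on the application.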

I was just wondering what the recommended way of dealing with this is, if any? Should I be converting my data frame to another type once imported?

Thanks in advance for your help and great job on this solid library.

wesm · 2012-11-03T17:24:14Z

It seems that CPython does a better job of float formatting than NumPy. I'll see what I can do.

wesm · 2012-11-03T17:53:43Z

I can't manage to find a standalone reproduction of this. The csv module uses str (via PyObject_Str) to format the numbers, and that appears to work fine on numbers like 0.085 or 7.34. If someone can post an example illustrating this breaking down, I'll see what I can do.
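A small standalone check along these lines might look like the sketch below. It assumes Python 3 (where `str` and `repr` of a float agree) and uses only the standard library, so it exercises the csv module path but not the pandas/NumPy one.

```python
import csv
import io

values = [0.085, 7.34, 9.728141]

# Write each float through the csv module into an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for v in values:
    writer.writerow([v])

# Drop the empty string left after the final line terminator.
written = buffer.getvalue().split("\n")[:-1]
print(written)

# For comparison: the same floats with 17 significant digits, the style of
# output that a printf-like "%.17g" format produces.
verbose = ["%.17g" % v for v in values]
print(verbose)
```

Both lists parse back to the same doubles; only the csv module output is the short form.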

adamobeng · 2012-11-28T00:16:34Z

I think I've been able to reproduce this:

    import pandas as pa  # "pa" is assumed to be the pandas alias used here

    df = pa.DataFrame({'float' : [9.728141, 4.810295]})
    df.to_csv('floats.csv')

floats.csv looks like:

    ,float
    0,9.7281410000000008
    1,4.810295
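As a quick sanity check on that output, the sketch below (standard library only, variable names chosen for this example) compares the long value from the first row with the value that was originally passed in. Both strings should parse to the same double, which suggests the file is noisy rather than wrong.

```python
reported = "9.7281410000000008"  # first row of floats.csv as shown above
expected = "9.728141"            # literal passed to the DataFrame

# True when both spellings denote the same 64-bit float.
same_double = float(reported) == float(expected)
print(same_double)

# Parsing the long form and printing it with repr gives the short form back.
print(repr(float(reported)))
```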

wesm · 2012-11-28T00:28:20Z

What OS/Python/NumPy combination are you using?

adamobeng · 2012-11-28T00:35:48Z

uname -a

 Darwin boron 12.2.0 Darwin Kernel Version 12.2.0: Sat Aug 25 00:48:52 PDT 2012; root:xnu-2050.18.24~1/RELEASE_X86_64 x86_64

sys.version

'2.7.3 (default, Nov  3 2012, 17:31:26) \n[GCC 4.2.1 Compatible Apple Clang 4.0 ((tags/Apple/clang-421.0.57))]'

np.version

1.6.2

Edit: This does not happen (i.e. the output is as expected) on an EC2 node running starcluster with:

uname -a

Linux master 3.0.0-14-virtual #23-Ubuntu SMP Mon Nov 21 21:09:11 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

sys.version

'2.7.2+ (default, Oct  4 2011, 20:06:09) \n[GCC 4.6.1]'

np.version

'1.6.2'

wesm · 2012-11-28T01:20:47Z

Urgh, I've dug down into the belly of the Python interpreter and believe that the formatting is eventually happening in the C stdlib, which means that Linux and OS X (BSD) have slightly different implementations. This is annoying as crap.

adamobeng · 2012-11-28T03:29:00Z

If I understand correctly, the problem comes from trying to write the underlying ndarray directly.

Is there a philosophical reason why there could not be a DataFrameFormatter for the CSV format, given that FloatArrayFormatter already takes care of this problem when outputting to LaTeX, HTML and plain text?

wesm · 2012-11-28T03:37:14Z

I guess the concern would be loss of precision
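To make that concern concrete, here is a minimal sketch in plain Python (no pandas). The `round_trips` helper is hypothetical and exists only for this example. A fixed number of significant digits can fail to reproduce the original double, while `repr` always does.

```python
x = 0.1 + 0.2  # stored as the double 0.30000000000000004


def round_trips(text, value):
    """Return True if parsing text gives back exactly the same float."""
    return float(text) == value


fixed = "%.6g" % x  # display-oriented formatting, 6 significant digits
full = repr(x)      # shortest string that still round-trips

print(fixed, round_trips(fixed, x))
print(full, round_trips(full, x))
```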

adamobeng · 2012-11-28T03:42:49Z

It depends whether you're using the CSV file for display or storage (i.e. as a faithful reproduction of the DataFrame). You might argue that using CSVs for storage is a bad idea anyway, because if the DataFrame contains arbitrary objects, you'll only end up with their string representations. Especially when you can serialize the same data very easily.
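A rough illustration of the difference, using the standard library's `pickle` on a plain dict rather than a DataFrame (the `record` contents are invented for this example):

```python
import datetime
import pickle

record = {"price": 0.1 + 0.2, "when": datetime.date(2012, 11, 28)}

# CSV-style storage: every value becomes text, so types have to be
# re-inferred when the file is read back.
as_text = {key: str(value) for key, value in record.items()}

# Binary serialization keeps both the exact float and the object type.
restored = pickle.loads(pickle.dumps(record))

print(as_text)
print(restored == record)
```

pandas offers `DataFrame.to_pickle` for the same purpose on whole frames.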

antonywu · 2013-04-06T06:29:50Z

So the current workaround is to use Linux instead of Mac to get the results we want in the CSV file?
Honestly, for display purposes, I would prefer an option to intentionally drop trailing digits (yes, I mean rounding). I wonder if there is a way to make that happen with .to_csv(), or would I have to write my own .to_csv() with DataFrame iteration + round()?
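One way to get rounded output without iterating over the frame is the `float_format` argument of `to_csv`. The sketch below assumes a reasonably recent pandas and reuses the values from the reproduction above; the choice of three decimals is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"float": [9.728141, 4.810295]})

# With no path given, to_csv returns the CSV text instead of writing a file.
# float_format is a printf-style format applied to floating point values.
csv_text = df.to_csv(float_format="%.3f")
print(csv_text)
```

Calling `df.round(3)` before `to_csv` is another option. It rounds the stored values rather than only their printed form, so trailing zeros are not padded.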

frgomes · 2013-05-08T14:26:18Z

I detected that read_csv has this bug too.

It's not a Python formatting issue. It's not a general floating point issue either, although it's true that floating point arithmetic demands some care from the programmer. The article below clarifies this subject a bit:

http://docs.python.org/2/tutorial/floatingpoint.html

The problem is that it's necessary to employ fixed point arithmetic and only convert to floating point in the end, applying a convenient divisor.

A classic one-liner which shows the "problem" is ...

    0.1 + 0.1 + 0.1
    0.30000000000000004

... which does not display 0.3 as one would expect. On the other hand, if you handle the calculation using fixed point arithmetic and only in the last step you employ floating point arithmetic, it will work as you expect. See this:

    (1 + 1 + 1) * 1.0 / 10
    0.3

So, it's necessary to account for the position of the decimal point, ignore it initially, and go ahead with the algorithm which converts text to integers (not floats!). The last step consists of converting an integer to a float by dividing by an adequate power of 10.

If you desperately need to circumvent this problem quickly, I recommend you create another CSV file which contains all figures as integers, for example by multiplying by 100, 1000 or another convenient factor. Inside your application, read the CSV file as usual and you will get those integer values back. Then convert those values to floating point by dividing by the same factor you multiplied by before.
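The scaled-integer idea described above can be sketched as follows. This assumes Python 3 (true division) and a known, fixed number of decimal places. `to_scaled_int` is a hypothetical helper written for this example, not a pandas function.

```python
from decimal import Decimal


def to_scaled_int(text, places=2):
    """Parse a decimal string such as '7.34' into an integer count of
    10**-places units (734 for places=2), without going through float."""
    scaled = Decimal(text).scaleb(places)  # shift the decimal point right
    if scaled != scaled.to_integral_value():
        raise ValueError("%r has more than %d decimal places" % (text, places))
    return int(scaled)


# Summing floats accumulates binary rounding error.
float_sum = 0.1 + 0.1 + 0.1

# Summing scaled integers is exact; only the final division uses floats.
cents = [to_scaled_int(t) for t in ["0.10", "0.10", "0.10"]]
fixed_sum = sum(cents) / 100

print(float_sum == 0.3, fixed_sum == 0.3)
```

Keeping the values as `decimal.Decimal` throughout is an alternative that avoids choosing a scale factor up front, at some cost in speed.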

jreback · 2013-09-21T12:28:14Z

closing in favor of #4668

jiahe224 · 2023-02-21T03:51:37Z

@pmorissette Hi, have you found a solution? I run into this problem whenever I read decimals into a DataFrame and save them to another file. I don't want to use solutions like round or format.
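Two options that avoid `round` and `format` are sketched below, assuming a recent pandas and an in-memory CSV invented for this example. Reading the column as text keeps the original spelling untouched, and `float_precision="round_trip"` asks `read_csv` for the slower parser that returns the same doubles Python's own `float()` would.

```python
import io

import pandas as pd

raw = "price\n7.34\n0.085\n9.728141\n"

# Option 1: keep the column as strings so the digits are never reinterpreted.
as_text = pd.read_csv(io.StringIO(raw), dtype=str)
text_out = as_text.to_csv(index=False)

# Option 2: parse to float64 with the round-trip converter.
as_float = pd.read_csv(io.StringIO(raw), float_precision="round_trip")
parsed = as_float["price"].tolist()

print(text_out)
print(parsed)
```

Option 1 only suits columns that are passed through without arithmetic; option 2 keeps them numeric.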

brendam mentioned this issue Jan 15, 2013

DataFrame.from_csv loses precision #2697

Closed

nehalecky mentioned this issue Jan 27, 2013

Series near-zero subtraction loss of precision #2760

Closed

frgomes mentioned this issue May 8, 2013

Floating point precision in DataFrame.read_csv #3545

Closed

cpcloud mentioned this issue Aug 26, 2013

custom formatters for to_csv #4668

Closed


jreback closed this as completed Sep 21, 2013

olafveerman mentioned this issue Sep 12, 2014

Floating point precision developmentseed/climatescope-data#8

Closed

tseth92 mentioned this issue Aug 7, 2019

to_csv has changed the number #27771

Closed
