Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Erroneous rounding in float arrays when displaying in REPL #6608

Closed
aviks opened this issue Apr 23, 2014 · 26 comments
Closed

Erroneous rounding in float arrays when displaying in REPL #6608

aviks opened this issue Apr 23, 2014 · 26 comments
Labels
bug Indicates an unexpected problem or unintended behavior priority This should be addressed urgently
Milestone

Comments

@aviks
Copy link
Member

aviks commented Apr 23, 2014

julia> a=Array(Any, 0)
0-element Array{Any,1}

julia> push!(a, 666666.6)
1-element Array{Any,1}:
 666667.0

julia> a[1]
666666.6

julia> show(a)
{666666.6}

julia> sprint((x,y)->writemime(x, (@MIME_str "text/plain")(),y) , a)
"1-element Array{Any,1}:\n 666667.0"

julia> versioninfo()
Julia Version 0.3.0-prerelease+2306
Commit 509da87* (2014-03-31 07:12 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin11.4.2)
  CPU: Intel(R) Core(TM)2 Duo CPU     L9400  @ 1.86GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

from: JuliaIO/JSON.jl#61

@aviks aviks changed the title Erroneous rounding for floats arrays when displaying in REPL Erroneous rounding in float arrays when displaying in REPL Apr 23, 2014
@aviks
Copy link
Member Author

aviks commented Apr 23, 2014

Actually, looking at the code, this seems to be the intended, although it does lead to some user confusion. Closing, unless someone wants to debate the wisdom of this behaviour?

showcompact(io::IO, x::Float64) =
     (Base._limit_output::Bool) ? _show(io, x, PRECISION, 6, false) : _show(io, x, SHORTEST, 0, false)
julia> sprint(showcompact, a[1])
"666667.0"

@aviks aviks closed this as completed Apr 23, 2014
@jiahao
Copy link
Member

jiahao commented Apr 23, 2014

I've not looked at the display code before, but 666667.0 is 7 significant figures, not 6. I would have expected six digit precision to be 666667 or 666667. (with the decimal point and no trailing digits).

@WestleyArgentum
Copy link
Member

@jiahao: +1 to that

666667. seems much more reasonable, including the .0 at the end is incredibly misleading.

Can we open this issue back up?

@Keno Keno reopened this Apr 23, 2014
@mbauman
Copy link
Member

mbauman commented Apr 23, 2014

I agree that adding the .0 doesn't quite comply with the honest printing policy. The minimal fix for this is trivial...

diff --git a/base/grisu.jl b/base/grisu.jl
index 04cbdd1..488e047 100644
--- a/base/grisu.jl
+++ b/base/grisu.jl
@@ -100,7 +100,7 @@ function _show(io::IO, x::FloatingPoint, mode::Int32, n::Int, typed, nanstr, inf
             write(io, '0')
             len += 1
         end
-        write(io, ".0")
+        write(io, ".")
     else # => ####.####
         write(io, pdigits, pt)
         write(io, '.')

... but it has bigger consequences. Now all float representations (truncated or not) only have a trailing period, instead of .0.

julia> 1.0
1.

@StefanKarpinski
Copy link
Member

This is clearly a bit of a hack and should be improved.

@quinnj
Copy link
Member

quinnj commented Jun 24, 2014

Is the main reason for calling showcompact on floats for arrays to achieve uniform width in columns? I'm trying to see if any of my new grisu code can help with this.

@StefanKarpinski
Copy link
Member

Yeah, it's mainly so that it printing arrays of e.g. random floats doesn't end up showing only one or two columns. It might be good to do something adaptive instead – i.e. only show shortened numbers when there's actually a lack of screen space, but that's a more complicated issue. I suspect that your Grisu implementation may very well be useful.

@JeffBezanson
Copy link
Member

Want to make sure this one doesn't get lost. Printing numbers is pretty important.

@StefanKarpinski
Copy link
Member

This is fixed by Jacob's pure Julia implementation of grisu, but I think that's a pretty risky thing to merge right before a release.

@quinnj
Copy link
Member

quinnj commented Jul 30, 2014

I'm not so sure. I'm sure it can help having a bit more finer-grained control over grisu, but this seems more like a broader design issue of "showing" things.

In grisu, there's the notion of

  • "print shortest", which means print the shortest representation of this float that it could be parsed back to the same value. Variable width, though hard cap of 17 digits total.
  • "precision printing", which means, print this number with x total digits (what we're using now in showcompact). Fixed-width.
  • and "fixed printing", which means, print this number with x digits after the decimal point. Variable width before the decimal point, fixed-width after the decimal point.

Given the grisu architecture, we could try:

  • keep using "precision" printing and perhaps increase the precision, though we'd still run into the same 6666667.0 issue at some point.
  • using "print shortest" and do a hard cap of 10 or 15 or even 17. Using print shortest in some way would be ideal since it's the fastest and generally the sanest output (with the parse back in what gets put out rule).

There are also other ideas to consider, has there's been talk of wanting to revamp the whole show, print, display, machinery anyway, a more composable solution could be figured out that gets this right.

@mbauman
Copy link
Member

mbauman commented Jul 30, 2014

Yup, this is tough. Basically, what we want here is

  • Print the shortest representation possible
    • Except if:
      1. the shortest representation has all zeros right of the decimal point
      2. AND there's enough precision to be sure that the tenths place is indeed a zero
        • then in this case, we also want to print an extra trailing zero

In my cursory looking, I don't think that part b is exposed by the GRISU c library we're currently calling, but I may be mistaken.

@quinnj
Copy link
Member

quinnj commented Jul 30, 2014

I don't think I get your exception case @mbauman, can you give an example?

@mbauman
Copy link
Member

mbauman commented Jul 30, 2014

Sure. My thinking is that 1.0 isn't actually the shortest floating point representation of one; 1. is. But printing floating point one as 1. everywhere (as the patch I posted back in April would do) is visually ambiguous with the integer one. That means that we don't really want to always print the shortest representation.

I'm turning the problem on its head. Instead of 666667. being a special case, it's 1.0 (for example) that is special. So the exception to the print shortest rule is that we tack on a zero in the tenths place if there's sufficient precision to do so — simply so integer floating point numbers display as X.0 like we want them to.

@jiahao
Copy link
Member

jiahao commented Jul 30, 2014

But printing floating point one as 1. everywhere (as the patch I posted back in April would do) is visually ambiguous with the integer one.

I'm not sure why the integer one can't be just 1 without the period.

@mbauman
Copy link
Member

mbauman commented Jul 30, 2014

Visually ambiguous wasn't the correct choice of words. I meant that 1. is easily confused with 1, and it goes against the typical Julian style of writing floating point integers with a trailing zero in code.

@JeffBezanson
Copy link
Member

For everything except show/repr we could potentially omit the period. Usually it's nice to clarify that something is a float though.

@nalimilan
Copy link
Member

I also think we should use what @quinnj calls "fixed printing" above: using a fixed number of digits after the period for all numbers in the array, and add padding in front of the number to align its period with that of the one with the larger number of digits.

@pao
Copy link
Member

pao commented Oct 14, 2014

Counterargument: When array elements are at different orders of magnitude, I've found the "print everyone the same with a single scaling factor" approach that MATLAB uses to be extremely frustrating. An easy-to-remember "please just show me the whole matrix with enough precision" command would be helpful if we're going down this route.

@nalimilan
Copy link
Member

@pao R switches between standard and scientific notation depending on which one takes less width for the wanted precision, with a customizable factor favoring the former. It could be an interesting solution.

@pao
Copy link
Member

pao commented Oct 14, 2014

Does that work per-element, though?

For the general case with fixed digits, MATLAB does the same thing as R it sounds like--the format command family controls the global preference for display notation. Which all gets back to what @quinnj said above, which is that perhaps the whole thing needs a rethink with composability and configurability in mind.

@nalimilan
Copy link
Member

Does that work per-element, though?

In R, the same display format is used for all elements in a column of an array, presumably to ensure the output is aligned. Is that what you were asking?

It would make sense IMHO to use the same format for all elements in an array instead, since you may want to compare elements across different columns.

@jiahao
Copy link
Member

jiahao commented Oct 14, 2014

Here's a concrete example: how should [1 e; 0 eps()] print?

@nalimilan
Copy link
Member

I'd say like this:

julia> [1 e; 0 eps()]
2x2 Array{Float64,2}: (rounded to 5 decimal places)
 1.00000e+00  2.71828e+00    
 0.000000+00  2.22045e-16

@JeffBezanson JeffBezanson modified the milestones: 0.4, 0.4.1 Mar 8, 2015
@vtjnash
Copy link
Member

vtjnash commented Apr 16, 2015

what if we were to simply alter the printing code to refuse to ever get into this ambiguous situation?

$ git diff
diff --git a/base/grisu.jl b/base/grisu.jl
index e6f3fb0..694e7b1 100644
--- a/base/grisu.jl
+++ b/base/grisu.jl
@@ -71,7 +71,7 @@ function _show(io::IO, x::FloatingPoint, mode, n::Int, typed, nanstr, infstr)
         end
     end
     neg && write(io,'-')
-    if pt <= -4 || pt > 6 # .00001 to 100000.
+    if pt <= -4 || pt > 6 || pt >= n # .00001 to 100000.
         # => #.#######e###
         write(io, pdigits, 1)
         write(io, '.')

this changes the algorithm to prefer scientific notation instead of printing trailing zero. it is therefore more accurate in all cases where it applies, except when abs(((x + 0.05) % 10^(pt - len)) - 0.05)) < 0.05 when it is a tossup (because it would have been valid to print .0 in that case).

(edit: this proposed patch also fixed #9072)

@StefanKarpinski
Copy link
Member

@vtjnash I like it 👍

@vtjnash
Copy link
Member

vtjnash commented Apr 16, 2015

before:

julia> Float64[1 10 100 1000 10000 100000 1000000 10000000]
1x8 Array{Float64,2}:
 1.0  10.0  100.0  1000.0  10000.0  100000.0  1.0e6  1.0e7

after:

julia> Float64[1 10 100 1000 10000 100000 1000000 10000000]
1x8 Array{Float64,2}:
 1.0  10.0  100.0  1000.0  10000.0  1.0e5  1.0e6  1.0e7

any preference on whether I also implement the complex expression above to preserve the "before" behavior for the array above?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior priority This should be addressed urgently
Projects
None yet
Development

No branches or pull requests