Skip to content

Commit

Permalink
fix 0^0 definitions
Browse files Browse the repository at this point in the history
  • Loading branch information
simonbyrne committed Apr 17, 2013
1 parent 0c18054 commit 3ef1d4d
Showing 1 changed file with 16 additions and 2 deletions.
18 changes: 16 additions & 2 deletions test/complex.jl
Original file line number Diff line number Diff line change
Expand Up @@ -149,9 +149,23 @@
# equivalent to exp(y*log(x))
# except for 0^0?
# conj(x)^conj(y) = conj(x^y)
@test isequal(complex(0.0,0.0)^complex(0.0,0.0), complex(1.0,0.0))
@test isequal(complex(0.0,0.0)^complex(0.0,0.0), complex(1.0,-0.0))

This comment has been minimized.

Copy link
@JeffBezanson

JeffBezanson Apr 17, 2013

Member

Why would complex 0.0^0.0 equal 1.0 - 0.0im ?

This comment has been minimized.

Copy link
@erg

erg Apr 17, 2013

It's supposed to be indeterminate, no? The pow function in libc has this in the docs:

If y is 0, the result is 1.0 (even if x is a NaN)

http://www.wolframalpha.com/input/?i=0+%5E+0

So if you're using pow you might have to special case things for correctness.

This comment has been minimized.

Copy link
@JeffBezanson

JeffBezanson Apr 17, 2013

Member

That doesn't explain why isequal(complex(0.0,0.0)^complex(0.0,0.0), complex(1.0,-0.0)) is a valid test case. Testing for that asserts that 1.0-0.0im is the correct answer.

This comment has been minimized.

Copy link
@erg

erg Apr 17, 2013

My point is that it's not the right answer according to wolfram alpha, the pow docs, and this random math link http://mathforum.org/library/drmath/view/53660.html.

0 ^ 0 is indeterminate, complex or otherwise, positive or negative zero real/imaginary component. At least that's my understanding of it...

This comment has been minimized.

Copy link
@jiahao

jiahao Apr 17, 2013

Member

@erg the fact that it is indeterminate reflects that the limit does not exist since you get different numbers if you take the limit in different ways. However it is not wrong to consider 1 - 0im as one of the possible answers because it is consistent with all the choices made in the definitions in the C99 standard. Mathematica has chosen to define 0^0 as its own special case which is allowed, but we simply haven't made that choice (yet).

This comment has been minimized.

Copy link
@erg

erg Apr 17, 2013

I'm trying to figure this out for Factor, too, which is why I'm interested in what Julia is doing. Thanks for the explanation.

This comment has been minimized.

Copy link
@JeffBezanson

JeffBezanson Apr 17, 2013

Member

Ah, cool to have a Factor developer here, welcome!
I agree 1-0im is a possible answer, since it still equals 1 (assuming for now we'll go with the libm standard). But it seems strange to test for it, and it doesn't agree with the answer I get in the current julia. So I interpreted the test as a suggestion to change the answer to 1-0im, and I don't know why we'd do that.

This comment has been minimized.

Copy link
@simonbyrne

simonbyrne Apr 17, 2013

Author Contributor

Ah, sorry, I didn't realise this wasn't on the issue thread: the reason is that it keeps the limits consistent

julia> complex(1e-10,1e-10)^complex(1e-10,1e-10)
0.9999999976535324 - 2.189387912488976e-9im

^ isn't really specified that clearly: the mac documentations for cpow says that cpow(x,y) == cexp(y * clog(x)). However for x=0 + i0, clog(x) == -Inf + 0i, which means that the term inside the y*clog(x) would have an NaN.

The C99 standard isn't really specific either, just saying (in a footnote):

This allows cpow(z, c) to be implemented as cexp(c clog(z)) without precluding
implementations that treat special cases more carefully.

This comment has been minimized.

Copy link
@simonbyrne

simonbyrne Apr 17, 2013

Author Contributor

As for the argument that it should be indeterminate, Kahan (section 10) makes a fairly convincing argument that 0^0 should be 1 (though he doesn't mention what sign the zeros should be, which is odd as that was the point of the whole article).

This comment has been minimized.

Copy link
@JeffBezanson

JeffBezanson Apr 17, 2013

Member

Ok, interesting. It seems maybe atan2 is not doing what we want:

julia> atan2(-0.0,0.0)
-0.0

julia> atan2(0.0,-0.0)
3.141592653589793

It looks like those answers should be switched.

This comment has been minimized.

Copy link
@simonbyrne

simonbyrne Apr 17, 2013

Author Contributor

Why should they be switched? That's standard behaviour (though arguably poor mathematically)

This comment has been minimized.

Copy link
@jiahao

jiahao Apr 17, 2013

Member

(cracks knuckles) I worked through the two cases brought out by Kahan more carefully in this PDF document and arrived at the answer

(0.0+0.0im)^(0.0+0.0im) = 1.0-0.0im

I've worked out a formula that applies to all 16 cases of complex 0^0 in the note. Take sign of the argument of the second complex zero, multiply it by the sign of the real part of the first complex zero, then invert the sign. This is the sign of the imaginary part of the answer.

As usual, someone should check that I didn't goof. It's been awhile since I took complex analysis.

This comment has been minimized.

Copy link
@erg

erg Apr 17, 2013

The document 'Julia Working Note: 0^0' has been deleted

This comment has been minimized.

Copy link
@JeffBezanson

JeffBezanson Apr 17, 2013

Member

If our branch cut is the negative real line, then atan2(-eps,eps) should approach pi, and atan2(eps,-eps) should approach -0.0. I'm hardly an expert here but that seems right to me. If we all agree it's right then there's no reason to stick with tradition.

This comment has been minimized.

Copy link
@jiahao

jiahao Apr 17, 2013

Member

Sorry, fixed the URL in an edit but I guess it didn't propagate. http://www.scribd.com/doc/136561225/Julia-Working-Note-Zero-to-Zero

This comment has been minimized.

Copy link
@jiahao

jiahao Apr 17, 2013

Member

@JeffBezanson the behavior of atan2 is correct according to IEEE 754-2008. The actual standard itself is here but you can see the edge cases here. Such is the defined standard for real floating-point arithmetic. If there is any discrepancy, it should be whether atan2(imag(z),real(z)) returns the correct argument of a complex number.

This comment has been minimized.

Copy link
@JeffBezanson

JeffBezanson Apr 17, 2013

Member

In that case we should fix angle, and use it instead of atan2 when needed in complex functions.

This comment has been minimized.

Copy link
@jiahao

jiahao Apr 17, 2013

Member

sounds good to me.

This comment has been minimized.

Copy link
@JeffBezanson

JeffBezanson Apr 17, 2013

Member

Damn, I forgot the argument order is atan2(y,x). I've been thinking it was atan2(x,y) which explains why I was confused.

This comment has been minimized.

Copy link
@jiahao

jiahao Apr 17, 2013

Member

This also explains why I was confused about your being confused. 😁

This comment has been minimized.

Copy link
@erg

erg Apr 17, 2013

Is there any case where you'd want the answer to be NaN instead? Maybe a calculation is wrong, leads to 0^0, and you get a 1 instead of NaN so it masks a bug?

This comment has been minimized.

Copy link
@jiahao

jiahao Apr 17, 2013

Member

@erg of course. The Kahan paper even gives two cases:

Perhaps fear is induced by the singularity that z^w possesses at z =w= 0;
if both z and ware compelled to approach 0 but allowed to do
so independently along any paths, then paths may be chosen on
which z^w holds fast to any preassigned value whatsoever.

We discussed earlier how 0^0 when interpreted as the limit of something can be indeterminate because it depends on exactly how you take the limit. Kahan arbitrarily picks one such limiting process, which in essence defines the ^ operation.

Assuming for the sake of argument (because it is generally not so)
that neither z nor W could be exactly zero but must instead be
approximately zero because of roundoff or underflow, the expression
0° would have to be treated as if it really ought to have
been (roundoff)^roundoff, which generally defies estimation.

In this case Kahan says that you can reach 0^0 via some roundoff process which can be thought of as an approximation of machine epsilon raised to itself. However, this is not a well-defined process in floating-point, because

eps^eps = exp(eps * log(eps)) 

which is difficult to evaluate if eps is machine epsilon. If you are very careful you could reason that

log(eps) = -Inf
eps * -Inf = -eps
exp(-eps) = 1.0

but as you can see, step 2 is very dubious.

This comment has been minimized.

Copy link
@jiahao

jiahao Apr 17, 2013

Member

The answer I arrived at is equivalent to the rule

-1 * sign of real part of first 0 * sign of imaginary part of second 0 == sign of imaginary part of answer

This agrees with half of these tests. lines 156-159, 164-167 have the opposite sign to that expected from this rule.

This comment has been minimized.

Copy link
@wlbksy

wlbksy Apr 18, 2013

Contributor

Hello, I know this is irrelevant and it's against IEEE 754, but is there any personal way to make 0 * NaN equals 0 ? Thanks in advance.

This comment has been minimized.

Copy link
@StefanKarpinski

StefanKarpinski Apr 18, 2013

Member

You can redefine floating-point multiply in float.jl, but that seems like a pretty sketchy change to me. It will also likely be rather slow.

This comment has been minimized.

Copy link
@JeffBezanson

JeffBezanson Apr 18, 2013

Member

It's best to handle that as a special case where it is needed in your code.
Or, within a module, you can define

*(x::Float64,y::Float64) = (x==0||y==0) ? 0 : Base.(:*)(x,y)

This comment has been minimized.

Copy link
@simonbyrne

simonbyrne Apr 18, 2013

Author Contributor

@jiahao I think Kahan's second argument that (based on Taylor series of analytic functions) is more convincing. Moreover, it matches up with the definition for floats and integers, as well as mathematical convention (e.g. see here).

As for the signs, they could well be wrong, it was just what I figured out via my empirical epsilon approach above.

This comment has been minimized.

Copy link
@simonbyrne

simonbyrne Apr 18, 2013

Author Contributor

Of course, you can get different signs with different limiting sequences, e.g.:

julia> complex(1e-10,1e-10)^complex(1e-10,1e-100)
0.9999999977320723 + 7.85398161616222e-11im

However, I think all those signs are correct if you take the limit w -> 0 of (w+iw)^(w+iw). I'm not sure if that is a sufficient argument.

This comment has been minimized.

Copy link
@simonbyrne

simonbyrne Apr 18, 2013

Author Contributor

Actually, I'm beginning to think that may be a terrible argument, as it doesn't fit in with the behaviour of angle (and atan2), which have a "primacy of reals" property in that

angle(+0+0i) = lim_{x-> +0} angle(x + 0i) = +0
angle(+0-0i) = lim_{x-> +0} angle(x - 0i) = -0
angle(-0+0i) = lim_{x-> -0} angle(x + 0i) = +pi
angle(-0-0i) = lim_{x-> -0} angle(x - 0i) = -pi

so perhaps we would be better with

(+0+0i)^(+0+0i) = lim_{x -> +0} lim_{u -> +0} (x+i0)^(u+i0) = +1+0i

I'm reasonably confident that this quantity is well-defined, though if anyone knows a good complex analyst, I would recommend asking them.

EDIT: fixed last equation

This comment has been minimized.

Copy link
@simonbyrne

simonbyrne Apr 18, 2013

Author Contributor

Okay, having thought about this a bit more, I think this can be made precise.

For any complex function of one argument, define

f(+0+0i) = lim_{x -> +0} lim_{y -> +0} f(x+iy)
f(+0-0i) = lim_{x -> +0} lim_{y -> -0} f(x+iy)
f(-0+0i) = lim_{x -> -0} lim_{y -> +0} f(x+iy)
f(-0-0i) = lim_{x -> -0} lim_{y -> -0} f(x+iy)

In other words: the limit of the imaginary part is taken first, and the direction of the limit is determined by the sign of the zero.

For a binary function, the problem is the order of the limits of the 4 arguments. For (x+iy)^(u+iv), I propose the order x,y,u,v (i.e. v first, then u, etc.).

(x+iy)^(u+iv) = r (cos(theta) + i sin(theta))

where

r = exp(u*log(abs(x+iy)) - v*angle(x+iy))
theta = v*log(abs(x+iy)) + u*angle(x+iy)

First take the limit v -> +-0:

r = exp(u*log(abs(x+iy)))
theta = u*angle(x+iy)

Now if we take u -> 0, we have

r = exp(0) = 1
theta = sign(u) * sign(angle(x+iy)) = sign(u) * sign(y)

This remains the same by further taking signed limits of x and y. Hence, if u,v are signed zeros and x,y are any finite values, we have

(x+iy)^(u+iv) = 1 + i0*sign(u)*sign(y)

The results are the same if we also use the order x,u,y,v.

This comment has been minimized.

Copy link
@jiahao

jiahao Apr 18, 2013

Member

@simonbyrne I agree with what you've written, but as you say, the result depends on the way you take the limit, which is important because z^w is not analytic at (0,0). The first way you choose is equivalent to lim_(z->0, z/w=1), which is not wrong but it is only one of many possible choices. The second way you choose makes different limit choices. These happen to differ from Kahan's. My reading of Kahan is that he picks the limits (x,y) first, then (u,v) in your notation, which is how I got to my answer. You can see that in his definition of the complex power, he deliberately defines it as the limiting value of the base, so it is clear that he means z->0 first, then w->0.

Would be great if you could check my interpretation and calculation. I took the limits to mean |z| -> 0, then |w| -> 0, with arbitrary phases.

This comment has been minimized.

Copy link
@simonbyrne

simonbyrne Apr 18, 2013

Author Contributor

@jiahao Kahan (section 10, 2nd equation) defines it as

z^w = lim_{zeta -> z} exp(w * log(zeta))

i.e. not taking the limit of w, just treating it as fixed. For a fixed non-zero zeta, exp(w * log(zeta)) is continuous as a function of w, hence the limits are well-defined and equal to the function value. In other words, this is the same as taking the limit of w = u+iv first.

If you took z as the first limit, you would get 0^w, which is zero for non-zero w (and taking the limit of this disagrees with mathematical convention).

This comment has been minimized.

Copy link
@jiahao

jiahao Apr 18, 2013

Member

@simonbyrne I don't see how to interpret 0^0 as anything other than the limit of something in order for it to have any meaning. But I think I managed to confuse myself. If one has multiple limits, are they evaluated from outer to inner (left to right) or the other way? I agree that taking z->0 first doesn't make any sense. I guess Kahan must mean that you do the inner one last, so it would be w->0 first, then z->0. I think my calculation still works though (I may have already taken the limits in that order so that ln r doesn't blow up), so I'm still failing to see why I get a different answer from yours.

This comment has been minimized.

Copy link
@simonbyrne

simonbyrne Apr 18, 2013

Author Contributor

@jiahao One justification for 0^0=1 that doesn't rely on limits is as the empty product, or as the number of empty tuples of elements of the empty set (see wikipedia). Of course, this doesn't help with the problem at hand.

Limits are taken inside first: as I said above, Kahan doesn't take a limit on w (as zeta^w is continuous). So the only reason to take limits of w is to determine the sign of 0.

I'm not sure I follow your note completely, but in step 2, ln(r e^{i theta}) should expand to ln(r) + i theta (the following line seems to imply an expansion to ln(r) i theta). I think that may possibly explain the different answers.

This comment has been minimized.

Copy link
@StefanKarpinski

StefanKarpinski Apr 18, 2013

Member

That justification seems like a bit of a pun in this case since it really only makes sense where 0 is an integer and in the set theory tradition, identified with the empty set. In this case, since we're talking about complex numbers and functions on them, arguments based on limits are far more appropriate.

This comment has been minimized.

Copy link
@simonbyrne

simonbyrne Apr 18, 2013

Author Contributor

True, but "the principle of least surprise" would dictate that the function should return "equivalent" values for "equivalent" arguments, for integer, real and complex types.

This comment has been minimized.

Copy link
@StefanKarpinski

StefanKarpinski Apr 18, 2013

Member

Right, that's a good point.

This comment has been minimized.

Copy link
@jiahao

jiahao Apr 18, 2013

Member

@simonbyrne thanks for catching that. I'll redo my calculation.

My point about the limits still stand. You can see in the IEEE standard and on Wikipedia that +/-0.0 are defined by the one-sided limits x->0+/-, just like +/-Inf. (+0.0 is a bit ambiguous as it represents both the limit and the literal zero.) I agree with @StefanKarpinski that while it is nice that other definitions of 0^0 can give 1, there are many others that do not (related to the earlier discussion on indeterminacy) so principle of least surprise is insufficient justification in this case. Besides the other definitions don't help determine the sign of 0, which was the original problem.

Sorry if i'm being pedantic, but it helps to be very careful with edge cases like this one.

This comment has been minimized.

Copy link
@jiahao

jiahao Apr 19, 2013

Member

At this point it is still not entirely clear to me how the limits should be taken. Therefore, I've checked all the 4!=24 possible permutations of limits. Half of them give indeterminate limits or 0 as the limit and we can ignore those. The remaining 12 divide equally into two equivalent cases:

  1. the limit of v is taken first, then the limit of u. The answer is sign(u)*sign(y) as given above by @simonbyrne.
  2. the limit of u is taken first, then the limit of v, then the answer is -sign(v); the other term contributing to the argument dominates.

It turns out that this is the only important ordering; the order of the limits of x and y turn out not to matter at all.

Of the two cases, I would say that case 1 is the correct one since case 2 describes a limiting process along the negative real axis, which is also where the branch cut is. Based on that, then the only reasonable limit is v,u,y,x, which is the case advocated by @simonbyrne above.

At either rate, Julia's current behavior agrees with neither case, nor do the existing tests agree with either case. In case 1, the correct tests are

@test isequal(complex(+0.0,+0.0)^complex(+0.0,+0.0), complex(1.0,+0.0))
@test isequal(complex(+0.0,+0.0)^complex(+0.0,-0.0), complex(1.0,-0.0))
@test isequal(complex(+0.0,+0.0)^complex(-0.0,+0.0), complex(1.0,+0.0))
@test isequal(complex(+0.0,+0.0)^complex(-0.0,-0.0), complex(1.0,-0.0))
@test isequal(complex(+0.0,-0.0)^complex(+0.0,+0.0), complex(1.0,+0.0))
@test isequal(complex(+0.0,-0.0)^complex(+0.0,-0.0), complex(1.0,-0.0))
@test isequal(complex(+0.0,-0.0)^complex(-0.0,+0.0), complex(1.0,+0.0))
@test isequal(complex(+0.0,-0.0)^complex(-0.0,-0.0), complex(1.0,-0.0))
@test isequal(complex(-0.0,+0.0)^complex(+0.0,+0.0), complex(1.0,-0.0))
@test isequal(complex(-0.0,+0.0)^complex(+0.0,-0.0), complex(1.0,+0.0))
@test isequal(complex(-0.0,+0.0)^complex(-0.0,+0.0), complex(1.0,-0.0))
@test isequal(complex(-0.0,+0.0)^complex(-0.0,-0.0), complex(1.0,+0.0))
@test isequal(complex(-0.0,-0.0)^complex(+0.0,+0.0), complex(1.0,-0.0))
@test isequal(complex(-0.0,-0.0)^complex(+0.0,-0.0), complex(1.0,+0.0))
@test isequal(complex(-0.0,-0.0)^complex(-0.0,+0.0), complex(1.0,-0.0))
@test isequal(complex(-0.0,-0.0)^complex(-0.0,-0.0), complex(1.0,+0.0))

Working notes: Mathematica notebook and its PDF.

This comment has been minimized.

Copy link
@simonbyrne

simonbyrne Apr 22, 2013

Author Contributor

@jiahao Fantastic, thanks for going through all that. I agree that option 1 seems like the best behaviour. However, I think you might have the tests the wrong way round, i.e.

@test isequal(complex(+0.0,+0.0)^complex(+0.0,-0.0), complex(1.0,+0.0))

and

@test isequal(complex(+0.0,-0.0)^complex(+0.0,+0.0), complex(1.0,-0.0))
@test isequal(complex(0.0,-0.0)^complex(0.0,0.0), complex(1.0,-0.0))
@test isequal(complex(0.0,0.0)^complex(0.0,-0.0), complex(1.0,0.0))
@test isequal(complex(0.0,-0.0)^complex(0.0,-0.0), complex(1.0,0.0))
@test isequal(complex(-0.0,0.0)^complex(0.0,0.0), complex(1.0,-0.0))
@test isequal(complex(-0.0,-0.0)^complex(0.0,0.0), complex(1.0,-0.0))
@test isequal(complex(-0.0,0.0)^complex(0.0,-0.0), complex(1.0,0.0))
@test isequal(complex(-0.0,-0.0)^complex(0.0,-0.0), complex(1.0,0.0))
@test isequal(complex(0.0,0.0)^complex(-0.0,0.0), complex(1.0,-0.0))
@test isequal(complex(0.0,-0.0)^complex(-0.0,0.0), complex(1.0,-0.0))
@test isequal(complex(0.0,0.0)^complex(-0.0,-0.0), complex(1.0,0.0))
@test isequal(complex(0.0,-0.0)^complex(-0.0,-0.0), complex(1.0,0.0))
@test isequal(complex(-0.0,0.0)^complex(-0.0,0.0), complex(1.0,-0.0))
@test isequal(complex(-0.0,-0.0)^complex(-0.0,0.0), complex(1.0,-0.0))
@test isequal(complex(-0.0,0.0)^complex(-0.0,-0.0), complex(1.0,0.0))
@test isequal(complex(-0.0,-0.0)^complex(-0.0,-0.0), complex(1.0,0.0))

@test isequal(complex(0.0,-0.0)^complex(0.0,-0.0), complex(1.0,-0.0))



Expand Down

1 comment on commit 3ef1d4d

@ViralBShah
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sign in to comment.