var and std do not work for Any[] #8319

jakebolewski · 2014-09-12T02:18:54Z

All other statistics functions in Base work for Any arrays of numeric values.

julia> test = Any[1,2,3]
3-element Array{Any,1}:
 1
 2
 3

julia> std(test)
ERROR: `zero` has no method matching zero(::Type{Any})
 in var at statistics.jl:162

julia> var(test)
ERROR: `zero` has no method matching zero(::Type{Any})
 in var at statistics.jl:162


eschnett · 2014-09-12T02:59:39Z

Since var (and std) do not work for empty iterables anyway, the first iteration of the loop in var can be peeled off (the loop unrolled by one), initializing the accumulator from the first element and avoiding the need to call zero.
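A minimal sketch of that unrolling, assuming Julia 1.x. The name var_nozero is hypothetical (it is not Base's var or the Statistics stdlib's var): both accumulators are seeded from the first element, so zero(eltype(itr)) is never called and an Any[] input works as long as its elements support +, -, / and abs2.

```julia
# Hypothetical helper, not Base's var: seeds both accumulators from the first
# element instead of calling zero(eltype(itr)), so element types like Any work.
function var_nozero(itr; corrected::Bool = true)
    isempty(itr) && throw(ArgumentError("variance of an empty collection is undefined"))
    x1 = first(itr)
    # First pass (mean): the first iteration is peeled off to initialize the sum.
    s = x1
    n = 1
    for x in Iterators.drop(itr, 1)
        s += x
        n += 1
    end
    m = s / n
    # Second pass (sum of squared deviations), seeded the same way.
    # abs2 gives |x - m|^2, i.e. the conjugated product for complex values.
    ss = abs2(x1 - m)
    for x in Iterators.drop(itr, 1)
        ss += abs2(x - m)
    end
    return ss / (corrected ? n - 1 : n)
end
```

For example, var_nozero(Any[1, 2, 3]) returns 1.0. With a single element and corrected = true the result is 0.0/0, i.e. NaN.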

timholy · 2014-09-21T16:53:31Z

Just to state the obvious, this is not limited to Any: it also affects arrays of (non-numeric) immutables, arrays of arrays, etc.

andreasnoack · 2014-09-21T17:56:20Z

I like generic code, but how do you define the variance of an array of arrays?

timholy · 2014-09-21T18:24:48Z

I guess it only makes sense when the elements are Vectors, not general arrays; in that case the obvious definition is the covariance.

It's also pretty clear what you want in the case of ColorValues, which is what started my interest in this issue.

andreasnoack · 2014-09-21T18:30:56Z

Yes, variance of a vector of vectors makes sense. What is ColorValue a subtype of?

timholy · 2014-09-21T18:34:42Z

It's an abstract type defined in the Color package. This came up in JuliaImages/Images.jl#187.

jakebolewski · 2014-09-22T20:04:37Z

Does it make sense to define this for all possible numeric types? What does the variance of a complex number even mean? I guess this is the tension with allowing the definition to be completely generic.

StefanKarpinski · 2014-09-22T20:08:09Z

According to Wikipedia, the variance of a complex random variable does make sense and is defined as E[(X - µ)(X - µ)'], where ' denotes the conjugate transpose. So we could make this correct, but I don't think it is correct right now, since we're not taking the conjugate.
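To see why the conjugate matters, here is a small illustration (assuming Julia 1.x; the sample zs is made up). The conjugated product gives a real, nonnegative spread, while the unconjugated square cancels to zero even though the points are clearly spread out.

```julia
# A small complex sample whose points are spread out around μ = 0.
zs = [1 + 1im, -1 - 1im, 1 - 1im, -1 + 1im]
n  = length(zs)
μ  = sum(zs) / n
# Conjugated definition E[(X - μ)(X - μ)'] (for a scalar, ' is just conj).
v_conj   = sum((z - μ) * conj(z - μ) for z in zs) / (n - 1)
# Unconjugated version E[(X - μ)^2]: the terms 2im, 2im, -2im, -2im cancel.
v_noconj = sum((z - μ)^2 for z in zs) / (n - 1)
```

Here v_conj equals 8/3 (with zero imaginary part), whereas v_noconj is exactly zero. In Julia, abs2(z - μ) computes the same conjugated product directly.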

johnmyleswhite · 2014-09-22T20:10:40Z

I think variance is well-defined whenever there's an L2 norm and a definition of expectation.

eschnett · 2014-09-22T20:21:29Z

As I mentioned above, the code can easily be rewritten to avoid calling zero by restructuring the loop. This is a mechanical change that does not depend on any properties of the types involved.

The current code expects a neutral element (zero) for the reduction operation, yet it only works if there is at least one element. When there is at least one element, the neutral element is not needed.

timholy · 2014-09-22T20:45:38Z

It's not just zero: in the example from Images there is a zero(RGB{Ufixed8}), but because an RGB is not a Number, the argument type restrictions mean that no method of varm applies.

StefanKarpinski · 2014-09-22T20:54:20Z

What's the inner product on colors?

timholy · 2014-09-22T21:28:59Z

In that case, the user "clearly" wants the elementwise variance. That is basically the diagonal of the covariance, i.e. of the outer product (so I would actually say it needs the outer product, not the inner product).
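A quick numerical check of "elementwise variance = diagonal of the outer-product covariance", assuming Julia 1.x with the LinearAlgebra stdlib. The three-component "pixels" are made-up plain vectors standing in for colour values; no Color/Images types are used.

```julia
using LinearAlgebra   # diag, matrix products

# Hypothetical "pixels": each element is a 3-component vector (e.g. r, g, b).
px = [[0.2, 0.4, 0.6], [0.4, 0.4, 0.2], [0.6, 0.1, 0.4]]
n  = length(px)
μ  = sum(px) / n
# Covariance from outer products (x - μ) * (x - μ)'  -> 3×3 matrix
C  = sum((x - μ) * (x - μ)' for x in px) / (n - 1)
# Elementwise (per-channel) variance, computed one channel at a time
v  = [sum((x[k] - μ[k])^2 for x in px) / (n - 1) for k in 1:3]
```

The per-channel vector v matches diag(C); the off-diagonal entries of C carry the cross-channel information that the elementwise version discards.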

jiahao · 2014-09-22T22:06:13Z

From a strict, purely mathematical perspective, it is not possible to define a unique generalization of variance to objects with more than one component because there is no unique way to generalize the notion of "squaring" the random variable. So I would lean heavily in favor of defining var, std and the like only for real variables and let users define other methods suitable for their applications.

(Long rambling discussion warning)

For complex numbers the Wikipedia definition, taken literally, is either wrong or incomplete. You could define the variance of z literally as the expectation value E[ (z - E(z)) (z - E(z))* ]; the transposition of a scalar is trivial, so this generates a scalar, not a matrix. However, if we associate with z its Cartesian representation as a 2-vector (x, y), then it is still unclear what this definition means. (x, y) is real-valued, so complex conjugation is a no-op, and you would get [ E(xx) E(xy) ; E(xy) E(yy) ] as the variance (which is really the covariance). Or maybe it's actually the diagonal of the covariance, which is yet another matrix. (So it's still unclear how you would write down a definition that has a nontrivial "conjugate transpose".) Furthermore, you can define yet more notions of variance that do not violate the basic properties of expectations, like E[(z - E(z))^2] or E[|z - E(z)|^2], which lead to yet different results.
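The competing notions listed above are related through the 2×2 covariance Σ of the centered pair (x, y): E|z - E(z)|² = Σ₁₁ + Σ₂₂ = tr(Σ), and E[(z - E(z))²] = Σ₁₁ - Σ₂₂ + 2iΣ₁₂. A sketch checking this numerically, assuming Julia 1.x and using population (1/n) normalization throughout for simplicity; the sample is made up.

```julia
using LinearAlgebra  # tr

zs = [2.0 + 1.0im, -1.0 + 0.5im, 0.5 - 2.0im, 1.5 + 0.5im]
n  = length(zs)
μ  = sum(zs) / n
d  = zs .- μ                     # centered samples
# 1. E|z - μ|^2: real and nonnegative
v_abs = sum(abs2, d) / n
# 2. E[(z - μ)^2]: complex in general
v_sq  = sum(d .^ 2) / n
# 3. 2×2 covariance of the Cartesian pair (x, y)
xy = [real.(d) imag.(d)]         # n×2 matrix of centered coordinates
Σ  = (xy' * xy) / n
```

So the scalar E|z - E(z)|² keeps only the trace of Σ, the squared version mixes the diagonal difference with the off-diagonal term, and Σ itself keeps everything.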

Similarly for vector-valued quantities it is a mistake to assume that the covariance is the only generalization of the variance; you can also have the "inner product" version E(X' X) as opposed to the "outer product" E(X X') which is the covariance. Of course it isn't clear how useful the "inner product variance" is, but it is still a perfectly well-defined cumulant.
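The two vector generalizations are linked: the "inner product" variance E[(X - µ)'(X - µ)] is the trace of the "outer product" covariance E[(X - µ)(X - µ)']. A minimal check, assuming Julia 1.x with LinearAlgebra and made-up 2-vectors:

```julia
using LinearAlgebra  # dot, tr

X = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0], [0.0, 2.0]]
n = length(X)
μ = sum(X) / n
inner = sum(dot(x - μ, x - μ) for x in X) / (n - 1)       # scalar
outer = sum((x - μ) * (x - μ)' for x in X) / (n - 1)      # 2×2 covariance
```

So the scalar version discards the off-diagonal structure of the covariance but is otherwise consistent with it.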

jiahao · 2014-09-22T22:14:18Z

With regard to colors specifically, I'm concerned that applications taking means and variances of color objects may not be taking the underlying curvature of the space into account. (Which is yet another generalization of variance, to something like E[ x_μ x^μ ] = E[ x^μ g_{μν} x^ν ].)
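A sketch of the metric-weighted form E[(x - µ)' G (x - µ)], assuming Julia 1.x and a constant symmetric positive definite metric G. The function name metric_var and any particular G are hypothetical. A constant G only covers a flat space in non-Euclidean coordinates; genuine curvature would need a metric that varies with position, which this does not handle.

```julia
using LinearAlgebra

# Hypothetical helper: variance with respect to a constant metric tensor G,
# E[(x - μ)' G (x - μ)]. G = I recovers the "inner product" variance.
# A position-dependent metric (genuine curvature) is NOT handled here.
function metric_var(X::AbstractVector{<:AbstractVector}, G::AbstractMatrix)
    n = length(X)
    μ = sum(X) / n
    return sum(dot(x - μ, G * (x - μ)) for x in X) / (n - 1)
end
```

For example, metric_var(X, Matrix(1.0I, 2, 2)) is the plain inner-product variance, while a diagonal G such as Diagonal([2.0, 0.5]) reweights the components.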

Presumably it is desirable for the results of arithmetic on colors to be independent of the working color space. If so, the issue is that the only flat color space is XYZ (i.e. XYZ is a linear vector space), so addition and multiplication in XYZ do not require curvature corrections, but such corrections would be required for all other color spaces, which are not linear vector spaces but manifolds with nontrivial curvature. Otherwise, taking the mean or variance would not commute with the convert operation between color spaces. Presumably ignoring the curvature of the RGB color space is why one gets phenomena like orange being the average color of every picture on the Internet.

timholy · 2014-09-22T22:24:26Z

Good points, @jiahao. I hadn't really even thought of defining all these operations only for XYZ colorspace, but technically you're right. OTOH, I really worry that people will be annoyed if taking the mean over an RGB image is slow because it requires two matrix multiplications per pixel.

Not really sure what to do here.

jiahao · 2014-09-22T22:36:10Z

Perhaps you can define some tolerance in terms of color differences within which you wouldn't bother to do the transformations back and forth into XYZ. If all you're doing is averaging different colors of (say) bright orange, then for sufficiently similar colors the distance between samples is going to be small enough that the curvature of the space will not change the answer much.

Presumably averaging over many similar images is going to be the more common use case.

jiahao · 2014-09-22T23:03:53Z

Having said that, I suspect that the error in ignoring curvature could be systematically biased. Might be worth testing.

timholy · 2014-09-23T00:23:40Z

This is beginning to sound a little like another "vectorized functions are evil" situation (meaning it's not really clear what the user wants here), and I should just encourage use of mapreduce.
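One way mapreduce lets the caller decide what "squaring" means, assuming Julia 1.x. The function var_with and its sq argument are hypothetical names for illustration: abs2 gives the usual scalar variance, an elementwise square gives per-channel variances, and d -> d * d' would give the outer-product covariance.

```julia
# Hypothetical helper: the caller supplies `sq`, the notion of squared deviation.
# mapreduce without an init value starts from the first mapped element,
# so this sketch assumes a non-empty collection.
function var_with(sq, xs)
    n = length(xs)
    μ = sum(xs) / n
    return mapreduce(x -> sq(x - μ), +, xs) / (n - 1)
end
```

For instance, var_with(abs2, [1.0, 2.0, 3.0]) returns 1.0, and var_with(d -> d .^ 2, [[1.0, 2.0], [3.0, 6.0]]) returns the per-component variances [2.0, 8.0].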

MarkusQ · 2014-09-26T22:26:29Z

From a purist perspective, I suspect it's even worse than @jiahao's take, since human colour perception is decidedly non-linear.

That said, it's not uncommon for people in a wide range of fields to want to compute statistics on color values:

http://iopscience.iop.org/1538-4357/615/2/L101
http://www.sciencedirect.com/science/article/pii/010956419190038Z
http://ieeexplore.ieee.org/xpls/abs_all.jsp/arnumber/1613079
etc.

timholy · 2014-09-26T22:40:21Z

Yeah, I've decided I'm going to ignore the purists' perspective on this one (aside from a likely note in the documentation).

rennis250 · 2014-09-29T12:19:32Z

Just to chime in, CIE XYZ is not the only non-curved colour space. In fact, there are quite a few (for a compendium of colour spaces, see "Color Ordered" by Rolf Kuehni). Anyway, that really doesn't change the issue here.

Actually, I'm curious, @jiahao: where have you used RGB as a curved colour space? The operation to go from XYZ to RGB is a linear transformation. The curvature you are seeing in it is a by-product of the (admittedly cheap) representations of RGB space in which every point is coloured in and your perception imputes a curvature (the coloured representation is inaccurate because it doesn't control for simultaneous contrast effects, nor does it properly represent the influence of adaptation state). If you represent the XYZ space in the same way, you see the same phenomena. The main curved colour space in use is the sRGB space, probably followed by the LAB and LUV colour spaces. (EDIT: My previous statement about the main colour spaces in use was incorrect, as discussed below.) The question is how changing the tristimulus values corresponds to a change in the physical stimulus you are working with; in other words, whether the relationship between the space and the stimulus dimensions is linear or non-linear. The relationship between the stimulus and the resulting perception is another question. (Also, I'm rather suspicious of that Atlantic article. It would be nice to look through his image set for any biases.)
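Because the XYZ to RGB step is linear, means commute with it exactly and covariances transform as M Σ M'. A sketch, assuming Julia 1.x; the matrix M below is a made-up invertible stand-in, not the coefficients of any standard RGB primaries, and mean_vec/cov_vec are hypothetical helper names.

```julia
using LinearAlgebra

# Hypothetical invertible 3×3 matrix standing in for a linear XYZ -> RGB transform
# (NOT the real coefficients of any standard).
M = [ 2.0 -0.5 -0.3;
     -0.9  1.8  0.1;
      0.1 -0.2  1.0]
xyz = [[0.3, 0.4, 0.2], [0.5, 0.5, 0.6], [0.1, 0.2, 0.3]]

mean_vec(v) = sum(v) / length(v)
function cov_vec(v)
    μ = mean_vec(v)
    return sum((x - μ) * (x - μ)' for x in v) / (length(v) - 1)
end

rgb = [M * x for x in xyz]   # convert every sample
```

Averaging after conversion gives the same result as converting the average (mean_vec(rgb) ≈ M * mean_vec(xyz)), and cov_vec(rgb) ≈ M * cov_vec(xyz) * M'. No such identity holds for a non-linear encoding.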

With respect to the variance of colours, in our lab we have used either the spherical (or circular) standard deviation of the hue angles or the standard deviation of a colour distribution after it has been projected onto a given colour axis (all of this is done in a linear colour space). The choice between them has depended on the application, so I agree with everyone else that it's best to leave the choice to the user, at least for colours. Still, providing some recommendations would probably be helpful, since the right choice depends on which level of encoding in the visual system you care about, or whether you care about the visual system at all (cf. the galaxy colour distribution paper that @MarkusQ linked to).
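Both statistics mentioned above can be sketched briefly, assuming Julia 1.x with LinearAlgebra. The circular standard deviation here uses the common definition sqrt(-2 ln R), where R is the mean resultant length of the angles; the projection axis is a hypothetical example direction in some linear colour space. The helper names circ_std and proj_std are illustrative, not from any package.

```julia
using LinearAlgebra  # norm, dot

# Circular standard deviation of angles in radians: sqrt(-2 log R),
# where R is the length of the mean resultant vector (0 ≤ R ≤ 1).
function circ_std(θs)
    C = sum(cos, θs) / length(θs)
    S = sum(sin, θs) / length(θs)
    R = min(hypot(C, S), 1.0)    # clamp tiny rounding above 1
    return sqrt(-2 * log(R))
end

# Standard deviation of colours projected onto the unit vector along axis `a`.
function proj_std(colors, a)
    u = a / norm(a)
    p = [dot(c, u) for c in colors]
    μ = sum(p) / length(p)
    return sqrt(sum(abs2, p .- μ) / (length(p) - 1))
end
```

Unlike an ordinary standard deviation of the raw angles, circ_std treats hues of 0.1 and 2π - 0.1 radians as close neighbours, giving the same small value as for -0.1 and 0.1.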

To answer @StefanKarpinski's question: there is no inner product defined for raw tristimulus values, since basic colour matching spaces fall into the class of affine vector spaces. There have been efforts to find transformations of these spaces that give them a structure allowing a sensible metric to be applied (two of these are the LAB and LUV spaces), but unfortunately they don't achieve that goal.

Best,
Rob

timholy · 2014-09-29T13:16:42Z

I suspect RGB/sRGB confusion is at play here.

rennis250 · 2014-09-29T13:34:01Z

Ah, right, sorry about that @jiahao. Ignore the majority of that paragraph then. Thanks, @timholy.

timholy · 2014-09-29T13:37:26Z

Well, actually, your comment was very informative and helped clarify things, so thanks.

jiahao · 2014-09-29T14:06:36Z

Fair enough, I am not an expert in color theory, so I'm perfectly willing to admit the existence of other flat color spaces. I didn't know the definition of sRGB offhand, and it does look like a flat color space.

Is LCHuv flat? I don't think so, and we have definitions that assume it is flat.

rennis250 · 2014-09-29T14:41:12Z

No problem. Hope the post came off as more over-caffeinated than anything else. :P No hard feelings meant; I just get too excited about this stuff sometimes!

But I think sRGB is non-linear and not flat. Base RGB assumes that your monitor has primaries with a linear input (voltage) to output (luminance of primary) relationship (if you're working with a CRT, for example, but the assumption can be generalised). However, this is not true for any monitor that I know of, so the sRGB space provides an additional non-linear encoding that stores the base RGB values in a gamma-corrected format, allowing you to send them directly to the monitor to get a linearised image (which is hopefully as accurate a reproduction of the original, imaged scene as possible). As far as I understand, the space was developed with cameras in mind, which ideally have an inverse gamma relationship to most monitors (gamma of ideal CRT = 2.2, gamma of ideal camera = 1/2.2), allowing one to take an image with an arbitrary camera, send it to your buddy on the 'net, and have it reproduced as accurately as possible on his arbitrary monitor. My presentation here is rough, however, and doesn't account for some quirks of the space at low light levels, where LAB and LUV have similar quirks. Plus, there are perceptual benefits to this encoding scheme. There may also be other factors that influenced the gamma choices (e.g., maybe it's just cheaper to produce electronics with these response characteristics), but I never bothered to look into that. Anyway, this contradicts my statement that LAB and LUV are the two main non-linear spaces. By far, sRGB is the main one in use; LAB and LUV probably come second, at least based on my experience in colour research.
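A toy illustration of why averaging gamma-encoded values differs from averaging light, assuming Julia 1.x. It uses the simplified pure power law with gamma 2.2 from the comment above; the actual sRGB transfer curve is piecewise, with a linear segment near black, and that detail is ignored here. The names decode and encode are hypothetical.

```julia
# Simplified power-law model of gamma encoding with γ = 2.2 (the real sRGB curve
# is piecewise, with a linear segment near black; ignored here).
const γ = 2.2
decode(v) = v^γ          # encoded value in [0, 1] -> linear light
encode(l) = l^(1 / γ)    # linear light -> encoded value

a, b = 0.0, 1.0          # black and white, as encoded values
naive   = (a + b) / 2                           # average the encoded numbers directly
correct = encode((decode(a) + decode(b)) / 2)   # average in linear light, then re-encode
```

Here naive is 0.5 while correct is about 0.73, so under this model the mean does not commute with the non-linear encoding, whereas it does commute with any linear transform.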

I would be very happy for an industrial colour person to correct me here though, since all of my work involves first undoing all of the automatic corrections that sRGB, monitors, and colour management systems perform, so I've never become too intimate with them. :P Doing some additional reading now and I will report back if I've completely botched my understanding and explanation here.

If I understand correctly, the LCHuv space is just a cylindrical parametrisation of the underlying LUV space? If so, then yes, I would say that it is not flat.

I really need to start getting my hands dirty in the Color.jl package. There's plenty of fun to be had. :)

StefanKarpinski · 2014-09-29T15:20:17Z

@rennis250, it would be great to have some more expert input into the package. It seems like there's no obviously correct way to do linear operations on colors. Given that, one option is just not to define such operations. However, I suspect that's going to prove quite annoying. Instead we can just define the linear operations in the naive obvious way but make more sophisticated mechanisms available as well.

jiahao · 2014-09-29T16:02:17Z

Ok, let's not hijack this thread further for color discussions. We can do it in JuliaAttic/Color.jl#64

stevengj · 2015-01-28T20:35:46Z

In #4039, I fixed var for complex matrices and I added a test case, but it looks like it got broken again when the version for arbitrary iterables was added (since that case was not tested)?

@jiahao, I don't understand why you think the Wikipedia article is wrong when it defines the covariance matrix of a complex vector (and hence the variance of a complex scalar). That definition is totally standard as far as I can tell, and is the only reasonable definition if you want to do the usual algebra things (SVD etc) with the covariance matrix. Julia should follow it.

(More generally, similar to what @johnmyleswhite wrote above, it seems like the variance could be defined for arbitrary Banach spaces as ⟨|v - ⟨v⟩|²⟩, where |...| is a norm and ⟨...⟩ is the expectation. For real and complex numbers over the complex field, there is only one norm up to an overall scale factor, so the variance is uniquely defined. For other vector spaces there are many choices of norm, but of course we could allow a norm to be passed as a parameter. I've never heard of anyone defining a scalar variance that did not correspond to a norm², have you? I would think that matrix-valued generalizations, i.e. the covariance matrix and friends, should be a different function than var.)
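A minimal sketch of ⟨|v - ⟨v⟩|²⟩ with the norm passed as a parameter, assuming Julia 1.x with LinearAlgebra. The name normvar and its nrm keyword are hypothetical, not an existing API. With the default norm, complex scalars reproduce the conjugated definition (|z|² = z z*), and vectors get the Euclidean, inner-product version.

```julia
using LinearAlgebra  # norm

# Hypothetical helper: variance as the mean squared norm of the deviation,
# with a caller-chosen norm (default: LinearAlgebra.norm).
function normvar(vs; nrm = norm, corrected::Bool = true)
    n = length(vs)
    μ = sum(vs) / n
    return sum(nrm(v - μ)^2 for v in vs) / (corrected ? n - 1 : n)
end
```

For example, normvar([1.0, 2.0, 3.0]) ≈ 1.0, and for the vectors [[1.0, 0.0], [0.0, 1.0]] the default 2-norm gives 1.0 while nrm = x -> norm(x, 1) gives 2.0, showing how the choice of norm changes the result.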

oscardssmith · 2018-02-16T23:06:46Z

Do the recent changes to var and std make this not an issue?

fredrikekre · 2018-02-16T23:15:25Z

julia> std(Any[1,2,3])
1.0

julia> var(Any[1,2,3])
1.0

jakebolewski changed the title ~~var and std do not work for Any arrays~~ var and std do not work for Any[] Sep 12, 2014

ivarne added the help wanted label Sep 15, 2014

jiahao mentioned this issue Sep 21, 2014: varm is too strongly typed #8434 (Closed)

timholy self-assigned this Sep 22, 2014

ihnorton added the needs decision label and removed the help wanted label Dec 13, 2014

ihnorton mentioned this issue Dec 13, 2014: maximum of Any array #9276 (Closed)

andreasnoack mentioned this issue Jan 28, 2015: var(A) does not work for Any arrays #9949 (Closed)

jiahao mentioned this issue Mar 24, 2015: generic matmul depends on zero JuliaLang/LinearAlgebra.jl#194 (Closed)

jakebolewski added the bug label May 29, 2015

timholy mentioned this issue Aug 9, 2015: Fixing Coloramity JuliaAttic/Color.jl#101 (Closed)

timholy referenced this issue in JuliaLang/METADATA.jl Aug 9, 2015: Tag Compat v0.5.1 (778fac8)

stevengj closed this as completed Feb 17, 2018
