var and std do not work for Any[] #8319
Since `var` (and `std`) do not work for empty iterables anyway, the loop in `var` can be unrolled by one to avoid the need for calling `zero`.
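The unrolling idea can be sketched as follows. This is only an illustration of the approach, not the actual Base implementation; `unrolled_var` is a hypothetical name:

```julia
# Sketch of the suggestion above: seed both accumulators with the
# first element so `zero(T)` is never called.
# `unrolled_var` is a hypothetical name, not the Base implementation.
function unrolled_var(v)
    isempty(v) && throw(ArgumentError("variance of an empty collection is undefined"))
    # First pass: mean, seeded with v[1] instead of zero(eltype(v))
    m = v[1]
    for i in 2:length(v)
        m += v[i]
    end
    m /= length(v)
    # Second pass: sum of squared deviations, seeded the same way
    s = abs2(v[1] - m)
    for i in 2:length(v)
        s += abs2(v[i] - m)
    end
    return s / (length(v) - 1)
end

unrolled_var(Any[1, 2, 3])  # works without a zero(Any) method → 1.0
```

Since the accumulators start from the first element, no neutral element of the element type is ever needed.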
Just to state the obvious, it's not limited to just |
I like generic code, but how do you define the variance of an array of arrays? |
I guess it only makes sense for Vectors, not general arrays, where the obvious definition is the covariance. It's also pretty clear what you want in the case of |
Yes, variance of a vector of vectors makes sense. What is |
It's an abstract type defined in the |
Does it make sense to define this for all possible numeric types? What does the variance of a complex number even mean? I guess this is the tension with allowing the definition to be completely generic. |
According to Wikipedia, the variance of a complex random variable does make sense and is defined as |
I think variance is well-defined whenever there's an L2 norm and a definition of expectation. |
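That definition (an L2 norm plus an expectation) can be written down generically. A minimal sketch, assuming only that the elements support addition, scalar division, and `norm`; `norm_var` is a hypothetical name:

```julia
using LinearAlgebra: norm  # `norm` was in Base in older Julia versions

# Hypothetical generic variance following the definition above:
# var(x) = E[ ‖x − E[x]‖² ] for any element type with addition,
# scalar division, and an L2 norm.
function norm_var(xs)
    m = sum(xs) / length(xs)  # expectation
    return sum(norm(x - m)^2 for x in xs) / (length(xs) - 1)
end

norm_var([1.0, 2.0, 3.0])                        # 1.0, agrees with var
norm_var([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # variance of a vector of vectors
```

For complex scalars, `norm(z)^2 == abs2(z)`, so this reproduces the usual real-valued variance of a complex random variable.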
As I mention above, the code can be easily rewritten to not call `zero`. The current code expects there to be a neutral element.
It's not just the |
What's the inner product on colors? |
In that case, the user "clearly" wants elementwise. It's basically the diagonal of the covariance, i.e. of the outer product (so I would actually say it needs the outer product, not the inner product).
From a strict, purely mathematical perspective, it is not possible to define a unique generalization of variance to objects with more than one component, because there is no unique way to generalize the notion of "squaring" the random variable. So I would lean heavily in favor of defining `var`, `std` and the like only for real variables and letting users define other methods suitable for their applications.

(Long rambling discussion warning) For complex numbers the Wikipedia definition, taken literally, is either wrong or incomplete. You could define the variance of z as literally the expectation value E[ (z - E(z)) (z - E(z))* ], where the transposition on a scalar is trivial; this would generate a scalar, not a matrix. However if we associate with

Similarly, for vector-valued quantities it is a mistake to assume that the covariance is the only generalization of the variance; you can also have the "inner product" version |
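The two vector-valued generalizations contrasted above can be put side by side. A sketch with hypothetical names (`cov_var` for the outer-product/covariance version, `ip_var` for the inner-product version):

```julia
using LinearAlgebra: dot, tr

# Outer-product ("covariance") version: E[(x − μ)(x − μ)ᵀ], a matrix.
function cov_var(xs)
    m = sum(xs) / length(xs)
    return sum((x - m) * (x - m)' for x in xs) / (length(xs) - 1)
end

# Inner-product version: E[⟨x − μ, x − μ⟩], a scalar.
function ip_var(xs)
    m = sum(xs) / length(xs)
    return sum(dot(x - m, x - m) for x in xs) / (length(xs) - 1)
end

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cov_var(xs)              # 2×2 matrix
tr(cov_var(xs)) ≈ ip_var(xs)  # true: the scalar version is the trace of the matrix one
```

Neither is "the" variance; they answer different questions, which is the point made above.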
With regard to colors specifically, I'm concerned that applications taking means and variances of color objects may not be taking care of the underlying curvature of the space. (Which is yet another generalization of variance to something like

Presumably it is desirable for the results of arithmetic on colors to be independent of the working color space. If so, the issue that arises is that the only flat color space is XYZ (i.e. XYZ is a linear vector space), so addition and multiplication in XYZ do not require additional curvature corrections, but they would be required for all other color spaces, which are not linear vector spaces but manifolds with nontrivial curvature. Otherwise, taking the mean/variance would not commute with the `convert` operation going between different color spaces.

Presumably ignoring the curvature of the RGB color space is why one gets phenomena like orange being the average color of every picture on the Internet.
Good points, @jiahao. I hadn't really even thought of defining all these operations only for XYZ colorspace, but technically you're right. OTOH, I really worry that people will be annoyed if taking the mean over an RGB image is slow because it requires two matrix multiplications per pixel. Not really sure what to do here. |
Perhaps you can define some tolerance in terms of color differences within which you wouldn't bother to do to the transformations back and forth into XYZ. If all you're doing is averaging different colors of (say) bright orange, then for sufficiently similar colors the distance between each sample is going to be small enough that the curvature of the space is not going to change the answer much. Presumably averaging over many similar images is going to be the more common use case. |
Having said that, I suspect that the error in ignoring curvature could be systematically biased. Might be worth testing. |
This is beginning to sound a little like another "vectorized functions are evil" (meaning, it's not really clear what the user wants here), and I should just encourage use of |
From a purist perspective, I suspect it's even worse than @jiahao's take, since human colour perception is decidedly non-linear. That said, it's not that uncommon for people in a wide range of fields to want to compute statistics on color values: http://iopscience.iop.org/1538-4357/615/2/L101
Yeah, I've decided I'm going to ignore the purists' perspective on this one (aside from a likely note in the documentation). |
Just to chime in, CIE XYZ is not the only non-curved colour space. In fact, there are quite a few (for a compendium of colour spaces, see "Color Ordered" by Rolf Kuehni). Anyway, that really doesn't change the issue here.

Actually, I'm curious @jiahao, where have you used RGB as a curved colour space? The operation to go from XYZ to RGB is a linear transformation. The curvature you are seeing in it is a by-product of the (admittedly, cheap) representations of RGB space, where every point is coloured in, and your perception imputes a curvature (the coloured representation is inaccurate because it doesn't control for simultaneous contrast effects, nor does it properly represent the influence of adaptation state). If you represent the XYZ space in the same way, you see the same phenomena.

The main curved colour space in use is the sRGB space, probably followed by the LAB and LUV colour spaces. (EDIT: My previous statement about the main colour spaces in use was incorrect, as discussed below.)

The question is how does changing the tristimulus values correspond to a change in the physical stimulus you are working with; in other words, is the relationship between the space and the stimulus dimensions linear or non-linear. The relationship between the stimulus and the resulting perception is another question.

(Also, I'm rather suspicious of that Atlantic article. It would be nice to take a look through his image set for any biases.)

With respect to the variance of colours, in our lab, we have either gone with the spherical (or circular) standard deviation of the hue angles or the standard deviation of a colour distribution after it has been projected onto a given colour axis (all of this is done in a linear colour space).
The reason for choosing one over the other has been dependent on the application, so I agree with everyone else that it's best to leave the choice to the user, for colours at least. But providing some recommendations to users would probably be helpful, since the right choice depends on what level of encoding in the visual system you care about, or whether you even care about the visual system at all (c.f. the galaxy colour distribution paper that @MarkusQ linked to).

To answer the question of @StefanKarpinski, there is no inner product defined for raw tristimulus values, since basic colour matching spaces fall in the class of affine vector spaces. There have been efforts to find transformations of these spaces to provide them with a structure that allows a sensible metric to be applied (two of these are the LAB and LUV spaces), but unfortunately, they don't achieve that goal.

Best,
I suspect RGB/sRGB confusion is at play here. |
Well, actually, your comment was very informative and helped clarify things, so thanks. |
Fair enough, I am not an expert in color theory, so I'm perfectly willing to admit the existence of other flat color spaces. I didn't know the definition of sRGB offhand, and it does look like a flat color space. Is LCHuv flat? I don't think so - and we have definitions that assume its flatness |
Nema problema. Hope the post came off as more over-caffeinated than anything else. :P No hard feelings meant; I just get too excited about this stuff sometimes!

But, I think the sRGB space is non-linear and not flat. Base RGB assumes that your monitor has primaries that have a linear input (voltage) to output (luminance of primary) relationship (if you're working with a CRT for example, but the assumption can be generalised). However, this is not true for any monitor that I know of, so the sRGB space provides an additional non-linear encoding that saves the base RGB values in a gamma-corrected format, allowing you to send those directly to the monitor to get a linearised image (which is hopefully as accurate a reproduction of the original, imaged scene as possible). As far as I understand, the space was developed with cameras in mind, which ideally have an inverse gamma relationship to most monitors (gamma of ideal CRT = 2.2, gamma of ideal camera = 1/2.2), allowing one to use an arbitrary camera to take an image, send it to your buddy on the 'net, and have it reproduced as accurately as possible on his arbitrary monitor.

My presentation here is rough however, and doesn't account for some quirks of the space at low light levels, where LAB and LUV also have similar quirks. Plus, there are perceptual benefits to this encoding scheme. There may also be other factors that influenced the gamma choices (e.g., maybe it's just cheaper to produce electronics with these response characteristics), but I never bothered to look into that.

Anyway, this goes against my earlier statement that LAB and LUV are the two main non-linear spaces. By far, sRGB is the main one in use. LAB and LUV probably follow second, at least based on my experiences in colour research.
I would be very happy for an industrial colour person to correct me here though, since all of my work involves first undoing all of the automatic corrections that sRGB, monitors, and colour management systems perform, so I've never become too intimate with them. :P Doing some additional reading now, and I will report back if I've completely botched my understanding and explanation here.

If I understand correctly, the LCHuv space is just a cylindrical parametrisation of the underlying LUV space? If so, then yes, I would say that it is not flat.

I really need to start getting my hands dirty in the Color.jl package. There's plenty of fun to be had. :)
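The effect of the gamma encoding described above is easy to see numerically. The transfer functions below are the standard sRGB ones; averaging gamma-encoded values directly gives a different answer than averaging in linear light:

```julia
# Standard sRGB transfer functions (per the sRGB specification)
srgb_to_linear(c) = c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055)^2.4
linear_to_srgb(c) = c <= 0.0031308 ? 12.92 * c : 1.055 * c^(1 / 2.4) - 0.055

# Averaging black (0.0) and white (1.0):
naive  = (0.0 + 1.0) / 2                     # 0.5, mean of gamma-encoded values
linear = linear_to_srgb((srgb_to_linear(0.0) + srgb_to_linear(1.0)) / 2)
# linear ≈ 0.735 — noticeably lighter than the naive gamma-space mean
```

So even before worrying about curvature, a plain `mean` over sRGB values bakes in the gamma encoding, which is one concrete reason naive averages of images come out darker than expected.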
@rennis250, it would be great to have some more expert input into the package. It seems like there's no obviously correct way to do linear operations on colors. Given that, one option is just not to define such operations. However, I suspect that's going to prove quite annoying. Instead we can just define the linear operations in the naive obvious way but make more sophisticated mechanisms available as well. |
Ok, let's not hijack this thread further for color discussions. We can do it in JuliaAttic/Color.jl#64 |
In #4039, I fixed

@jiahao, I don't understand why you think the Wikipedia article is wrong when it defines the covariance matrix of a complex vector (and hence the variance of a complex scalar). That definition is totally standard as far as I can tell, and is the only reasonable definition if you want to do the usual algebraic things (SVD etc.) with the covariance matrix. Julia should follow it.

(More generally, similar to what @johnmyleswhite wrote above, it seems like the variance could be defined for arbitrary Banach spaces as ⟨|v − ⟨v⟩|²⟩, where |...| is a norm and ⟨...⟩ is expectation. For real and complex numbers, there is only one norm, up to an overall scale factor, so the variance is uniquely defined. For other vector spaces, there are many choices of norms, but of course we could allow a |
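The standard complex covariance definition being discussed, Cov = E[(z − μ)(z − μ)ᴴ], can be sketched directly; `ccov` is a hypothetical name for illustration:

```julia
# Covariance matrix of a sample of complex vectors:
# Cov = E[(z − μ)(z − μ)ᴴ], using Julia's ' (conjugate transpose).
# `ccov` is a hypothetical name, not a Base function.
function ccov(zs)
    m = sum(zs) / length(zs)
    return sum((z - m) * (z - m)' for z in zs) / (length(zs) - 1)
end

zs = [[1.0 + 0im, 0.0 + 1im], [0.0 + 1im, 1.0 + 0im]]
C = ccov(zs)   # Hermitian (C == C'), with real nonnegative diagonal
```

Because of the conjugate transpose, the result is Hermitian positive semidefinite, which is what makes the usual linear algebra (SVD, eigendecomposition) well behaved. For a scalar sample the same formula reduces to E[|z − μ|²], a real number.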
Do the recent changes to |
```julia
julia> std(Any[1,2,3])
1.0

julia> var(Any[1,2,3])
1.0
```
All other statistics functions in Base work for `Any` arrays of numeric values.