Where to apply Level of Measurement? #131

Closed
keilw opened this issue Oct 20, 2018 · 29 comments

Comments

@keilw
Member

keilw commented Oct 20, 2018

Either in Unit or Quantity the new LevelOfMeasurement attribute should be applied for arithmetic decision making.

Needs #130

@keilw
Member Author

keilw commented Oct 23, 2018

@unitsofmeasurement/experts, @unitsofmeasurement/contributors Based on discussions and some code snippets (e.g. by @desruisseaux back in June) from #95, I take it that Unit would be the best place for a LevelOfMeasurement attribute / method like getLevel()?

@desruisseaux
Contributor

I think it should rather be in Quantity. I'm not sure that conversion from "interval °C" to "ratio °C" makes sense, for example (i.e. I do not have any use case in mind where we would want to convert from Units.CELSIUS to Units.CELSIUS_INTERVAL). Instead, the use cases that I see are conversions that preserve the level of measurement of the quantity. For example:

  • If we have an interval of 2 K that we want to convert to degrees Celsius, the result is an interval of 2°C.
  • If we have a value of 2 K on the ratio scale that we want to convert to degrees Celsius, the result is a value of -271.15°C on "ratio" scale ("ratio" may not be the right word in this case…).

We want Quantity.to(Units.CELSIUS) to automatically preserve the level of measurement of the quantity. We do not want to force users to check the Quantity unit in order to determine whether they should pass Units.CELSIUS or Units.CELSIUS_INTERVAL as the argument to the to method.

The level of measurement of a Quantity is not changed by unit conversions, but by arithmetic operations applied between two quantities. For example "ratio" - "ratio" = "interval".
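
The two bullet cases and the subtraction rule above can be sketched as follows. This is only an illustration under stated assumptions: Kind, KelvinValue, toCelsius() and minus() are hypothetical names that do not exist in the Unit API, and the sketch only covers the measurement - measurement case.

```java
// Hypothetical sketch: none of these types exist in the Unit API.
final class KindPreservingConversion {

    /** Whether a value is an absolute reading or a difference between two readings. */
    enum Kind { MEASUREMENT, INTERVAL }

    /** A temperature stored in kelvin together with its kind. */
    static final class KelvinValue {
        final double value;
        final Kind kind;

        KelvinValue(double value, Kind kind) {
            this.value = value;
            this.kind = kind;
        }

        /** Converts to degrees Celsius; only absolute readings are shifted by 273.15. */
        double toCelsius() {
            return kind == Kind.INTERVAL ? value : value - 273.15;
        }

        /** Subtracting two absolute readings yields an interval. */
        KelvinValue minus(KelvinValue other) {
            if (kind != Kind.MEASUREMENT || other.kind != Kind.MEASUREMENT) {
                throw new IllegalArgumentException("sketch only covers measurement - measurement");
            }
            return new KelvinValue(value - other.value, Kind.INTERVAL);
        }
    }

    public static void main(String[] args) {
        KelvinValue interval = new KelvinValue(2, Kind.INTERVAL);
        KelvinValue reading = new KelvinValue(2, Kind.MEASUREMENT);
        System.out.println(interval.toCelsius()); // the interval keeps its magnitude
        System.out.println(reading.toCelsius());  // the reading is shifted by the offset

        KelvinValue diff = new KelvinValue(10, Kind.MEASUREMENT)
                .minus(new KelvinValue(1, Kind.MEASUREMENT));
        System.out.println(diff.value + " K, " + diff.kind);
    }
}
```

The caller never chooses between two Celsius units here; the kind travels with the value and decides whether the offset applies.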

@keilw
Member Author

keilw commented Oct 23, 2018

Well, "describes the nature of information within the values assigned to variables" from the Wikipedia article points slightly in that direction. The question then is where it can be set in the least intrusive way. On the Unit, a natural place would have been the definition of e.g. CELSIUS. But if there is a valid use case for having -271.15°C both as RATIO and as INTERVAL, we may have to find a different place, and we could not do that without assuming a default level when none is provided. Or can it always be derived through operations?

This example from June 15 was under the assumption, it was more beneficial on the Unit:

Quantity add(Quantity that) {
    // Pseudocode: Unit.level(...) and Unit.getLevel() are proposed methods, not existing API.
    Unit u1 = this.getUnit();
    Unit u2 = that.getUnit();
    Unit uc = u1.level(u2.getLevel());    // Convert to the unit of u1, but taking into account the nature of u2 (quantity or increment).
    UnitConverter c = u2.getConverterTo(uc);
    double newValue = this.getValue().doubleValue() + c.convert(that.getValue().doubleValue());

    // The result is in the unit of u1, but is it an absolute value or an increment?
    // Note: the following code could be factored out into a convenience method.
    boolean isIncrement1 = u1.getLevel() == LevelOfMeasurement.INTERVAL;
    boolean isIncrement2 = u2.getLevel() == LevelOfMeasurement.INTERVAL;
    boolean isResultAnIncrement = isIncrement1 && isIncrement2;    // Only increment + increment stays an increment.
    Unit uf = u1.level(isResultAnIncrement ? LevelOfMeasurement.INTERVAL
                                           : LevelOfMeasurement.RATIO);

    return new Quantity(newValue, uf);
}

I only changed the enum name, otherwise it is like the one from June. So even with pseudocode, where is a level set? The Wikipedia description of the different levels also says "Most measurement in the physical sciences and engineering is done on ratio scales. ", therefore it would be a hassle and not acceptable having to do something like Quantities.getQuantity(10, KILOGRAM, RATIO) every time, if that was a place where it had to be explicitly set.
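
One way to avoid the hassle described above would be an overload that assumes a default level. The following is a minimal sketch, assuming RATIO as the default; SimpleQuantity and both getQuantity methods are hypothetical stand-ins and not the actual Quantities factory.

```java
// Hypothetical sketch: SimpleQuantity and these factory methods are illustrative only.
final class DefaultLevelFactory {

    enum LevelOfMeasurement { NOMINAL, ORDINAL, INTERVAL, RATIO }

    /** A bare value with a unit symbol and a level of measurement. */
    static final class SimpleQuantity {
        final double value;
        final String unit;
        final LevelOfMeasurement level;

        SimpleQuantity(double value, String unit, LevelOfMeasurement level) {
            this.value = value;
            this.unit = unit;
            this.level = level;
        }
    }

    /** Common case: no level is given, so RATIO is assumed. */
    static SimpleQuantity getQuantity(double value, String unit) {
        return getQuantity(value, unit, LevelOfMeasurement.RATIO);
    }

    /** Explicit case for callers that need a different level. */
    static SimpleQuantity getQuantity(double value, String unit, LevelOfMeasurement level) {
        return new SimpleQuantity(value, unit, level);
    }

    public static void main(String[] args) {
        SimpleQuantity mass = getQuantity(10, "kg");
        SimpleQuantity step = getQuantity(2, "K", LevelOfMeasurement.INTERVAL);
        System.out.println(mass.value + " " + mass.unit + " " + mass.level);
        System.out.println(step.value + " " + step.unit + " " + step.level);
    }
}
```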

https://en.wikipedia.org/wiki/Level_of_measurement#Ratio_scale states, "The Kelvin temperature scale is a ratio scale because it has a unique, non-arbitrary zero point called absolute zero." So Quantities.getQuantity(10, KELVIN, INTERVAL) seems to make no sense.
Or are you saying 10 K - 1 K or 100 m - 5 m automatically turns their level into INTERVAL?

https://www.isi.edu/~ulf/amr/lib/popup/quantity-types.html

@desruisseaux
Contributor

The idea was that 10 K - 1 K automatically turns their level into INTERVAL. But I need to give it more thought; on one side it is true that an ORDINAL level of measurement, for example, applies better to Unit than to Quantity. But for the particular case that we were trying to solve in #95 (i.e. the result of 1°C + 2°C), having the information associated with Quantity rather than Unit allows more convenient conversions, as described in my previous comment. I need to think more about that…

@keilw
Member Author

keilw commented Oct 24, 2018

Ok, we could also do a vote, probably running a bit longer than the one for the name of a new type, but it should not take quite as long as #95 itself ;-)

@desruisseaux
Contributor

To me, a vote should start only after we are done analyzing the problem, listing the choices and debating the pros and cons…

@keilw keilw changed the title Apply Level of Measurement Apply Level of Measurement? Oct 24, 2018
@keilw keilw added the question label Oct 24, 2018
@keilw
Member Author

keilw commented Oct 24, 2018

I would have hoped much of this was done in #95, but no problem doing it in this ticket, although it was meant as an action item. I changed it to a question. @unitsofmeasurement/experts, @unitsofmeasurement/contributors or @unitsofmeasurement/observers please (at least after the busy conference week) share your thoughts and preferences on whether the Unit or Quantity should be used to apply the new LevelOfMeasurement attribute, or something else like UnitConverter, although the literature mostly points to either the unit or the quantity (sometimes in a slightly different context also called Measure or Measurement). This is a good overview: http://www.indiana.edu/~educy520/sec5982/week_3/measurement_rsm.pdf
Here is another source: https://math.tutorvista.com/statistics/scales-of-measurement.html

@desruisseaux
Contributor

One difficulty is that the debate on #95 and elsewhere is scattered over many comments, which makes it difficult to get the big picture. I would like a wiki page summarizing the current situation: what is resolved, what still needs to be resolved, and what the alternatives are with their pros and cons. The difference from the issue tracker is that an agreement results in the wiki page being updated and kept short, as opposed to comments being added to a long, tedious-to-follow thread.

@keilw
Member Author

keilw commented Oct 24, 2018

If it's just for decision making, then issues like this one are just like a Wiki, too. And after an API decision was made, that information is normally not needed any more. Creating a Wiki that helps downstream users and projects to make use of those new features, sure, we can't have enough of that, so please let us not just put arguments for a particular vote or decision into a Wiki where it has little value later on. I spoke to @kaikreuzer at Eclipse IoT WG meeting on Monday. And he confirmed, they have a workaround right now in SmartHome. Real life experience from a project like theirs is also welcome. They certainly won't change the API but the way they decide how to calculate things differently should help to inspire the standard so it's useful to their and other solutions.
There is of course a Wiki page here: https://github.com/unitsofmeasurement/unit-api/wiki/Arithmetic-operations-on-Quantity so if you could make example cases there to refine the problem based on the newly created LevelOfMeasurement, that would be a good enhancement. Creating another page just for this ticket may be a bit confusing. https://github.com/unitsofmeasurement/unit-api/wiki/Arithmetic-operations-on-Quantity#8-how-to-reduce-surprises-for-users already hints on this new feature, so it could be added as a new paragraph there. Adding a whole new page, not sure, if that adds value, maybe start there, if the number of arguments and options became too many it could always be refactored into a separate page.

@keilw
Member Author

keilw commented Oct 24, 2018

This article http://psych.colorado.edu/~carey/courses/psyc5741/handouts/Measurement%20Scales.pdf is quite explicit, e.g.

Units of time (msec, hours), distance and length (cm, kilometers), weight (mg, kilos), and volume (cc) are all ratio scales.

Also interesting on the Interval scale (level)

As a result, one can add and subtract values on an interval scale, but one cannot multiply or divide units

Therefore it would be an issue if 100 m - 5 m suddenly became INTERVAL and one could no longer multiply or divide the 95 m by another quantity.

This article refers to the level as MeasurementScale btw, which would be an alternate name for that enum. It is fairly common, but with 606.000 Google results for the exact term "measurement level" (in quotes) compared to 22.800.000 for "level of measurement", I guess we don't have to revisit or reopen #130 (unless there are serious objections) and can stick to the most common term, which is also used by the Wikipedia page.

https://www.questionpro.com/blog/nominal-ordinal-interval-ratio/
and https://www.questionpro.com/blog/ratio-scale-vs-interval-scale/
also provide great explanations, examples and comparisons between two of them (INTERVAL and RATIO are the most common to use with numeric values), but none of them shows evidence that CELSIUS or FAHRENHEIT could be both RATIO and INTERVAL.
Those who know such requirements, cases or sources please quote them here.

@desruisseaux
Contributor

Let me try to summarize:

  • The same unit of measurement can be used with different scales. For example distance in metres can be either on a ratio scale or an interval, depending on the context (see next point).
  • The difference between 2 measurements is an interval (also called a difference). A height of 6 meters minus a height of 5 meters is an interval (or a difference) of one meter. The metre unit can also be associated to intervals, not only to ratio scale.
  • We are still allowed to apply some multiplications or divisions on intervals; they just have different meanings. Multiplying a mass measurement by 2 results in a mass twice heavier. Multiplying a mass difference by 2 results in a mass gain or mass loss twice greater. From Wikipedia: "However, ratios of differences can be expressed; for example, one difference can be twice another."
  • For most units (length, mass, etc.), the above discussion on ratio versus interval has no practical impact. So even if some lengths in metres are ratios and other lengths in metres are intervals, users see no difference. A practical impact happens only for a few units like Celsius and density sigma-p.
  • For the purpose of the problem we are trying to solve - Clarify behavior of Quantity arithmetic on shifted units #95 - we only need to distinguish between measurements and intervals: 2°C = 275.15 K is a measurement, while 2°C = 2 K is an interval.
  • I'm not sure that "ratio scale" is the appropriate term for the "2°C = 275.15 K" case, because qualifying 2°C as a ratio scale does not feel right.

My conclusion (for now): Unit seems a natural place for LevelOfMeasurement: gender is NOMINAL, Beaufort wind scale is ORDINAL, Celsius degree is INTERVAL and Kelvin is RATIO. However, while useful, levels of measurement used that way do not resolve the #95 problem well: even if Celsius degrees is an INTERVAL unit, it can be used for both measurements and intervals. Attempting to distinguish those two cases with two different CELSIUS units would force us to define a "Celsius as ratio scale" unit, which seems wrong. I think we rather need a property in Quantity telling us whether the quantity is a measurement or an interval, keeping in mind that:

  • an interval quantity does not imply the use of an interval unit (an interval can be expressed in metres);
  • a measurement quantity does not imply the use of a ratio unit (a measurement can be expressed in °C).

So I think that LevelOfMeasurement in Unit and a "measurement or interval" property in Quantity are complementary, and that for fixing #95 the important one is the latter.
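
The complementary view could be sketched as two independent properties, one fixed per unit and one set per quantity. This is only an illustration: SimpleUnit, SimpleQuantity and Kind are hypothetical names, and the level assigned to each unit follows the examples given in this comment.

```java
// Hypothetical sketch: these types only illustrate the two independent properties.
final class ComplementaryProperties {

    /** Stevens's levels, fixed once per unit. */
    enum LevelOfMeasurement { NOMINAL, ORDINAL, INTERVAL, RATIO }

    /** Per-quantity property: absolute reading or difference. */
    enum Kind { MEASUREMENT, INTERVAL }

    static final class SimpleUnit {
        final String symbol;
        final LevelOfMeasurement level;

        SimpleUnit(String symbol, LevelOfMeasurement level) {
            this.symbol = symbol;
            this.level = level;
        }
    }

    static final class SimpleQuantity {
        final double value;
        final SimpleUnit unit;
        final Kind kind;

        SimpleQuantity(double value, SimpleUnit unit, Kind kind) {
            this.value = value;
            this.unit = unit;
            this.kind = kind;
        }
    }

    static final SimpleUnit METRE = new SimpleUnit("m", LevelOfMeasurement.RATIO);
    static final SimpleUnit CELSIUS = new SimpleUnit("\u00B0C", LevelOfMeasurement.INTERVAL);

    public static void main(String[] args) {
        // An interval quantity expressed in a ratio unit (6 m - 5 m).
        SimpleQuantity heightGain = new SimpleQuantity(1, METRE, Kind.INTERVAL);
        // A measurement quantity expressed in an interval unit.
        SimpleQuantity airTemperature = new SimpleQuantity(2, CELSIUS, Kind.MEASUREMENT);

        System.out.println(heightGain.kind + " in a " + heightGain.unit.level + " unit");
        System.out.println(airTemperature.kind + " in an " + airTemperature.unit.level + " unit");
    }
}
```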

@dautelle

I would suggest not changing the current units definition ("scaled dimensions"), which currently supports different physical models (e.g. relativistic), but adding the "level of measurement" property to the quantity/measurement itself.
That makes sense, as the name "level of measurement" indicates :)

@desruisseaux
Contributor

Yes, I agree that for #95 purpose the information is more useful in Quantity. My issue is that in such case, "ratio" may not be an appropriate name for a measurement in °C. In other words, I think that LevelOfMeasurement fits well in Unit but is not exactly what we need for #95.

@dautelle

Hello Martin, what do you mean by ”fits well in Units”?

@desruisseaux
Contributor

desruisseaux commented Oct 27, 2018

Each unit can be associated to exactly one level of measurement. A Beaufort wind scale unit can be associated to ORDINAL level of measurement. Celsius unit can be associated to INTERVAL, most other units can be associated to RATIO, etc.

We may consider that the level of measurement of a unit does not change. A "Beaufort wind scale" unit cannot be upgraded from ORDINAL to INTERVAL, for example, because an increase in wind speed from Beaufort number 2 to number 4 is not twice the increase in wind speed from Beaufort number 2 to number 3. The Celsius unit cannot be upgraded from INTERVAL to RATIO because the amount of heat at 4°C is not twice the amount of heat at 2°C. So we can see LevelOfMeasurement as a useful Unit property, but with a fixed value for each unit. This is nice and clean, but implementors can already get equivalent information with the current API: if unit.getConverterTo(unit.getSystemUnit()).isLinear() returns true, then the level of measurement is RATIO; if false, then the level of measurement is something else, possibly INTERVAL. So LevelOfMeasurement in Unit fits well with the definitions that we can see in Wikipedia and other websites, but does not help much for #95 resolution.
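
The linearity check mentioned above can be illustrated without the real API. In this minimal sketch, AffineConverter is a hypothetical stand-in for the converter to the system unit, and treating "no offset" as linear is an assumption made for the illustration.

```java
// Hypothetical sketch: AffineConverter stands in for a converter to the system unit.
final class LinearityCheck {

    /** Converts with y = scale * x + offset. */
    static final class AffineConverter {
        final double scale;
        final double offset;

        AffineConverter(double scale, double offset) {
            this.scale = scale;
            this.offset = offset;
        }

        double convert(double x) {
            return scale * x + offset;
        }

        /** Linear here means there is no offset, so zero maps to zero. */
        boolean isLinear() {
            return offset == 0.0;
        }
    }

    /** True suggests RATIO; false means something else, possibly INTERVAL. */
    static boolean looksLikeRatioUnit(AffineConverter toSystemUnit) {
        return toSystemUnit.isLinear();
    }

    public static void main(String[] args) {
        AffineConverter kilometreToMetre = new AffineConverter(1000.0, 0.0);
        AffineConverter celsiusToKelvin = new AffineConverter(1.0, 273.15);

        System.out.println("km: " + looksLikeRatioUnit(kilometreToMetre));
        System.out.println("\u00B0C: " + looksLikeRatioUnit(celsiusToKelvin));
    }
}
```

The check separates shifted units from scaled ones, but it says nothing about whether a particular quantity is a measurement or an interval.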

If we put LevelOfMeasurement in Quantity, then we have the capability to instantiate two Quantity objects with the same units but different answers to the "is it a measurement or an interval" question. This is exactly what we need for #95. But then my problem is: how do we specify that 4°C is a measurement? Do we create a Quantity with RATIO level of measurement? The problem is that 4°C is not twice the amount of heat of 2°C, so it does not fit the definition of ratio. Conversely, if a user wants to compute the difference between two Beaufort numbers, what would the LevelOfMeasurement of the result be? It is not an INTERVAL for the reason given above (the difference between 2 and 3 is not the same as the difference between 3 and 4), and it is no longer an ORDINAL either.

For resolving #95, we need to distinguish between measurements and intervals. But an interval quantity does not automatically imply LevelOfMeasurement.INTERVAL (Beaufort wind scale example), and conversely a measurement quantity does not automatically imply LevelOfMeasurement.RATIO (the Celsius example). The Unit level of measurement and the Quantity "measurement or interval" characteristic are closely related, but not the same. I think they are complementary (but only the latter is strictly necessary for #95).

@keilw
Member Author

keilw commented Oct 28, 2018

If LevelOfMeasurement may not be used for Quantity then why did we introduce it? There seems no need other than improving operations discussed in #95.

we need to distinguish between measurements and intervals

The term Measurement is already taken, at least in the RI, and other unit frameworks, e.g. in F#, call what we defined as Quantity a Measure or Measurement.

If something really has to be added to Quantity then only there, and let's forget about LevelOfMeasurement. However, even the term "Interval Quantity" means something entirely different: https://www.dummies.com/art-center/music/music-theory-harmonic-and-melodic-intervals/

So what should we define where? And how does it benefit those special cases like CELSIUS or FAHRENHEIT while being unintrusive in all other cases. @kaikreuzer @htreu any hint, what you used in SmartHome? Did you primarily check for special units like those? If we cannot simply use the "level" of the Unit of each Quantity then we probably are back to something like Data Type or MeasurementType.

@desruisseaux
Contributor

This is why I wanted a wiki page. Long threads in the issue tracker do not help to see the big picture.

#95 identifies the cause of arithmetic inconsistencies. The outcome is that we need to distinguish between Quantity that are measurements and Quantity that are intervals.

This issue is about how to make the distinction needed for #95, which is a slightly different topic than identifying that this distinction was needed. The analysis work is not the same.

I'm neutral on whether we should add LevelOfMeasurement in Unit or not. My suggestion is that for the purpose of #95, we need only two enumeration values in Quantity: MEASUREMENT and INTERVAL (or other names if there are suggestions), and that those values are not the same as the RATIO and INTERVAL levels of measurement, even if closely related.
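
A two-value enumeration of this kind, together with rules for how arithmetic combines the values, might look like the sketch below. Kind, plus() and minus() are hypothetical names. The addition rule follows the June pseudocode quoted earlier in this thread, and measurement - measurement = interval follows the "ratio" - "ratio" = "interval" example; the remaining cases are assumptions of this sketch, not decisions made in this issue.

```java
// Hypothetical sketch: names and the less obvious rules are assumptions.
final class QuantityKindRules {

    enum Kind {
        MEASUREMENT, INTERVAL;

        /** As in the June pseudocode: the sum is an interval only if both operands are. */
        Kind plus(Kind other) {
            return (this == INTERVAL && other == INTERVAL) ? INTERVAL : MEASUREMENT;
        }

        /** measurement - measurement = interval; the other cases are assumptions. */
        Kind minus(Kind other) {
            if (this == other) {
                return INTERVAL;      // also covers interval - interval (assumed)
            }
            if (this == MEASUREMENT) {
                return MEASUREMENT;   // measurement - interval (assumed)
            }
            throw new IllegalArgumentException(
                    "interval - measurement is left undefined in this sketch");
        }
    }

    public static void main(String[] args) {
        System.out.println(Kind.MEASUREMENT.minus(Kind.MEASUREMENT));
        System.out.println(Kind.MEASUREMENT.plus(Kind.INTERVAL));
        System.out.println(Kind.INTERVAL.plus(Kind.INTERVAL));
    }
}
```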

@keilw
Member Author

keilw commented Oct 28, 2018

If it helps, and has no other dependencies, it could be best to add a static enum like Type, DataType or similar directly to Quantity. I see no problem with INTERVAL, but it would be really easy to confuse with the proposed LevelOfMeasurement entry (which has been defined that way by literature and experts for many decades). Where are the sources that describe the difference, e.g. Wikipedia or a similar article? If we defined something like that we mustn't point to our own Wiki page; we should have an official reference, whether it's Wikipedia or a specialized forum, but something that is free and safe to quote.

Since both Unit and Quantity already use asType() with a Class argument, the method should not be getType() but something else, maybe getDataType() or getDatatype().

@keilw
Member Author

keilw commented Oct 28, 2018

Conversely, if a user wants to compute the difference between two Beaufort numbers, what would the LevelOfMeasurement of the result be? It is not an INTERVAL for the reason given above (the difference between 2 and 3 is not the same as the difference between 3 and 4), and it is no longer an ORDINAL either.

@desruisseaux Then what is it in this case? If we stick to LevelOfMeasurement unless we add some other levels we must not have an "unknown" or null level, that would be rather bad.

@desruisseaux
Contributor

It is not an interval as defined by Stevens's levels of measurement. But it can be an interval as we define it for a different context. The same English words have different meanings depending on the context; this is why ISO standards, Ph.D. studies, etc. begin with a definition of the terms they are going to use. Or if we really feel that it may be a cause of confusion, we may call it DIFFERENCE.

@keilw
Member Author

keilw commented Oct 29, 2018

But what would you call the other one, VALUE? MEASUREMENT is quite confusing, there is plenty of literature that differentiates between RATIO and INTERVAL both being MEASUREMENT.
SPSS actually has an enum called MeasurementLevel(!) https://www.ibm.com/support/knowledgecenter/en/SSLVMB_24.0.0/spss/base/dataedit_define_variable_measurement.html, but it makes no difference between RATIO and INTERVAL either; it calls that level SCALE. What it does seem to do is assign that MeasurementLevel to a particular data entry, which would be closer to our Quantity. I found another library on MavenCentral with an enum literally called LevelOfMeasurement. I have to check out the source JAR and see how it fits into their API and what they do with it. The JAR contains other elements like BaseUnit or SIUnits, therefore it looks like the level may be used on a Unit, but I can't say until I have looked at the code in more detail. Having both, even if we invented terms no other piece of software uses, looks like overhead and a source of confusion.

As the two JCP EC Members who supported this effort ever since JSR 275 (IBM and Red Hat) are soon going to be one ;-) and at least via SPSS IBM already uses this term, I guess we should try to also ask them (maybe not the actual EC reps, but they should know someone from the SPSS team) for advice.

@desruisseaux
Contributor

I agree with trying to find terms used by the literature - this is the purpose of this issue. But we have to use the right definitions for the problem we are trying to fix, which was the intent of my comment.

@keilw keilw added in progress and removed ready labels Oct 29, 2018
@keilw
Member Author

keilw commented Oct 29, 2018

Here are 3 sources, one of them actually an (incubating) Apache Project which targets Machine Learning and Big Data:

While both SPSS and SystemML summarize INTERVAL and RATIO under a common level called SCALE, Purifinity is closest to the definition we used so far.

Beside that, all of them have one thing in common, they apply this measurement level to a data point or metadata used to describe a measurement, not the actual unit.

This is another example for Spatial Data, should be familiar especially to @desruisseaux
https://www.e-education.psu.edu/geog160/c3_p8.html
It does not talk about an API, but "An implication of this difference is that a quantity of 20 measured at the ratio scale is twice the value of 10" also sounds quite clear about the scale meant to be on a Quantity, so even if we ended up sticking to RATIO and INTERVAL only (IMO to support use cases like Big Data, Statistics and others we should still keep the 4 Stevens definitions) we should probably do this on Quantity.

@keilw
Member Author

keilw commented Oct 30, 2018

Thanks @dautelle, @desruisseaux for the constructive input. I am not sure, if we still need any Wiki page. It is not written in stone, even the name, although the one we picked or Purifinity matches the most common phrase in literature, so it seems fine.
@andi-huber, @filipvanlaenen and others, do you feel issues like unitsofmeasurement/indriya#128 can be worked on based on the current assumption Quantity has a getValue() that can be either RATIO (the default for now because @desruisseaux also said, it's the default 1.0 behavior) or INTERVAL, or others where appropriate?

@keilw
Member Author

keilw commented Oct 31, 2018

Based on only a small selection of Java or other APIs that apply the level, all of which do so to a Quantity, "measurement" or "data" (not a Unit), I would like to resolve this. Should anybody run into a serious problem, we may revisit it, but it would be difficult to interact and exchange data with the projects and Java APIs that use this concept if we did this much differently.

@keilw keilw closed this as completed Oct 31, 2018
@keilw keilw removed the in progress label Oct 31, 2018
@desruisseaux
Contributor

@keilw: this issue has been closed early again without evidence that it has been understood. Citing what other projects do does not help. There is no question that Stevens's LevelOfMeasurement with nominal, ordinal, interval and ratio values is widely accepted. This is not the issue I was raising. The issue I was raising is that what we need for #95 may not be LevelOfMeasurement. Do we have an answer to the two questions I asked before?

  • What is a measurement (not interval) in °C? If your answer is RATIO, how do you reconcile that with Stevens's definition of the ratio level of measurement?
  • Conversely, what is a difference between two ordinal values? If your answer is INTERVAL, then again how do you reconcile that with Stevens's definition of the interval level of measurement?

@desruisseaux desruisseaux reopened this Nov 1, 2018
@keilw keilw changed the title Apply Level of Measurement? Where to apply Level of Measurement? Nov 1, 2018
@keilw
Member Author

keilw commented Nov 1, 2018

There are so few projects or APIs (dealing with measurements, not just in Java) that even care about it and are still used a lot. Those who do all apply levels that are similar to Stevens's definition, but e.g. SPSS does not care at all whether °C is an interval or not; it's just SCALE there. Putting it on a Unit would be wrong. I take it that those who bothered to discuss it so far agreed with what others do. IBM SPSS Statistics, "the world’s leading statistical software" and other approaches for Data Science, Machine Learning or Statistics (including Java support) would have done this in a different place. That IMO is the goal of this issue. And it was answered. If the question is whether those 4 levels are too many or not enough, please create a new ticket for that, so we don't have super-tickets like #95. I created #138 as a placeholder, please fill it with relevant parts. I did not see any indication that we should have TWO places or two levels to apply, especially because having one contradict the other would lead to utter confusion. This ticket helped find ONE place (from all evidence both here and elsewhere, Quantity turned out to be the better place).

@desruisseaux
Contributor

We agree to put the information in Quantity and I'm not asking to put it in two places. I'm not debating whether there are too many levels or not enough either. I'm questioning whether LevelOfMeasurement as defined by Stevens can address the needs of #95. While I like Stevens's levels of measurement a lot and would be happy to see them in the API, unfortunately I think they do not address #95 needs. Citing other software like IBM SPSS just because the words "levels of measurement" appear in their documentation does not help - we have to understand what they are using levels of measurement for and see if we are in the same situation.

Please let's focus on just two levels: INTERVAL and RATIO. Forget everything else for now. The two questions are:

  • Is it correct to qualify 5°C as a measurement on the RATIO scale, with "ratio" as defined by Stevens?
  • Is it correct to qualify the difference between two ordinal values as a measurement on the INTERVAL scale, with "interval" as defined by Stevens?

My answer is no - again I love Stevens's Level of Measurement definitions, but they do not apply to what we are trying to do for solving #95.

We cannot say that we don't care. Being able to differentiate "interval" from "measurement" (replace "measurement" by whatever other name you like) is the critical part we need for resolving #95.

@keilw
Member Author

keilw commented Nov 2, 2018

This one simply helped finding the best place for the level, so please continue in #138

3 participants