-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP:hr_to_mr and mr_to_hr functions (closes #54) #79
base: main
Are you sure you want to change the base?
Conversation
@sbillinge: @maak-sdu and I added the hr_to_mr function which passes the corresponding test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. Few comments.
pydatarecognition/utils.py
Outdated
@@ -270,5 +270,18 @@ def mr_to_hr(number, esd): | |||
in the following format: "343.44(45)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before reviewing the code we should get the docstring right.
Here are my issues:
- The correct way to do this:
(e.g. 343.44 and 0.45) into human readable
numbers with estimated standard deviations (e.g. 343.44(45))
would actually be this:
(e.g. 343.44 and 0.45) into human readable
numbers with estimated standard deviations (e.g. 343.4(5))
unless the esd was between 0.1 and 0.14 in which case it would be
(e.g. 343.44 and 0.14) into human readable
numbers with estimated standard deviations (e.g. 343.44(14))
- we are likely returning a list of strings, not a numpy array. What do you want to do if the function is handed a string rather than a list? This should be discussed in the docstring.
- The function name is actually the name of our entire project, not this little utility function (machine readable literature!). We should use a name that is easily readable and captures what the function does.
- For the descriptions we usually follow the numpy standards:
Parameters
----------
param1 : int
The first parameter.
param2 : str
The second parameter.
Returns
-------
bool
True if successful, False otherwise.
note the "The" and the colons
pydatarecognition/utils.py
Outdated
@@ -270,5 +270,18 @@ def mr_to_hr(number, esd): | |||
in the following format: "343.44(45)" | |||
|
|||
''' | |||
number, esd = np.array(number, dtype='float').astype('str'), np.array(esd, dtype='float').astype('str') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move from np_array to list before doing string operations I think. This works, but it is a bit illogical.
tests/test_utils.py
Outdated
@@ -83,6 +83,6 @@ def test_mr_to_hr(): | |||
number, esd = [343.44, 324908.435, 0.0783, 11, 51], [0.45, 0.067, 0.0001, 1, 13] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will need to test an esd between 0.10 and 0.14 (e.g., 0.13), which behaves differently. Also, we would like to decide what to do (and to test) when the number is not the same number of signficant figures as the esd, e.g. number is 343.1 and the esd is 0.45, or the number is 343.3598 and the esd is 0.56 or 343.1 and esd is 0.045 and so on (try and think of all the possibilities) Please also make sure that we are testing rounding up and rounding down.
tests/test_utils.py
Outdated
@@ -83,6 +83,6 @@ def test_mr_to_hr(): | |||
number, esd = [343.44, 324908.435, 0.0783, 11, 51], [0.45, 0.067, 0.0001, 1, 13] | |||
actual = mr_to_hr(number, esd) | |||
expected = np.array(["343.44(45)", "324908.435(67)", "0.0783(1)", "11(1)", "51(13)"], dtype='str') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's pass back lists of strings rather than np array think
@sbillinge: @maak-sdu and I would like to ask if the "proper esd rules" should be used only for the mr_to_hr or hr_to_mr as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good progress, few comments
pydatarecognition/utils.py
Outdated
The array-like object that contains numbers with their estimated standard deviations as strings | ||
in the following format: ["343.44(45)", "324908.435(67)", "0.0783(1)"] | ||
in the following format: ["343.44(45)", "324908.435(67)", "0.0783(1)"] or | ||
The string that contains numbers with their estimated standard deviations separated by new line characters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really want a bunch of string parsing in this function? I suggest no. Just take lists of strings?
pydatarecognition/utils.py
Outdated
number_esd : numpy array | ||
The numpy array that contains numbers (e.g. 343.44) with estimated standard deviations (e.g. 0.45) as strings | ||
in the following format: "343.44(45)" | ||
number_esd : list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need to specify the name of the variable that is returned, since the name will die with the function on return.
pydatarecognition/utils.py
Outdated
esd : array_like or string | ||
The array-like object that contains estimated standard deviations in the following format: | ||
[0.45, 0.067, 0.0001] | ||
The string that contains estimated standard deviations in the following format: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't do string parsing inside this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbillinge: From bullet point 2 regarding mr_to_hr: "we are likely returning a list of strings, not a numpy array. What do you want to do if the function is handed a string rather than a list? This should be discussed in the docstring."
So you don't want us to accept strings but only discuss in the docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please check that this is fixed (strings are not accepted, no string parsing insded the function, and docstring reflecting all this)
I wouldn't take strings, or just allow a string of the form "234.345(5)"
and turn it into a list inside, one or the other.
…On Mon, Sep 13, 2021 at 11:57 AM berrakozer ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pydatarecognition/utils.py
<#79 (comment)>
:
>
- esd : array_like
- The array-like object that contains estimated standard deviations
+ esd : array_like or string
+ The array-like object that contains estimated standard deviations in the following format:
+ [0.45, 0.067, 0.0001]
+ The string that contains estimated standard deviations in the following format:
@sbillinge <https://github.com/sbillinge>: From bullet point 2 regarding
mr_to_hr: "we are likely returning a list of strings, not a numpy array.
What do you want to do if the function is handed a string rather than a
list? This should be discussed in the docstring."
So you don't want us to accept strings but only discuss in the docstring?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#79 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABAOWUJMXMIR2E6HPJASQY3UBYNNTANCNFSM5DXDNSEA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
--
Simon Billinge
Professor, Columbia University
Physicist, Brookhaven National Laboratory
|
@sbillinge: @berrakozer and I just tried to ensure that we got the esd concept right before we implement it to the code. Please see the examples below and let us know, whether we get it right or not. Thanks.
|
This is a bit hard to assess because the original esd's are not expressed to enough significant figures for most of them. (i.e., 0.1, 0.01). For the last two where I can do the assessment they should be |
@sbillinge: thanks for clarifying. @berrakozer and I just tried some additional examples. Please let us know, whether we got it right or wrong this time. Thanks.
|
yes, that's it, except the last one should be |
@sbillinge: @berrakozer and I tried to come up with a suggested workflow for the mr_to_mr code. We tried to consider a test case:
In addition, we have a question for a limiting case:
We would go with solution 4. Is that the proper one? Thanks. |
here is someone else's hack at this:
https://stackoverflow.com/questions/41897705/how-would-i-round-a-measurement-and-its-error-estimate-to-a-specified-number-of
I am not sure if it is any good.
Your edge case is an interesting one. Probably solution 4 is the best. It
is always ok to slightly overestimate the uncertainty.
S
…On Fri, Sep 17, 2021 at 8:54 AM Martin Aaskov Karlsen < ***@***.***> wrote:
@sbillinge <https://github.com/sbillinge>: @berrakozer
<https://github.com/berrakozer> and I tried to come up with a suggested
workflow for the mr_to_mr code. We tried to consider a test case:
NUMBER ESD
999235 111
Split number into 'irrelevant' and 'relevant' parts for rounding:
IN RN ESD
999 235 111
Keep 'irrelevant' part of number and round esd properly:
IN RN Rounded ESD
999 235 (110)
Round 'relevant' number according to significant figures for the rounded esd:
IN Rounded RN Rounded ESD
999 240 (110)
Put things together to get the final result:
999240(110)
In addition, we have a question for a limiting case:
NUMBER ESD
9235 145
Solution 1: do nothing
9235(145)
Solution 2: keep the first two significant figures of esd
9240(140)
Solution 3: round the esd once and the number according to the esd
9240(150)
Solution: round the esd twice (145->150->200) and round the number according to the esd:
9200(200)
We would go with solution 4. Is that the proper one? Thanks.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#79 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABAOWUJG33BORTISKO4JPVLUCM3BDANCNFSM5DXDNSEA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
--
Simon Billinge
Professor, Columbia University
Physicist, Brookhaven National Laboratory
|
@sbillinge Ready for review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code looks beautiful. Some small comments.
pydatarecognition/utils.py
Outdated
---------- | ||
number_esd : array_like | ||
The array-like object that contains numbers with their estimated standard deviations as strings | ||
in the following format: ["343.44(45)", "324908.435(67)", "0.0783(1)"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's model good behavior here and use examples that follow good practice! :)
tests/test_utils.py
Outdated
esd = [0.5, 0.5, 0.3, 0.2, 0.13, 0.2, 2, 20, 20, 200, 1, 1, 0.2, 0.2, 0.14, 100, 110, 2] | ||
actual = mr_to_hr_number_and_esd(number, esd) | ||
expected = ['123.5(5)', '123.3(5)', '123.4(3)', '123.3(2)', '123.12(13)', '132.1(2)', '19(2)', '120(20)', '150(20)', | ||
'1200(200)', '2(1)', '1(1)', '2.1(2)', '2.1(2)', '2.14(14)', '10(100)', '10(110)', '11(2)'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be 0(100) and 0(100) rather than 10(100) and 10(110)
also maybe good to have examples where the main number rounds up and rounds down. for example, 120(20) and 150(20) are testing the same thing, so make the second one 154 pm 20. Also, I don't see any esd rounding getting tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still needs some tweaking I think. It is ok to test improperly formed number(esd) pairs (people will put these in their papers/cifs for sure) but we should also test all the right ones.
pydatarecognition/utils.py
Outdated
@@ -217,7 +217,7 @@ def get_formatted_crossref_reference(doi): | |||
def hr_to_mr_number_and_esd(number_esd): | |||
''' | |||
splits human readable numbers with estimated standard deviations (e.g. 343.44(45)) into machine readable numbers and | |||
estimated standard deviations (e.g. 343.44 and 0.45). | |||
estimated standard deviations (e.g. 343.44 and 0.45) without any rounding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not that kind of fix....just round so it is correct!
splits human readable numbers with estimated standard deviations (e.g. 343.4(5)) into machine readable numbers and
estimated standard deviations (e.g. 343.4 and 0.5).
pydatarecognition/utils.py
Outdated
numpy array | ||
The array with estimated standard deviations as floats | ||
list | ||
The list with estimated standard deviations as floats, e.g. [0.45, 0.067, 0.0001] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get the number of significant figures on the esd's right here
tests/test_utils.py
Outdated
@@ -75,9 +75,8 @@ def mockreturn(*args, **kwargs): | |||
def test_hr_to_mr_number_and_esd(): | |||
number_esd = ["343.44(45)", "324908.435(67)", "0.0783(1)", "11(1)", "51(13)"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this list of tests is ok, but why are we testing four improper number/esd values and only one proper one? maybe also do a 243(6) and a 3200(300)? sthg like that?
… to round numbers
@sbillinge: test_hr_to_mr_number_and_esd also modified, test passing. |
… to this function and altered test_mr_to_hr_number_and_esd, test passes
added some slightly more challenging tests
simon corrected his bad test
@sbillinge Would we need anything else to be done for this PR to be merged? |
@berrakozer @maak-sdu I added some more tests, but it now seems to be failing CI. I think the tests I added were valid, so there may be a code modification that is needed to make the tests pass. But you should also probably check the tests I added and make sure I didn't make a mistake. Only look at the final commit (it fixes an incorrect test from the previous one) |
btw, you will have to sync your local with your fork to get these changes to your local..... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of comments
@@ -347,29 +347,30 @@ def hr_to_mr_number_and_esd(number_esd): | |||
|
|||
def mr_to_hr_number_and_esd(number, esd): | |||
''' | |||
merges machine readable (rounded) numbers and (rounded) estimated standard deviations (e.g. 343.4 and 0.5) | |||
rounds and merges machine readable numbers and estimated standard deviations (e.g. 343.4 and 0.5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good! I like it!
tests/test_utils.py
Outdated
esd = [0.5, 0.5, 0.326, 0.2, 0.13, 0.236, 2, 20, 20, 207, 1, 1.4, 0.2, 0.25, 0.14, 100, 100, 2] | ||
actual = mr_to_hr_number_and_esd(number, esd) | ||
expected = ['123.5(5)', '123.3(5)', '123.4(3)', '123.4(2)', '123.12(13)', '132.1(2)', '19(2)', '120(20)', '150(20)', | ||
'1200(200)', '2(1)', '1(1)', '2.1(2)', '2.1(3)', '2.14(14)', '0(100)', '0(100)', '11(2)'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another edge case that we should think about. Right now it is 1 +/- 1.4, which is really the same as 1.0 +/- 1.4.
Should that go to 1(1), 0(1), or even 1.0(14)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbillinge Right now, we stick to the convention that when the esd > number, the number is set to zero and the esd is rounded to one significant figure. Do we want different behavior ( regarding the edge case above: 1.0(14) )?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After conversation with @sbillinge @maak-sdu we decided the desired behavior should be (7.26, 100) -> "7(100)"
and (50, 100) -> 50(100)
.
We think we need a new test for Simon's edge case, something like (1, 1.4) -> 1.0(14)
. How about (1, 1.5) -> 1(2)
.
Please can you make the tests look like this and then make the code pass tests? Thanks so much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbillinge: during our meeting today, @berrakozer and I arrived at the following principles. Those are the ones that we will try to follow while editing the behavior of the function.
Whenever the number < esd, the number of significant figures for the number and esd should be one,
whenever the 'esd convention' (see next paragraph) is not in use:
number esd number(esd)
1.1 10 1(10)
1.1 11 1(10)
Edge cases, where we only have one significant figure for the number, but add a 'point zero'
to follow the 'esd convention' used elsewhere in the function, we have two significant figures for the esd:
number esd number(esd)
1 1.4 1.0(14)
1.2 1.4 1.0(14)
The 'esd convention' should be used whenever the order of magnitude for the esd is equal to or smaller than
that of the number, n_esd <= n_number:
number esd number(esd)
1.031 0.0014 1.0310(14)
1.031 0.014 1.031(14)
1.031 0.14 1.03(14)
1.031 1.4 1.0(14)
1.031 14 1(10)
1.031 15 1(20)
Excellent! Can we capture this logic somewhere in the documentation? It
will be helpful for future researchers....
…On Wed, Oct 27, 2021, 11:46 AM Martin Aaskov Karlsen < ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In tests/test_utils.py
<#79 (comment)>
:
> esd = [0.5, 0.5, 0.326, 0.2, 0.13, 0.236, 2, 20, 20, 207, 1, 1.4, 0.2, 0.25, 0.14, 100, 100, 2]
actual = mr_to_hr_number_and_esd(number, esd)
- expected = ['123.5(5)', '123.3(5)', '123.4(3)', '123.4(2)', '123.12(13)', '132.1(2)', '19(2)', '120(20)', '150(20)',
- '1200(200)', '2(1)', '1(1)', '2.1(2)', '2.1(3)', '2.14(14)', '0(100)', '0(100)', '11(2)']
@sbillinge <https://github.com/sbillinge>: during our meeting today,
@berrakozer <https://github.com/berrakozer> and I arrived at the
following principles. Those are the ones that we will try to follow while
editing the behavior of the function.
Whenever the number < esd, the number of significant figures for the
number and esd should be one,
whenever the 'esd convention' (see next paragraph) is not in use:
number esd number(esd)
1.1 10 1(10)
1.1 11 1(10)
Edge cases, where we only have one significant figure for the number, but
add a 'point zero'
to follow the 'esd convention' used elsewhere in the function, we have two
significant figures for the esd:
number esd number(esd)
1 1.4 1.0(14)
1.2 1.4 1.0(14)
The 'esd convention' should be used whenever the order of magnitude for
the esd is equal to or smaller than
that of the number, n_esd <= n_number:
number esd number(esd)
1.031 0.0014 1.0310(14)
1.031 0.014 1.031(14)
1.031 0.14 1.03(14)
1.031 1.4 1.0(14)
1.031 14 1(10)
1.031 15 1(20)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#79 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABAOWUO5N4QWZZ5BZVW7ZVDUJANGHANCNFSM5DXDNSEA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
@sbillinge: Should I copy the logic above to a word or txt document to keep record of it and close this PR as you have requested on the Rebuttal list? |
closes #54