Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP:hr_to_mr and mr_to_hr functions (closes #54) #79

Open
wants to merge 28 commits into
base: main
Choose a base branch
from

Conversation

berrakozer
Copy link
Collaborator

@berrakozer berrakozer commented Sep 9, 2021

closes #54

@berrakozer berrakozer changed the title docstring for hr_to_mr function (closes#54) docstring for hr_to_mr function (closes #54) Sep 9, 2021
@berrakozer berrakozer changed the title docstring for hr_to_mr function (closes #54) WIP:hr_to_mr function (closes #54) Sep 9, 2021
@berrakozer
Copy link
Collaborator Author

@sbillinge: @maak-sdu and I added the hr_to_mr function which passes the corresponding test.

@berrakozer berrakozer changed the title WIP:hr_to_mr function (closes #54) WIP:hr_to_mr and mr_to_hr functions (closes #54) Sep 9, 2021
Copy link
Collaborator

@sbillinge sbillinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Few comments.

@@ -270,5 +270,18 @@ def mr_to_hr(number, esd):
in the following format: "343.44(45)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before reviewing the code we should get the docstring right.

Here are my issues:

  1. The correct way to do this:
(e.g. 343.44 and 0.45) into human readable
    numbers with estimated standard deviations (e.g. 343.44(45))

would actually be this:

(e.g. 343.44 and 0.45) into human readable
    numbers with estimated standard deviations (e.g. 343.4(5))

unless the esd was between 0.1 and 0.14 in which case it would be

(e.g. 343.44 and 0.14) into human readable
    numbers with estimated standard deviations (e.g. 343.44(14))
  1. we are likely returning a list of strings, not a numpy array. What do you want to do if the function is handed a string rather than a list? This should be discussed in the docstring.
  2. The function name is actually the name of our entire project, not this little utility function (machine readable literature!). We should use a name that is easily readable and captures what the function does.
  3. For the descriptions we usually follow the numpy standards:
 Parameters
    ----------
    param1 : int
        The first parameter.
    param2 : str
        The second parameter.

    Returns
    -------
    bool
        True if successful, False otherwise.

note the "The" and the colons

@@ -270,5 +270,18 @@ def mr_to_hr(number, esd):
in the following format: "343.44(45)"

'''
number, esd = np.array(number, dtype='float').astype('str'), np.array(esd, dtype='float').astype('str')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move from np_array to list before doing string operations I think. This works, but it is a bit illogical.

@@ -83,6 +83,6 @@ def test_mr_to_hr():
number, esd = [343.44, 324908.435, 0.0783, 11, 51], [0.45, 0.067, 0.0001, 1, 13]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will need to test an esd between 0.10 and 0.14 (e.g., 0.13), which behaves differently. Also, we would like to decide what to do (and to test) when the number is not the same number of signficant figures as the esd, e.g. number is 343.1 and the esd is 0.45, or the number is 343.3598 and the esd is 0.56 or 343.1 and esd is 0.045 and so on (try and think of all the possibilities) Please also make sure that we are testing rounding up and rounding down.

@@ -83,6 +83,6 @@ def test_mr_to_hr():
number, esd = [343.44, 324908.435, 0.0783, 11, 51], [0.45, 0.067, 0.0001, 1, 13]
actual = mr_to_hr(number, esd)
expected = np.array(["343.44(45)", "324908.435(67)", "0.0783(1)", "11(1)", "51(13)"], dtype='str')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's pass back lists of strings rather than np array think

@berrakozer
Copy link
Collaborator Author

@sbillinge: @maak-sdu and I would like to ask if the "proper esd rules" should be used only for the mr_to_hr or hr_to_mr as well?

Copy link
Collaborator

@sbillinge sbillinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good progress, few comments

The array-like object that contains numbers with their estimated standard deviations as strings
in the following format: ["343.44(45)", "324908.435(67)", "0.0783(1)"]
in the following format: ["343.44(45)", "324908.435(67)", "0.0783(1)"] or
The string that contains numbers with their estimated standard deviations separated by new line characters
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want a bunch of string parsing in this function? I suggest no. Just take lists of strings?

number_esd : numpy array
The numpy array that contains numbers (e.g. 343.44) with estimated standard deviations (e.g. 0.45) as strings
in the following format: "343.44(45)"
number_esd : list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to specify the name of the variable that is returned, since the name will die with the function on return.

esd : array_like or string
The array-like object that contains estimated standard deviations in the following format:
[0.45, 0.067, 0.0001]
The string that contains estimated standard deviations in the following format:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't do string parsing inside this function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbillinge: From bullet point 2 regarding mr_to_hr: "we are likely returning a list of strings, not a numpy array. What do you want to do if the function is handed a string rather than a list? This should be discussed in the docstring."

So you don't want us to accept strings but only discuss in the docstring?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check that this is fixed (strings are not accepted, no string parsing insded the function, and docstring reflecting all this)

@sbillinge
Copy link
Collaborator

sbillinge commented Sep 13, 2021 via email

@maak-sdu
Copy link
Collaborator

maak-sdu commented Sep 14, 2021

@sbillinge: @berrakozer and I just tried to ensure that we got the esd concept right before we implement it to the code. Please see the examples below and let us know, whether we get it right or not. Thanks.

NUMBER  ESD     NUMBER(ESD)
1.2     0.1     1.2(1)
1.12    0.01    1.12(1)
1.12    0.11    1.1(1)
1.112   0.001   1.112(1)
1.112   0.011   1.11(1)
1.112   0.111   1.1(1)

@sbillinge
Copy link
Collaborator

@sbillinge: @berrakozer and I just tried to ensure that we got the esd concept right before we implement it to the code. Please see the examples below and let us know, whether we get it right or not. Thanks.

NUMBER  ESD     NUMBER(ESD)
1.2     0.1     1.2(1)
1.12    0.01    1.12(1)
1.12    0.11    1.1(1)
1.112   0.001   1.112(1)
1.112   0.011   1.11(1)
1.112   0.111   1.1(1)

This is a bit hard to assess because the original esd's are not expressed to enough significant figures for most of them. (i.e., 0.1, 0.01). For the last two where I can do the assessment they should be 1.112 0.011 1.112(11) and 1.112 0.111 1.11(11)

@maak-sdu
Copy link
Collaborator

maak-sdu commented Sep 15, 2021

@sbillinge: thanks for clarifying. @berrakozer and I just tried some additional examples. Please let us know, whether we got it right or wrong this time. Thanks.

NUMBER	ESD	NUMBER(ESD)
1.112	0.001	1.112(1)
1.112 	0.011	1.112(11)
1.112 	0.014	1.112(14)
1.112	0.015	1.11(2)
1.115	0.015	1.12(2)
1.112	0.021	1.11(2)
1.112	0.201	1.1(2)
11	1	11(1)
19	11	19(11)
321	123	321(123)

@sbillinge
Copy link
Collaborator

@sbillinge: thanks for clarifying. @berrakozer and I just tried some additional examples. Please let us know, whether we got it right or wrong this time. Thanks.

NUMBER	ESD	NUMBER(ESD)
1.112	0.001	1.112(1)
1.112 	0.011	1.112(11)
1.112 	0.014	1.112(14)
1.112	0.015	1.11(2)
1.115	0.015	1.12(2)
1.112	0.021	1.11(2)
1.112	0.201	1.1(2)
11	1	11(1)
19	11	19(11)
321	123	321(123)

yes, that's it, except the last one should be 320(120)

@maak-sdu
Copy link
Collaborator

@sbillinge: @berrakozer and I tried to come up with a suggested workflow for the mr_to_mr code. We tried to consider a test case:

NUMBER	ESD
999235	111

Split number into 'irrelevant' and 'relevant' parts for rounding:
IN 	RN 	ESD		
999	235 	111

Keep 'irrelevant' part of number and round esd properly:
IN 	RN 	Rounded ESD		
999 	235	(110)

Round 'relevant' number according to significant figures for the rounded esd:
IN	Rounded RN	Rounded ESD
999	240		(110)

Put things together to get the final result:
999240(110)

In addition, we have a question for a limiting case:

NUMBER	ESD
9235	145

Solution 1: do nothing
9235(145)

Solution 2: keep the first two significant figures of esd
9240(140)

Solution 3: round the esd once and the number according to the esd
9240(150)

Solution: round the esd twice (145->150->200) and round the number according to the esd:
9200(200)

We would go with solution 4. Is that the proper one? Thanks.

@sbillinge
Copy link
Collaborator

sbillinge commented Sep 17, 2021 via email

@berrakozer
Copy link
Collaborator Author

@sbillinge Ready for review!

Copy link
Collaborator

@sbillinge sbillinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks beautiful. Some small comments.

----------
number_esd : array_like
The array-like object that contains numbers with their estimated standard deviations as strings
in the following format: ["343.44(45)", "324908.435(67)", "0.0783(1)"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's model good behavior here and use examples that follow good practice! :)

esd = [0.5, 0.5, 0.3, 0.2, 0.13, 0.2, 2, 20, 20, 200, 1, 1, 0.2, 0.2, 0.14, 100, 110, 2]
actual = mr_to_hr_number_and_esd(number, esd)
expected = ['123.5(5)', '123.3(5)', '123.4(3)', '123.3(2)', '123.12(13)', '132.1(2)', '19(2)', '120(20)', '150(20)',
'1200(200)', '2(1)', '1(1)', '2.1(2)', '2.1(2)', '2.14(14)', '10(100)', '10(110)', '11(2)']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be 0(100) and 0(100) rather than 10(100) and 10(110)

also maybe good to have examples where the main number rounds up and rounds down. for example, 120(20) and 150(20) are testing the same thing, so make the second one 154 pm 20. Also, I don't see any esd rounding getting tested.

Copy link
Collaborator

@sbillinge sbillinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still needs some tweaking I think. It is ok to test improperly formed number(esd) pairs (people will put these in their papers/cifs for sure) but we should also test all the right ones.

@@ -217,7 +217,7 @@ def get_formatted_crossref_reference(doi):
def hr_to_mr_number_and_esd(number_esd):
'''
splits human readable numbers with estimated standard deviations (e.g. 343.44(45)) into machine readable numbers and
estimated standard deviations (e.g. 343.44 and 0.45).
estimated standard deviations (e.g. 343.44 and 0.45) without any rounding.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not that kind of fix....just round so it is correct!

splits human readable numbers with estimated standard deviations (e.g. 343.4(5)) into machine readable numbers and
    estimated standard deviations (e.g. 343.4 and 0.5).

numpy array
The array with estimated standard deviations as floats
list
The list with estimated standard deviations as floats, e.g. [0.45, 0.067, 0.0001]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get the number of significant figures on the esd's right here

@@ -75,9 +75,8 @@ def mockreturn(*args, **kwargs):
def test_hr_to_mr_number_and_esd():
number_esd = ["343.44(45)", "324908.435(67)", "0.0783(1)", "11(1)", "51(13)"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this list of tests is ok, but why are we testing four improper number/esd values and only one proper one? maybe also do a 243(6) and a 3200(300)? sthg like that?

@berrakozer
Copy link
Collaborator Author

@sbillinge: test_hr_to_mr_number_and_esd also modified, test passing.

berrakozer and others added 4 commits September 28, 2021 16:29
… to this function and altered test_mr_to_hr_number_and_esd, test passes
added some slightly more challenging tests
simon corrected his bad test
@berrakozer
Copy link
Collaborator Author

@sbillinge Would we need anything else to be done for this PR to be merged?

@sbillinge
Copy link
Collaborator

@berrakozer @maak-sdu I added some more tests, but it now seems to be failing CI. I think the tests I added were valid, so there may be a code modification that is needed to make the tests pass. But you should also probably check the tests I added and make sure I didn't make a mistake. Only look at the final commit (it fixes an incorrect test from the previous one)

@sbillinge
Copy link
Collaborator

btw, you will have to sync your local with your fork to get these changes to your local..... git pull origin sd_function_issue54

Copy link
Collaborator

@sbillinge sbillinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of comments

@@ -347,29 +347,30 @@ def hr_to_mr_number_and_esd(number_esd):

def mr_to_hr_number_and_esd(number, esd):
'''
merges machine readable (rounded) numbers and (rounded) estimated standard deviations (e.g. 343.4 and 0.5)
rounds and merges machine readable numbers and estimated standard deviations (e.g. 343.4 and 0.5)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good! I like it!

esd = [0.5, 0.5, 0.326, 0.2, 0.13, 0.236, 2, 20, 20, 207, 1, 1.4, 0.2, 0.25, 0.14, 100, 100, 2]
actual = mr_to_hr_number_and_esd(number, esd)
expected = ['123.5(5)', '123.3(5)', '123.4(3)', '123.4(2)', '123.12(13)', '132.1(2)', '19(2)', '120(20)', '150(20)',
'1200(200)', '2(1)', '1(1)', '2.1(2)', '2.1(3)', '2.14(14)', '0(100)', '0(100)', '11(2)']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another edge case that we should think about. Right now it is 1 +/- 1.4, which is really the same as 1.0 +/- 1.4.

Should that go to 1(1), 0(1), or even 1.0(14)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbillinge Right now, we stick to the convention that when the esd > number, the number is set to zero and the esd is rounded to one significant figure. Do we want different behavior ( regarding the edge case above: 1.0(14) )?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After conversation with @sbillinge @maak-sdu we decided the desired behavior should be (7.26, 100) -> "7(100)" and (50, 100) -> 50(100).

We think we need a new test for Simon's edge case, something like (1, 1.4) -> 1.0(14) . How about (1, 1.5) -> 1(2).

Please can you make the tests look like this and then make the code pass tests? Thanks so much.

Copy link
Collaborator

@maak-sdu maak-sdu Oct 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbillinge: during our meeting today, @berrakozer and I arrived at the following principles. Those are the ones that we will try to follow while editing the behavior of the function.

Whenever the number < esd, the number of significant figures for the number and esd should be one,
whenever the 'esd convention' (see next paragraph) is not in use:

number	esd	number(esd)
1.1	10	1(10)
1.1	11	1(10)

Edge cases, where we only have one significant figure for the number, but add a 'point zero'
to follow the 'esd convention' used elsewhere in the function, we have two significant figures for the esd:

number	esd	number(esd)
1	1.4	1.0(14)
1.2	1.4	1.0(14)

The 'esd convention' should be used whenever the order of magnitude for the esd is equal to or smaller than
that of the number, n_esd <= n_number:

number	esd	number(esd)
1.031	0.0014	1.0310(14)
1.031	0.014	1.031(14)
1.031	0.14	1.03(14)
1.031	1.4	1.0(14)
1.031	14	1(10)
1.031	15	1(20)

@sbillinge
Copy link
Collaborator

sbillinge commented Oct 27, 2021 via email

@berrakozer
Copy link
Collaborator Author

@sbillinge: Should I copy the logic above to a word or txt document to keep record of it and close this PR as you have requested on the Rebuttal list?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

parse_with_sd() function
3 participants