WIP: hr_to_mr and mr_to_hr functions (closes #54) #79

Open: wants to merge 28 commits into base: main
Changes shown: from 8 commits

Commits (28):
bdb22fe docstring for hr_to_mr function (berrakozer, Sep 9, 2021)
2cbf60c test for hr_to_mr function (berrakozer, Sep 9, 2021)
f5b5f87 hr_to_mr function added to utils.py (berrakozer, Sep 9, 2021)
8ae23f6 docstring for mr_to_hr function (berrakozer, Sep 9, 2021)
c574b32 test for mr_to_hr function (berrakozer, Sep 9, 2021)
c90c4f0 mr_to_hr function added to utils.py and testfor mr_to_hr edited, test… (berrakozer, Sep 9, 2021)
e2d9d8e renamed functions and introduced numpy conventions to the docstrings (berrakozer, Sep 13, 2021)
f2600bd edited the parameters and returns in docstrings of both functions (berrakozer, Sep 13, 2021)
72b2687 altered hr_to_mr_number_and_esd function and test to accept string in… (berrakozer, Sep 13, 2021)
4aa333e edited hr_to_mr function and test (berrakozer, Sep 14, 2021)
11eb646 edited mr_to_hr function and test (berrakozer, Sep 14, 2021)
259bbf1 docstring for the (new) function round_number_esd (berrakozer, Sep 23, 2021)
8b05b71 added test_round_number_esd function (berrakozer, Sep 23, 2021)
67a0b4a added round_number_esd function to utils.py, test passes (berrakozer, Sep 23, 2021)
e294ce3 edited the mr_to_hr_number_and_esd docstring (berrakozer, Sep 23, 2021)
1411959 edited test_mr_to_hr_number_and_esd (berrakozer, Sep 23, 2021)
fb1e2e5 edited mr_to_hr_number_and_esd and the test, test passing (berrakozer, Sep 23, 2021)
f91eef3 Edited round_number_esd function and the test, test passes (berrakozer, Sep 24, 2021)
b5209ec Edited test_mr_to_hr_number_and_esd, test passes (berrakozer, Sep 24, 2021)
ccd1faf adjusted the cases where value is smaller than value error in utils.… (berrakozer, Sep 24, 2021)
fc9d266 edited hr_to_mr_number_and_esd function and test (berrakozer, Sep 24, 2021)
fc72a66 moved round_number_esd upwards and used it in hr_to_mr_number_and_esd… (berrakozer, Sep 28, 2021)
70e6bd5 edited docstring of hr_to_mr_number_and_esd (berrakozer, Sep 28, 2021)
c3e4709 edited docstring of mr_to_hr_number_and_esd, included proper rounding… (berrakozer, Sep 28, 2021)
94a66a2 Update test_utils.py (sbillinge, Sep 30, 2021)
e4d6f78 Update test_utils.py (sbillinge, Sep 30, 2021)
52b2288 fixed test_mr_to_hr_number_and_esd, test passes (berrakozer, Oct 5, 2021)
5bd70fb Merge branch 'main' into sd_function_issue54 (sbillinge, Dec 26, 2021)
pydatarecognition/utils.py: 77 changes (77 additions, 0 deletions)
@@ -213,4 +213,81 @@ def get_formatted_crossref_reference(doi):

    return ref, ref_date


def hr_to_mr_number_and_esd(number_esd):
    '''
    Split human-readable numbers with estimated standard deviations (e.g. 343.44(45)) into machine-readable numbers and
    estimated standard deviations (e.g. 343.44 and 0.45).

    Parameters
    ----------
    number_esd : array_like or string
        The array-like object that contains numbers with their estimated standard deviations as strings
        in the following format: ["343.44(45)", "324908.435(67)", "0.0783(1)"], or
        the string that contains numbers with their estimated standard deviations separated by newline characters
Review comment (Collaborator): Do you really want a bunch of string parsing in this function? I suggest no. Just take lists of strings?

        in the following format: "343.44(45)\n324908.435(67)\n0.0783(1)"

    Returns
    -------
    number : numpy array
        The array with the numbers as floats

    esd : numpy array
        The array with estimated standard deviations as floats

    '''
    number_esd = np.array(number_esd, dtype='str')
    number = np.char.split(number_esd, sep="(")
    esd = np.array([e[1].split(")")[0] for e in number], dtype='float')
    number = np.array([e[0] for e in number], dtype='str')
    esd_oom = []
    for i in range(len(number)):
        if len(number[i].split(".")) == 1:
            esd_oom.append(1)
        else:
            esd_oom.append(10**-len(number[i].split(".")[1]))
    esd_oom = np.array(esd_oom, dtype='float')
    number, esd = np.array(number, dtype='float'), np.array(esd * esd_oom, dtype='float')

    return number, esd
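For comparison, here is a minimal sketch of the parsing step that follows the reviewer's suggestion to accept only a list of strings. The name `hr_to_mr_sketch` and the regular expression are hypothetical, not part of pydatarecognition; the sketch returns plain Python lists so it needs only the standard library (wrap the results in `np.array` if arrays are wanted), and it does not handle exponent notation such as `1.2e-3(4)`. The esd digits count units of the value's last decimal place, so "343.44(45)" gives 45 × 10⁻² = 0.45; `decimal.Decimal.scaleb` does that scaling in decimal arithmetic, avoiding binary rounding in the intermediate step.

```python
import re
from decimal import Decimal

# Hypothetical pattern: optional sign, digits, optional decimals, then "(digits)".
# Exponent notation such as "1.2e-3(4)" is deliberately not supported.
_HR_PATTERN = re.compile(r"^\s*([+-]?\d+(?:\.(\d+))?)\((\d+)\)\s*$")


def hr_to_mr_sketch(number_esd):
    """Split strings like "343.44(45)" into values and esds (sketch only).

    Only a list or tuple of strings is accepted; a single multi-line string
    is rejected, so the caller decides how to split raw text.
    """
    if isinstance(number_esd, str):
        raise TypeError("expected a list of strings, not a single string")
    numbers, esds = [], []
    for item in number_esd:
        match = _HR_PATTERN.match(item)
        if match is None:
            raise ValueError(f"cannot parse {item!r} as value(esd)")
        value, decimals, esd_digits = match.groups()
        # The esd digits count units of the value's last decimal place:
        # "343.44(45)" -> 2 decimals -> 45 * 10**-2 = 0.45.
        n_decimals = len(decimals) if decimals else 0
        numbers.append(float(value))
        esds.append(float(Decimal(esd_digits).scaleb(-n_decimals)))
    return numbers, esds


# Raw newline-separated text is split by the caller, not inside the function.
text = "343.44(45)\n324908.435(67)\n0.0783(1)\n243(6)\n3200(300)"
values, uncertainties = hr_to_mr_sketch(text.splitlines())
print(values)         # [343.44, 324908.435, 0.0783, 243.0, 3200.0]
print(uncertainties)  # [0.45, 0.067, 0.0001, 6.0, 300.0]
```

Rejecting a bare `str` up front is one way to make the "no string parsing of whole blocks inside the function" decision explicit; the docstring would then say so.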


def mr_to_hr_number_and_esd(number, esd):
    '''
    Merge machine-readable numbers and estimated standard deviations (e.g. 343.44 and 0.45) into human-readable
    numbers with estimated standard deviations (e.g. 343.44(45)).

    Parameters
    ----------
    number : array_like or string
        The array-like object that contains numbers in the following format: [343.44, 324908.435, 0.0783], or
        the string that contains numbers in the following format: "343.44\n324908.435\n0.0783"

    esd : array_like or string
        The array-like object that contains estimated standard deviations in the following format:
        [0.45, 0.067, 0.0001], or
        the string that contains estimated standard deviations in the following format:
Review comment (Collaborator): Don't do string parsing inside this function.

Reply (Collaborator Author): @sbillinge: From bullet point 2 regarding mr_to_hr: "we are likely returning a list of strings, not a numpy array. What do you want to do if the function is handed a string rather than a list? This should be discussed in the docstring."

So you don't want us to accept strings but only discuss in the docstring?

Review comment (Collaborator): Please check that this is fixed (strings are not accepted, no string parsing inside the function, and the docstring reflects all this).

        "0.45\n0.067\n0.0001"

    Returns
    -------
    number_esd : list
Review comment (Collaborator): You don't need to specify the name of the variable that is returned, since the name will die with the function on return.

        The list of strings that contains the rounded numbers with estimated standard deviations
        in the following format: ["343.4(5)", "324908.44(7)", "0.0783(1)"]

    '''
    number, esd = np.array(number, dtype='float').astype('str'), np.array(esd, dtype='float').astype('str')
Review comment (Collaborator): I think I would move from a NumPy array to a list before doing the string operations. This works, but it is a bit illogical.

    number_hr, esd_hr = [], []
    for i in range(len(number)):
        if number[i].split(".")[1] == "0":
            number_hr.append(number[i].split(".")[0])
            esd_hr.append(esd[i].split(".")[0])
        else:
            number_hr.append(number[i])
            esd_hr.append(int(esd[i].split(".")[1]))
    number, esd = np.array(number_hr, dtype='str'), np.array(esd_hr, dtype='str')
    number_esd = np.array([f'{number[i]}({esd[i]})' for i in range(len(esd))])

    return number_esd

# End of file.
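The review comments on this function ask for lists of strings as output, no string input, and deliberate rounding. A minimal sketch of a formatter along those lines follows; the name `mr_to_hr_sketch` and the `two_digit_limit` parameter are hypothetical. It assumes the convention hinted at by the later review comment about esds between 0.10 and 0.14: an esd whose leading two significant digits are at most 14 keeps two digits, any other esd keeps one, and the value is rounded to the same decimal place as the esd. Round-half-up is also an assumption (Python's built-in `round` rounds halves to even). Converting through `Decimal(repr(x))` rather than splitting `str(x)` on "." also copes with floats whose string form uses exponent notation, such as `1e-05`.

```python
from decimal import Decimal, ROUND_HALF_UP


def mr_to_hr_sketch(numbers, esds, two_digit_limit=14):
    """Format values and esds as "value(esd)" strings (sketch only).

    Assumed convention: an esd whose leading two significant digits are
    <= two_digit_limit keeps two digits (0.13 -> "13"); otherwise it keeps
    one (0.45 -> "5"). The value is rounded to the esd's last decimal place.
    """
    if isinstance(numbers, str) or isinstance(esds, str):
        raise TypeError("expected sequences of numbers, not strings")
    if len(numbers) != len(esds):
        raise ValueError("numbers and esds must have the same length")
    formatted = []
    for number, esd in zip(numbers, esds):
        # repr() gives the shortest decimal string that round-trips the float.
        esd_dec = Decimal(repr(float(esd)))
        if esd_dec <= 0:
            raise ValueError(f"esd must be positive, got {esd!r}")
        # Power of ten of the leading significant digit: 0.45 -> -1, 13 -> 1.
        lead = esd_dec.adjusted()
        # Leading two significant digits as an integer: 0.134 -> 13.
        leading_two = int(esd_dec.scaleb(1 - lead))
        n_sig = 2 if leading_two <= two_digit_limit else 1
        # Decimal place of the last kept esd digit (-1 means tenths).
        place = lead - n_sig + 1
        quantum = Decimal(1).scaleb(place)
        esd_rounded = esd_dec.quantize(quantum, rounding=ROUND_HALF_UP)
        value_rounded = Decimal(repr(float(number))).quantize(
            quantum, rounding=ROUND_HALF_UP)
        # The bracketed esd counts units of the last printed digit; for
        # integer-valued output (place >= 0) that digit is the ones place.
        esd_units = int(esd_rounded.scaleb(-min(place, 0)))
        formatted.append(f"{value_rounded:f}({esd_units})")
    return formatted


print(mr_to_hr_sketch([343.44, 324908.435, 0.0783, 51], [0.45, 0.067, 0.0001, 13]))
# ['343.4(5)', '324908.44(7)', '0.07830(10)', '51(13)']
```

Under this assumed convention an esd of 0.0001 is written with two digits, so 0.0783 with esd 0.0001 comes out as `0.07830(10)`; the docstring example `0.0783(1)` reflects a different choice. A value quoted to fewer places than the esd is padded (343.1 with esd 0.045 gives `343.10(5)`), and one quoted to more places is rounded (343.3598 with esd 0.56 gives `343.4(6)`).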
tests/test_utils.py: 18 changes (17 additions, 1 deletion)
@@ -2,7 +2,8 @@
import pytest
from datetime import date
from habanero import Crossref
-from pydatarecognition.utils import data_sample, pearson_correlate, xy_resample, get_formatted_crossref_reference
+from pydatarecognition.utils import data_sample, pearson_correlate, xy_resample, get_formatted_crossref_reference, \
+    hr_to_mr, mr_to_hr

def test_data_sample():
    test_cif_data = [[10.0413, 10.0913, 10.1413, 10.1913],
@@ -69,4 +70,19 @@ def mockreturn(*args, **kwargs):
    actual = get_formatted_crossref_reference("test")
    assert actual == expected


def test_hr_to_mr():
    number_esd = ["343.44(45)", "324908.435(67)", "0.0783(1)", "11(1)", "51(13)"]
Review comment (Collaborator): This list of tests is OK, but why are we testing four improper number/esd values and only one proper one? Maybe also do a 243(6) and a 3200(300)? Something like that?

    actual = hr_to_mr(number_esd)
    expected = np.array([343.44, 324908.435, 0.0783, 11, 51]), np.array([0.45, 0.067, 0.0001, 1, 13])
    assert np.allclose(actual[0], expected[0])
    assert np.allclose(actual[1], expected[1])
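The reviewer's count of four improper values and one proper one is consistent with reading "proper" as "the esd is already written with the right number of digits" under a convention where an esd keeps two digits when its leading two significant digits are at most 14, and one digit otherwise (the threshold the later comment about esds between 0.10 and 0.14 hints at). The sketch below encodes that reading; `esd_is_conventional` is a hypothetical helper, and treating trailing zeros in an integer-valued esd as placeholders (so "11(10)" counts as one digit) is an assumption.

```python
import re

# Hypothetical pattern for "value(esd)" strings; group 1 is the decimal part
# of the value (if any), group 2 the esd digits.
_HR = re.compile(r"^[+-]?\d+(\.\d+)?\((\d+)\)$")


def esd_is_conventional(number_esd, two_digit_limit=14):
    """Return True if the esd uses the digit count the assumed rule expects."""
    match = _HR.match(number_esd)
    if match is None:
        raise ValueError(f"cannot parse {number_esd!r}")
    has_decimals, digits = match.group(1) is not None, match.group(2)
    # For integer values, trailing zeros in the esd are placeholders:
    # "3200(300)" has one significant esd digit.
    significant = digits if has_decimals else (digits.rstrip("0") or "0")
    if len(significant) == 1:
        # 2..9 correctly use one digit; a lone "1" should be written as 10..14.
        return int(significant) >= 2
    if len(significant) == 2:
        return 10 <= int(significant) <= two_digit_limit
    return False


cases = ["343.44(45)", "324908.435(67)", "0.0783(1)", "11(1)", "51(13)", "243(6)", "3200(300)"]
print([esd_is_conventional(c) for c in cases])
# [False, False, False, False, True, True, True]
```

Each case could equally be fed to `pytest.mark.parametrize` so that proper and improper inputs are reported individually.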


def test_mr_to_hr():
    number, esd = [343.44, 324908.435, 0.0783, 11, 51], [0.45, 0.067, 0.0001, 1, 13]
Review comment (Collaborator): We will need to test an esd between 0.10 and 0.14 (e.g., 0.13), which behaves differently. Also, we would like to decide what to do (and to test) when the number does not have the same number of significant figures as the esd, e.g. the number is 343.1 and the esd is 0.45, or the number is 343.3598 and the esd is 0.56, or the number is 343.1 and the esd is 0.045, and so on (try to think of all the possibilities). Please also make sure that we are testing rounding up and rounding down.

    actual = mr_to_hr(number, esd)
    expected = np.array(["343.44(45)", "324908.435(67)", "0.0783(1)", "11(1)", "51(13)"], dtype='str')
Review comment (Collaborator): I think we should pass back lists of strings rather than a NumPy array.

    assert np.array_equal(actual, expected)

# End of file.