WIP: hr_to_mr and mr_to_hr functions (closes #54) #79
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -213,4 +213,81 @@ def get_formatted_crossref_reference(doi): | |
|
||
return ref, ref_date | ||
|
||
|
||
def hr_to_mr_number_and_esd(number_esd): | ||
''' | ||
splits human readable numbers with estimated standard deviations (e.g. 343.44(45)) into machine readable numbers and | ||
estimated standard deviations (e.g. 343.44 and 0.45). | ||
|
||
Parameters | ||
---------- | ||
number_esd : array_like or string | ||
The array-like object that contains numbers with their estimated standard deviations as strings | ||
in the following format: ["343.44(45)", "324908.435(67)", "0.0783(1)"] or | ||
The string that contains numbers with their estimated standard deviations separated by new line characters | ||
in the following format: "343.44(45)\n324908.435(67)\n0.0783(1)" | ||
|
||
Returns | ||
------- | ||
number : numpy array | ||
The array with the numbers as floats | ||
|
||
esd : numpy array | ||
The array with estimated standard deviations as floats | ||
|
||
''' | ||
number_esd = np.array(number_esd, dtype='str') | ||
number = np.char.split(number_esd, sep="(") | ||
esd = np.array([e[1].split(")")[0] for e in number], dtype='float') | ||
number = np.array([e[0] for e in number], dtype='str') | ||
esd_oom = [] | ||
for i in range(len(number)): | ||
if len(number[i].split(".")) == 1: | ||
esd_oom.append(1) | ||
else: | ||
esd_oom.append(10**-len(number[i].split(".")[1])) | ||
esd_oom = np.array(esd_oom, dtype='float') | ||
number, esd = np.array(number, dtype='float'), np.array(esd * esd_oom, dtype='float') | ||
|
||
return number, esd | ||
|
||
|
||
def mr_to_hr_number_and_esd(number, esd): | ||
''' | ||
merges machine readable numbers and estimated standard deviations (e.g. 343.44 and 0.45) into human readable | ||
numbers with estimated standard deviations (e.g. 343.44(45)). | ||
|
||
Parameters | ||
---------- | ||
number : array_like or string | ||
The array-like object that contains numbers in the following format: [343.44, 324908.435, 0.0783] or | ||
The string that contains numbers in the following format: "343.44\n324908.435\n0.0783" | ||
|
||
esd : array_like or string | ||
The array-like object that contains estimated standard deviations in the following format: | ||
[0.45, 0.067, 0.0001] | ||
The string that contains estimated standard deviations in the following format: | ||
Review comment: don't do string parsing inside this function. |
Reply: @sbillinge: From bullet point 2 regarding mr_to_hr: "we are likely returning a list of strings, not a numpy array. What do you want to do if the function is handed a string rather than a list? This should be discussed in the docstring." So you don't want us to accept strings, but only to discuss this in the docstring? |
Review comment: please check that this is fixed (strings are not accepted, no string parsing inside the function, and the docstring reflects all this). |
||
"0.45\n0.067\n0.0001" | ||
|
||
Returns | ||
------- | ||
number_esd : list | ||
Review comment: you don't need to specify the name of the variable that is returned, since the name will die with the function on return. |
||
The list of strings that contains the rounded numbers with estimated standard deviations | ||
in the following format: ["343.4(5)", "324908.44(7)", "0.0783(1)" ] | ||
|
||
''' | ||
number, esd = np.array(number, dtype='float').astype('str'), np.array(esd, dtype='float').astype('str') | ||
Review comment: I would move from a NumPy array to a list before doing string operations, I think. This works, but it is a bit illogical. |
||
number_hr, esd_hr = [], [] | ||
for i in range(len(number)): | ||
if number[i].split(".")[1] == "0": | ||
number_hr.append(number[i].split(".")[0]) | ||
esd_hr.append(esd[i].split(".")[0]) | ||
else: | ||
number_hr.append(number[i]) | ||
esd_hr.append(int(esd[i].split(".")[1])) | ||
number, esd = np.array(number_hr, dtype='str'), np.array(esd_hr, dtype='str') | ||
number_esd = np.array([f'{number[i]}({esd[i]})' for i in range(len(esd))]) | ||
|
||
return number_esd | ||
|
||
# End of file. |
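To make the review comments on `mr_to_hr_number_and_esd` concrete, here is a minimal sketch of one way the function could look if it accepts only sequences of numbers, does no parsing of newline-separated strings, and returns a plain list of strings. It is not the PR's implementation. The name `mr_to_hr_sketch` and the `esd_digits` parameter are hypothetical, and rounding the esd to a fixed number of significant digits (half up) is an assumed convention that the project would still need to agree on. It has no special case for esds between 0.10 and 0.14.

```python
from decimal import Decimal, ROUND_HALF_UP


def mr_to_hr_sketch(numbers, esds, esd_digits=1):
    """Format value/esd pairs as strings such as '343.4(5)'.

    Hypothetical helper: the esd is rounded to `esd_digits` significant
    digits (assumed convention) and the value to the same decimal place.
    """
    if isinstance(numbers, str) or isinstance(esds, str):
        raise TypeError("expected sequences of numbers, not strings")
    numbers, esds = list(numbers), list(esds)
    if len(numbers) != len(esds):
        raise ValueError("numbers and esds must have the same length")

    result = []
    for number, esd in zip(numbers, esds):
        # Go through str() so that 0.45 becomes Decimal('0.45'), not its
        # long binary expansion.
        number, esd = Decimal(str(number)), Decimal(str(esd))
        if esd <= 0:
            raise ValueError("esd must be positive")
        # Power of ten of the last esd digit that is kept.
        exponent = esd.adjusted() - (esd_digits - 1)
        quantum = Decimal(1).scaleb(exponent)
        esd_rounded = esd.quantize(quantum, rounding=ROUND_HALF_UP)
        number_rounded = number.quantize(quantum, rounding=ROUND_HALF_UP)
        if exponent >= 0:
            # Integers: the esd is written out in full, e.g. 3200(300).
            number_str = str(int(number_rounded))
            esd_str = str(int(esd_rounded))
        else:
            # Decimals: the esd is written in units of the last digit.
            number_str = format(number_rounded, "f")
            esd_str = str(int(esd_rounded.scaleb(-exponent)))
        result.append(f"{number_str}({esd_str})")
    return result


print(mr_to_hr_sketch([343.44, 324908.435, 0.0783], [0.45, 0.067, 0.0001]))
# ['343.4(5)', '324908.44(7)', '0.0783(1)']
```

With these assumptions the output matches the example in the docstring above. Using `decimal` rather than slicing `str(float)` keeps the rounding explicit and avoids depending on how floats happen to print.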
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,7 +2,8 @@ | |
import pytest | ||
from datetime import date | ||
from habanero import Crossref | ||
from pydatarecognition.utils import data_sample, pearson_correlate, xy_resample, get_formatted_crossref_reference | ||
from pydatarecognition.utils import data_sample, pearson_correlate, xy_resample, get_formatted_crossref_reference, \ | ||
hr_to_mr, mr_to_hr | ||
|
||
def test_data_sample(): | ||
test_cif_data = [[10.0413, 10.0913, 10.1413, 10.1913], | ||
|
@@ -69,4 +70,19 @@ def mockreturn(*args, **kwargs): | |
actual = get_formatted_crossref_reference("test") | ||
assert actual == expected | ||
|
||
|
||
def test_hr_to_mr(): | ||
number_esd = ["343.44(45)", "324908.435(67)", "0.0783(1)", "11(1)", "51(13)"] | ||
Review comment: this list of tests is OK, but why are we testing four improper number/esd values and only one proper one? Maybe also do a 243(6) and a 3200(300), or something like that? |
||
actual = hr_to_mr(number_esd) | ||
expected = np.array([343.44, 324908.435, 0.0783, 11, 51]), np.array([0.45, 0.067, 0.0001, 1, 13]) | ||
assert np.allclose(actual[0], expected[0]) | ||
assert np.allclose(actual[1], expected[1]) | ||
|
||
|
||
def test_mr_to_hr(): | ||
number, esd = [343.44, 324908.435, 0.0783, 11, 51], [0.45, 0.067, 0.0001, 1, 13] | ||
Review comment: we will need to test an esd between 0.10 and 0.14 (e.g., 0.13), which behaves differently. Also, we would like to decide what to do (and to test) when the number does not have the same number of significant figures as the esd, e.g. the number is 343.1 and the esd is 0.45, or the number is 343.3598 and the esd is 0.56, or the number is 343.1 and the esd is 0.045, and so on (try to think of all the possibilities). Please also make sure that we are testing rounding up and rounding down. |
||
actual = mr_to_hr(number, esd) | ||
expected = np.array(["343.44(45)", "324908.435(67)", "0.0783(1)", "11(1)", "51(13)"], dtype='str') | ||
Review comment: let's pass back lists of strings rather than a NumPy array, I think. |
||
assert np.array_equal(actual, expected) | ||
|
||
# End of file. |
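The extra `hr_to_mr` cases suggested in the review, 243(6) and 3200(300), can be checked with a small stand-alone parser. The sketch below is an illustration only: `hr_to_mr_sketch` and the regular expression are hypothetical, it returns plain lists rather than NumPy arrays, and it assumes that the digits in parentheses always apply to the last written digits of the number.

```python
import re

# Value, optional decimals, then the esd digits in parentheses, e.g. "343.44(45)".
_NUMBER_ESD = re.compile(r"^\s*(-?\d+(?:\.(\d+))?)\((\d+)\)\s*$")


def hr_to_mr_sketch(number_esd):
    """Split strings like '343.44(45)' into floats 343.44 and 0.45."""
    numbers, esds = [], []
    for item in number_esd:
        match = _NUMBER_ESD.match(item)
        if match is None:
            raise ValueError(f"cannot parse {item!r}")
        value, decimals, esd_digits = match.groups()
        n_decimals = len(decimals) if decimals else 0
        numbers.append(float(value))
        # Build the esd from a decimal literal so that "45" with two
        # decimals becomes exactly the float 0.45.
        esds.append(float(f"{esd_digits}e-{n_decimals}"))
    return numbers, esds


numbers, esds = hr_to_mr_sketch(["243(6)", "3200(300)", "343.44(45)"])
assert numbers == [243.0, 3200.0, 343.44]
assert esds == [6.0, 300.0, 0.45]
```

Building the esd from a string literal sidesteps the small floating-point error that multiplying by `10**-n` can introduce, so exact comparison is safe here. The real tests may still prefer `np.allclose`.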
Review comment: Do you really want a bunch of string parsing in this function? I suggest no. Just take lists of strings?
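If the functions take only lists of strings, a bare newline-separated string should probably fail loudly rather than be iterated character by character. A minimal sketch of such a guard follows. The helper name `require_list_of_strings` is hypothetical, and raising `TypeError` is an assumption about the desired behaviour.

```python
def require_list_of_strings(values, name="number_esd"):
    """Return `values` as a list, rejecting a bare string or non-string items."""
    # A str is itself iterable, so it has to be rejected explicitly.
    if isinstance(values, (str, bytes)):
        raise TypeError(f"{name} must be a list of strings, not a single string")
    values = list(values)
    for item in values:
        if not isinstance(item, str):
            raise TypeError(f"{name} items must be strings, got {type(item).__name__}")
    return values


print(require_list_of_strings(("343.44(45)", "0.0783(1)")))
# ['343.44(45)', '0.0783(1)']
```

Callers that hold a newline-separated string can split it themselves with `text.splitlines()` before calling the conversion function, which keeps the parsing outside of it.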