-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PSU Units #51
PSU Units #51
Conversation
def test_units_with_psu(): | ||
da = xr.DataArray(10, attrs={"units": "psu"}) | ||
new_da = da.pint.quantify("psu") | ||
new_da = new_da.pint.to("0.001").pint.dequantify() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pint.to
does not take any scalar values, needs to be a valid unit name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, therefore, you need to patch the to
functionality as I had already proposed.
new_da = da.pint.quantify("psu") | ||
new_da = new_da.pint.to("0.001").pint.dequantify() | ||
assert new_da.data == np.array(10) | ||
assert new_da.attrs.get("units") == "0.001" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the intension is to rename units without changing the magnitude, then da.attrs['units'] = "0.001" is the way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the point is to demonstrate that this is what your code automatically tries to do in units.py
, and leads to a crash. So, we need to have a way to handle this case. You could abstract this test into one that uses the actual unit conversion function, and then handles the conversion correctly.
src/pymorize/units.py
Outdated
@xr.register_dataarray_accessor("pint") | ||
class PSUAwarePintDataArrayAccessor(pint_xarray.accessors.PintDataArrayAccessor): | ||
def to(self, unit): | ||
if unit == "0.001": | ||
logger.critical(">>>> PG PART!!!!") | ||
return super().to("ppt") | ||
return super().to(unit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siligam: this is what I had in mind, but you would still need to fix it. Unfortunately, ppt
(should be parts per thousand) is already defined as picopint
.
I don't think we should override a built-in pint
unit at this point; rather just handle the conversion logic manually. This conversion override is where you should probably include the corrected code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For @mandresm: this is what this looks like in practice:
>>> import xarray as xr
>>> da = xr.DataArray([10.0], attrs={"units": "psu"}) # Comes out of FESOM
>>> # The above is equivalent to:
>>> ds = xr.open_dataset("/my/fesom/file.nc") # Attribute sst is the variable name
>>> da = ds.sst
>>> da.pint.quantify() # Turns it into a "unit-aware" array
>>> target_unit = "0.001"
>>> new_da = da.pint.to(target_unit)
if has_scalar(to_unit): | ||
logger.debug( | ||
f"scalar value in to_unit {to_unit} detected. This is not supported. Performing workaround." | ||
) | ||
new_da = new_da.pint.dequantify() | ||
else: | ||
new_da = new_da.pint.to(to_unit).pint.dequantify() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, absolutely no conversion happens. So, you just re-set the unit attribute string. What then happens if you have a 0-1 range variable that needs to be converted to 0-100? Again, no conversion, and you would just (at this point incorrectly) change the unit string without changing the values.
def test_units_with_psu(): | ||
da = xr.DataArray(10, attrs={"units": "psu"}) | ||
new_da = da.pint.quantify("psu") | ||
new_da = new_da.pint.to("0.001").pint.dequantify() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, therefore, you need to patch the to
functionality as I had already proposed.
class mock_rule_unit: | ||
def __init__(self, unit, model_unit=None): | ||
setattr(self, "unit", unit) | ||
setattr(self, "model_unit", model_unit) | ||
|
||
def __getattr__(self, name): | ||
if name not in ("unit", "model_unit"): | ||
return self | ||
return getattr(self, name) | ||
|
||
get = __getattr__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this? Why do you need it? You can just make an empty Rule
object directly if you need to:
class mock_rule_unit: | |
def __init__(self, unit, model_unit=None): | |
setattr(self, "unit", unit) | |
setattr(self, "model_unit", model_unit) | |
def __getattr__(self, name): | |
if name not in ("unit", "model_unit"): | |
return self | |
return getattr(self, name) | |
get = __getattr__ | |
import pymorize.rule | |
rule = pymorize.rule.Rule(cmor_variable="sst", model_unit="psu", unit="0.001") |
def is_unitless(units: str): | ||
""" | ||
Check if units is a unitless value. | ||
|
||
Parameters | ||
---------- | ||
units : str | ||
The string representing units. | ||
|
||
Returns | ||
------- | ||
bool | ||
``True`` if unitless, ``False`` otherwise. | ||
""" | ||
try: | ||
float(units) | ||
except ValueError: | ||
return False | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not used. I guess not yet.
def has_scalar(units: str, scalar=re.compile(r"^\s*-?\d.*").search): | ||
""" | ||
Check if units has a scalar value. (e.g. "10kg") | ||
|
||
Parameters | ||
---------- | ||
units : str | ||
The string representing units | ||
|
||
Returns | ||
------- | ||
bool | ||
``True`` if the units has a scalar value, ``False`` otherwise. | ||
""" | ||
if units is None: | ||
return False | ||
if scalar(units): | ||
return True | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very confusing for me (not because of the code, because of the design). "has scalar" is my opinion a bad name. You directly define a regex, and use a callable method of it in the function signature (why would you put it in the function signature?). That effectively means that anytime you call this function, you are creating the regex object in memory separately each time. For HPC this is not really a problem, we have more than enough memory, but this is just bad practice. Here's a better way to do this:
def has_scalar(units: str, scalar=re.compile(r"^\s*-?\d.*").search): | |
""" | |
Check if units has a scalar value. (e.g. "10kg") | |
Parameters | |
---------- | |
units : str | |
The string representing units | |
Returns | |
------- | |
bool | |
``True`` if the units has a scalar value, ``False`` otherwise. | |
""" | |
if units is None: | |
return False | |
if scalar(units): | |
return True | |
return False | |
NUMBER_PREFIX_PATTERN = re.compile(r"^\s*-?\d.*") | |
""" re.Pattern : Precompiled regex pattern to detect if a string starts with a number""" | |
def starts_with_number(text: str) -> bool: | |
""" | |
Check if the input string starts with a numeric value (e.g., "10kg", "-5"). | |
Parameters | |
---------- | |
text : str | |
The input string to check. | |
Returns | |
------- | |
bool | |
True if the string starts with a numeric value, False otherwise. | |
""" | |
try: | |
return bool(NUMBER_PREFIX_PATTERN.match(text)) | |
except TypeError: | |
return False |
src/pymorize/units.py
Outdated
@xr.register_dataarray_accessor("pint") | ||
class PSUAwarePintDataArrayAccessor(pint_xarray.accessors.PintDataArrayAccessor): | ||
def to(self, unit): | ||
if unit == "0.001": | ||
logger.critical(">>>> PG PART!!!!") | ||
return super().to("ppt") | ||
return super().to(unit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For @mandresm: this is what this looks like in practice:
>>> import xarray as xr
>>> da = xr.DataArray([10.0], attrs={"units": "psu"}) # Comes out of FESOM
>>> # The above is equivalent to:
>>> ds = xr.open_dataset("/my/fesom/file.nc") # Attribute sst is the variable name
>>> da = ds.sst
>>> da.pint.quantify() # Turns it into a "unit-aware" array
>>> target_unit = "0.001"
>>> new_da = da.pint.to(target_unit)
Superseded by #55. |
Adds the ability to convert PSU units in Pint.
Will close #40