Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PSU Units #51

Closed
wants to merge 4 commits into from
Closed

PSU Units #51

wants to merge 4 commits into from

Conversation

pgierz
Copy link
Member

@pgierz pgierz commented Oct 28, 2024

Adds the ability to convert PSU units in Pint.

Will close #40

def test_units_with_psu():
da = xr.DataArray(10, attrs={"units": "psu"})
new_da = da.pint.quantify("psu")
new_da = new_da.pint.to("0.001").pint.dequantify()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pint.to does not take any scalar values, needs to be a valid unit name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, therefore, you need to patch the to functionality as I had already proposed.

new_da = da.pint.quantify("psu")
new_da = new_da.pint.to("0.001").pint.dequantify()
assert new_da.data == np.array(10)
assert new_da.attrs.get("units") == "0.001"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the intension is to rename units without changing the magnitude, then da.attrs['units'] = "0.001" is the way to go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the point is to demonstrate that this is what your code automatically tries to do in units.py, and leads to a crash. So, we need to have a way to handle this case. You could abstract this test into one that uses the actual unit conversion function, and then handles the conversion correctly.

Comment on lines 46 to 52
@xr.register_dataarray_accessor("pint")
class PSUAwarePintDataArrayAccessor(pint_xarray.accessors.PintDataArrayAccessor):
def to(self, unit):
if unit == "0.001":
logger.critical(">>>> PG PART!!!!")
return super().to("ppt")
return super().to(unit)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siligam: this is what I had in mind, but you would still need to fix it. Unfortunately, ppt (should be parts per thousand) is already defined as picopint.

I don't think we should override a built-in pint unit at this point; rather just handle the conversion logic manually. This conversion override is where you should probably include the corrected code.

Copy link
Member Author

@pgierz pgierz Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For @mandresm: this is what this looks like in practice:

>>> import xarray as xr
>>> da = xr.DataArray([10.0], attrs={"units": "psu"})  # Comes out of FESOM
>>> # The above is equivalent to:
>>> ds = xr.open_dataset("/my/fesom/file.nc")  # Attribute sst is the variable name
>>> da = ds.sst
>>> da.pint.quantify() # Turns it into a "unit-aware" array
>>> target_unit = "0.001"
>>> new_da = da.pint.to(target_unit)

Comment on lines +163 to +169
if has_scalar(to_unit):
logger.debug(
f"scalar value in to_unit {to_unit} detected. This is not supported. Performing workaround."
)
new_da = new_da.pint.dequantify()
else:
new_da = new_da.pint.to(to_unit).pint.dequantify()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, absolutely no conversion happens. So, you just re-set the unit attribute string. What then happens if you have a 0-1 range variable that needs to be converted to 0-100? Again, no conversion, and you would just (at this point incorrectly) change the unit string without changing the values.

def test_units_with_psu():
da = xr.DataArray(10, attrs={"units": "psu"})
new_da = da.pint.quantify("psu")
new_da = new_da.pint.to("0.001").pint.dequantify()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, therefore, you need to patch the to functionality as I had already proposed.

Comment on lines +171 to +181
class mock_rule_unit:
def __init__(self, unit, model_unit=None):
setattr(self, "unit", unit)
setattr(self, "model_unit", model_unit)

def __getattr__(self, name):
if name not in ("unit", "model_unit"):
return self
return getattr(self, name)

get = __getattr__
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this? Why do you need it? You can just make an empty Rule object directly if you need to:

Suggested change
class mock_rule_unit:
def __init__(self, unit, model_unit=None):
setattr(self, "unit", unit)
setattr(self, "model_unit", model_unit)
def __getattr__(self, name):
if name not in ("unit", "model_unit"):
return self
return getattr(self, name)
get = __getattr__
import pymorize.rule
rule = pymorize.rule.Rule(cmor_variable="sst", model_unit="psu", unit="0.001")

Comment on lines +37 to +55
def is_unitless(units: str):
"""
Check if units is a unitless value.

Parameters
----------
units : str
The string representing units.

Returns
-------
bool
``True`` if unitless, ``False`` otherwise.
"""
try:
float(units)
except ValueError:
return False
return True
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used. I guess not yet.

Comment on lines +58 to +76
def has_scalar(units: str, scalar=re.compile(r"^\s*-?\d.*").search):
"""
Check if units has a scalar value. (e.g. "10kg")

Parameters
----------
units : str
The string representing units

Returns
-------
bool
``True`` if the units has a scalar value, ``False`` otherwise.
"""
if units is None:
return False
if scalar(units):
return True
return False
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very confusing for me (not because of the code, because of the design). "has scalar" is my opinion a bad name. You directly define a regex, and use a callable method of it in the function signature (why would you put it in the function signature?). That effectively means that anytime you call this function, you are creating the regex object in memory separately each time. For HPC this is not really a problem, we have more than enough memory, but this is just bad practice. Here's a better way to do this:

Suggested change
def has_scalar(units: str, scalar=re.compile(r"^\s*-?\d.*").search):
"""
Check if units has a scalar value. (e.g. "10kg")
Parameters
----------
units : str
The string representing units
Returns
-------
bool
``True`` if the units has a scalar value, ``False`` otherwise.
"""
if units is None:
return False
if scalar(units):
return True
return False
NUMBER_PREFIX_PATTERN = re.compile(r"^\s*-?\d.*")
""" re.Pattern : Precompiled regex pattern to detect if a string starts with a number"""
def starts_with_number(text: str) -> bool:
"""
Check if the input string starts with a numeric value (e.g., "10kg", "-5").
Parameters
----------
text : str
The input string to check.
Returns
-------
bool
True if the string starts with a numeric value, False otherwise.
"""
try:
return bool(NUMBER_PREFIX_PATTERN.match(text))
except TypeError:
return False

Comment on lines 46 to 52
@xr.register_dataarray_accessor("pint")
class PSUAwarePintDataArrayAccessor(pint_xarray.accessors.PintDataArrayAccessor):
def to(self, unit):
if unit == "0.001":
logger.critical(">>>> PG PART!!!!")
return super().to("ppt")
return super().to(unit)
Copy link
Member Author

@pgierz pgierz Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For @mandresm: this is what this looks like in practice:

>>> import xarray as xr
>>> da = xr.DataArray([10.0], attrs={"units": "psu"})  # Comes out of FESOM
>>> # The above is equivalent to:
>>> ds = xr.open_dataset("/my/fesom/file.nc")  # Attribute sst is the variable name
>>> da = ds.sst
>>> da.pint.quantify() # Turns it into a "unit-aware" array
>>> target_unit = "0.001"
>>> new_da = da.pint.to(target_unit)

@pgierz
Copy link
Member Author

pgierz commented Nov 12, 2024

Superseded by #55.

@pgierz pgierz closed this Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PSU Conversion
2 participants