Start making unit testing more general #7799
Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
Hi @dstansby, thanks for taking initiative on this! Supporting other units-aware packages would be awesome. Are you aware of our efforts around #6894? The idea there was to create a general framework for downstream testing of duck-array libraries, including any implementations of units. I think the ideas you are proposing here are useful and important, but we should probably discuss what we want the end state of duck-array test suites to look like. cc @keewis
In general I think it would be fine to merge incremental changes. It may be good to schedule a quick 30 minute chat to sync up ideas here.
I was not aware of #6894, which is definitely my bad for not searching properly before setting off 😄 It looks like the changes I'm proposing here are probably orthogonal to work in #6894 though? The new tests added in #6894 still use Anyway, definitely agree that it would be good to have the end goal in mind here. Not sure if I'll be able to find time for a synchronous discussion, but happy for others to do that and report back, or happy to chat async somewhere that isn't a GitHub issue if that would be helpful.
No worries! 😁
I think generally yes they are, I agree.
Any work that helps generalise xarray's support of units beyond specifically just pint is going to be useful! My main point to draw your attention to is the idea that eventually, one day, it would be nice to move all array-library specific testing out of the xarray core repo in favour of an approach similar to that proposed in #6894.
This is also true. Maybe that means for example the base class you are writing here has a long-term future as an optional part of xarray's testing framework in #6894, specifically for use when testing units libraries? Just thinking out loud
I think this is good for review now? There are plenty of tests lower down the file that can be generalised using the new framework I've introduced, but I think it's worth leaving that to another PR to make this one easier to review.
Thanks for the PR, @dstansby. The general idea sounds good to me, but I'd change a few details.
(There might be more, but I didn't want to delay the initial review any further)
xarray/tests/test_units.py
@@ -42,6 +74,63 @@
 ]


 class PintInfo(UnitInfo):
     unit = unit_registry.m
might be worth moving the `unit_registry` object into the `UnitInfo` class? Not sure how that would look, though, and would require every test to be converted to the style proposed here.
Yeah, I think moving `unit_registry` should be saved for a follow up PR once the basics of this PR have been settled on.
what if we were to create an alias? That way, we could incrementally transition tests to use the `UnitInfo` registry, and once every test has been migrated make the alias the only instance. Something like
-    unit = unit_registry.m
+    ureg = unit_registry
+    unit = ureg.m
maybe?
xarray/tests/test_units.py
    for unit_lib in unit_libs:
        if isinstance(array, unit_lib.quantity_type):
            return unit_lib.strip_units(array)

    return array
I wonder if it would make sense to define a function that determines the unit info instance (or have a dict of quantity type → unit info and use that to look up the unit info)? That way we don't have these loops everywhere.
-    for unit_lib in unit_libs:
-        if isinstance(array, unit_lib.quantity_type):
-            return unit_lib.strip_units(array)
-    return array
+    unit_lib = units_libs.get(type(array), None)
+    if unit_lib is not None:
+        return unit_lib.strip_units(array)
+    return array
👍
xarray/tests/test_units.py
     if isinstance(obj, (xr.Variable, xr.DataArray, xr.Dataset)):
         obj = obj.data

-    try:
-        return obj.units
-    except AttributeError:
-        return None
+    for unit_lib in unit_libs:
+        if isinstance(obj, unit_lib.quantity_type):
+            return unit_lib.get_unit(obj)
+
+    return None
it would probably make sense to follow the implementation in pint-xarray ... It's been quite a while since I wrote this (and didn't really look at it ever since), so there are a few subtle bugs here (for example: `xr.Dataset` has no attribute `.data`, and if it has that variable it will not give you the underlying array)
close, but I meant the higher-level functions `attach_units`, `convert_units`, `extract_units`, and `strip_units` (and also the `*_unit_attributes` equivalent)
thanks for the ping (and sorry for the delay), I had a few comments pending but forgot to actually post them.
unit_libs = [PintInfo]  # + [AstropyInfo]
known_quantity_types = tuple(lib.quantity_type for lib in unit_libs)
known_unit_types = tuple(lib.unit_type for lib in unit_libs)
I'd probably go a bit further and use `is` instead of `isinstance`:
- unit_libs = [PintInfo]  # + [AstropyInfo]
- known_quantity_types = tuple(lib.quantity_type for lib in unit_libs)
- known_unit_types = tuple(lib.unit_type for lib in unit_libs)
+ unit_libs = [PintInfo]  # + [AstropyInfo]
+ known_quantity_types = {lib.quantity_type: lib for lib in unit_libs}
+ known_unit_types = {lib.unit_type: lib for lib in unit_libs}
I don't think we need to care about subclasses here, as any subclass library should just add its own `UnitInfo` class.
You'd use that like this:
    lib = units_libs.get(type(obj), no_quantity)
where `no_quantity` is the "no quantity" info object I'm suggesting below.
I'm confused by this comment - where are you suggesting to use `is` instead of `isinstance`?
my mistake, referring to `is` was wrong, what I meant was `==` and `__hash__` which I think is what `dict` lookups use. The point, however, is that we don't catch subclasses with that, as we would with `isinstance`.
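The subclass point can be illustrated with a minimal sketch. `Quantity`, `SubQuantity`, and the string used in place of a `UnitInfo` object are hypothetical toy names, not part of pint, astropy, or this PR; the sketch only shows that a dict keyed on `type(obj)` matches exact types, while `isinstance` also accepts subclasses.

```python
# Hypothetical toy types standing in for a unit library's quantity classes.
class Quantity:
    pass


class SubQuantity(Quantity):
    pass


# Mapping of quantity type -> unit info (a plain string here, for illustration).
known_quantity_types = {Quantity: "toy-unit-info"}

obj = SubQuantity()

# A dict lookup hashes and compares the exact type, so the subclass is not found.
exact_match = known_quantity_types.get(type(obj))

# isinstance walks the inheritance chain, so the subclass is accepted.
subclass_match = isinstance(obj, tuple(known_quantity_types))

print(exact_match, subclass_match)
```

Under this scheme a library that subclasses another library's quantity type would need its own entry in the mapping, which matches the suggestion that each such library adds its own `UnitInfo` class.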
    if lib is not None:
        return lib.get_unit(obj)
    else:
        return None
what if we were to define a "no unit library" `UnitInfo` class that has `lib.get_unit` always return `None`? That would allow using `dict.get`'s default value to special-case non-quantities.
I'm not sure how to implement a `get_unit()` method on a "no unit library" class, and I don't think it would make sense to sub-class `UnitInfo` for something that doesn't actually have units.
I was suggesting to use a dummy class that defines its `get_unit` method to always return `None`:

    class not_a_quantity:
        # called on the class itself, so it must not take `self`
        @staticmethod
        def get_unit(obj):
            return None

    unit_libs = {...}  # does not contain `not_a_quantity`

    a = 1
    b = ureg.Quantity(1, "m")

    def get_unit(obj):
        lib = unit_libs.get(type(obj), not_a_quantity)
        return lib.get_unit(obj)

    get_unit(b)  # Unit("m")
    get_unit(a)  # None

In other words, by making use of the `dict.get` default we can avoid the additional comparison with `None`, where `None` is the default value for `dict.get`. Not sure how much of a difference that makes, though, and I certainly won't force you to use that trick.
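For reference, a self-contained variant of the same trick that runs without pint. `Metre`, `MetreInfo`, and `NotAQuantity` are hypothetical stand-ins invented for this sketch; only the `dict.get` default behaviour is the point.

```python
class Metre:
    """Hypothetical stand-in for a quantity type from a unit library."""

    units = "m"


class MetreInfo:
    """Hypothetical stand-in for a UnitInfo implementation."""

    @staticmethod
    def get_unit(obj):
        return obj.units


class NotAQuantity:
    """Fallback info object for plain values: never reports a unit."""

    @staticmethod
    def get_unit(obj):
        return None


# Mapping of quantity type -> info class; the fallback is deliberately absent.
unit_libs = {Metre: MetreInfo}


def get_unit(obj):
    # The dict.get default supplies the fallback, so no `is None` branch is needed.
    lib = unit_libs.get(type(obj), NotAQuantity)
    return lib.get_unit(obj)
```

Calling `get_unit(Metre())` goes through `MetreInfo`, while `get_unit(1)` falls through to `NotAQuantity` and returns `None`.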
Closing since I don't have time to work on this any more
As part of #525, I would like to start running the same tests already implemented for `pint` on other unit libraries - the one I'm specifically interested in is `astropy`.

This PR is a proof of concept to start making the tests in `test_units.py` more general, so it's easy to implement tests against other unit libraries. Important bits of this PR:
- `UnitInfo` base class, which defines an interface against which additional unit libraries can be implemented.
- `pint` and `astropy` (the astropy one is commented out for now)
- `test_apply_ufunc_dataarray` run for both `astropy` and `pint`.
- `astropy` testing

Would be grateful for feedback - is this worthwhile? If so, I can roll this approach out to more tests in `test_units.py` (there's certainly a lot of them!), either in this PR or a subsequent one.
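A rough sketch of what such an interface could look like, for readers without the diff at hand. The attribute and method names (`unit`, `quantity_type`, `unit_type`, `strip_units`, `get_unit`) are the ones that appear in the review comments; everything else, including the toy library, the use of `abc`, and the `quantity_type(value, unit)` constructor signature, is an assumption made for illustration and not necessarily how the PR implements it.

```python
from abc import ABC, abstractmethod


class UnitInfo(ABC):
    """Interface a unit library implements so shared tests can run against it."""

    unit = None           # a representative unit, e.g. metres
    quantity_type = None  # class of the library's quantity objects
    unit_type = None      # class of the library's unit objects

    @staticmethod
    @abstractmethod
    def strip_units(quantity):
        """Return the bare value without its unit."""

    @staticmethod
    @abstractmethod
    def get_unit(quantity):
        """Return the unit attached to a quantity."""


# Toy unit library, invented for this sketch only.
class ToyUnit:
    def __init__(self, name):
        self.name = name


class ToyQuantity:
    def __init__(self, value, unit):
        self.value = value
        self.unit = unit


class ToyInfo(UnitInfo):
    unit = ToyUnit("m")
    quantity_type = ToyQuantity
    unit_type = ToyUnit

    @staticmethod
    def strip_units(quantity):
        return quantity.value

    @staticmethod
    def get_unit(quantity):
        return quantity.unit


def check_roundtrip(info):
    """A library-agnostic check, written only against the UnitInfo interface."""
    q = info.quantity_type(1.5, info.unit)
    assert isinstance(q, info.quantity_type)
    assert info.strip_units(q) == 1.5
    assert info.get_unit(q) is info.unit


for info in [ToyInfo]:  # further libraries would be appended here
    check_roundtrip(info)
```

The test body never names a concrete library, so adding a second library would only require another `UnitInfo` subclass in the list.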