Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start making unit testing more general #7799

Closed
wants to merge 10 commits into from

Conversation

dstansby
Copy link
Contributor

As part of #525, I would like to start running the same tests already implemented for pint on other unit libraries - the one I'm specifically interested in is astropy.

This PR is a proof of concept to start making the tests in test_units.py more general, so it's easy to implement tests against other unit libraries. Important bits of this PR:

  • The UnitInfo base class, which defines an interface against which additional unit libraries can be implemented.
  • Concrete implementations of this for pint and astropy (the astropy one is commented out for now)
  • Modifications that make test_apply_ufunc_dataarray run for both astropy and pint.
  • A few new tests for helper functions, that I needed to fix issues with astropy testing

Would be greatful for feedback - is this worthwhile? If so, I can roll this approach out to more tests in test_units.py (there's certainly a lot of them!), either in this PR or a subsequent one.

@welcome
Copy link

welcome bot commented Apr 30, 2023

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@TomNicholas
Copy link
Member

Hi @dstansby, thanks for taking initiative on this! Supporting other units-aware packages would be awesome.

Are you aware of our efforts around #6894? The idea there was to create a general framework for downstream testing of duck-array libraries, including any implementations of units.

I think the ideas you are proposing here are useful and important, but we should probably discuss what we want the end state of duck-array test suites to look like.

cc @keewis

@dcherian
Copy link
Contributor

dcherian commented May 1, 2023

In general I think it would be fine to merge incremental changes.

It may be good to schedule a quick 30 minute chat to sync up ideas here.

@dstansby
Copy link
Contributor Author

dstansby commented May 1, 2023

I was not aware of #6894, which is definitely my bad for not searching properley before setting off 😄

It looks like the changes I'm proposing here are probably orthogonal to work in #6894 though? The new tests added in #6894 still use pint as the single unit library and add some new tests with the new hypothesis strategies, but the goal of this PR is to generalise the existing unit testing to make it a bit easier to run tests with different unit libraries. Also definitely agree that keeping the end goal for duck arrays in mind is important, but I think that testing for unit libraries is a bit less general than the duck array testing stuff, because there's a host of extra information you need to be a unit library compared to a general duck array.

Anyway, definitely agree that it would be good to have the end goal in mind here. Not sure if I'll be able to find time for a synchronous discussion, but happy for others to do that and report back, or happy to chat async somewhere that isn't a github issue if that would be helpful.

@TomNicholas
Copy link
Member

I was not aware of #6894, which is definitely my bad for not searching properley before setting off smile

No worries! 😁

It looks like the changes I'm proposing here are probably orthogonal to work in #6894 though?

I think generally yes they are, I agree.

the goal of this PR is to generalise the existing unit testing to make it a bit easier to run tests with different unit libraries

Any work that helps generalise xarray's support of units beyond specifically just pint is going to be useful!

My main point to draw your attention to is the idea that eventually, one-day, it would be nice to move all array-library specific testing out of the xarray core repo in favour of an approach similar to that proposed in #6894.

I think that testing for unit libraries is a bit less general than the duck array testing stuff, because there's a host of extra information you need to be a unit library compared to a general duck array.

This is also true. Maybe that means for example the base class you are writing here has a long-term future as an optional part of xarray's testing framework in #6894, specifically for use when testing units libraries? Just thinking out loud

@dstansby
Copy link
Contributor Author

dstansby commented May 6, 2023

I think this is good for review now? There's plenty of tests lower down the file that can be generalised using the new framework I've introduced, but I think worth leaving that to another PR to make this one easier to review.

Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @dstansby. The general idea sounds good to me, but I'd change a few details.

(There might be more, but I didn't want to delay the initial review any further)

xarray/tests/test_units.py Outdated Show resolved Hide resolved
xarray/tests/test_units.py Outdated Show resolved Hide resolved
@@ -42,6 +74,63 @@
]


class PintInfo(UnitInfo):
unit = unit_registry.m
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth moving the unit_registry object into the UnitInfo class? Not sure how that would look, though, and would require every test to be converted to the style proposed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think moving unit_registry should be saved for a follow up PR once the basics of this PR have been settled on.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we were to create an alias? That way, we could incrementally transition tests to use the UnitInfo registry, and once every test has been migrated make the alias the only instance. Something like

Suggested change
unit = unit_registry.m
ureg = unit_registry
unit = ureg.m

maybe?

Comment on lines 178 to 182
for unit_lib in unit_libs:
if isinstance(array, unit_lib.quantity_type):
return unit_lib.strip_units(array)

return array
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would make sense to define a function that determines the unit info instance (or have a dict of quantity type → unit info and use that to look up the unit info)? That way we don't have these loops everywhere.

Suggested change
for unit_lib in unit_libs:
if isinstance(array, unit_lib.quantity_type):
return unit_lib.strip_units(array)
return array
unit_lib = units_libs.get(type(array), None)
if unit_lib is not None:
return unit_lib.strip_units(array)
return array

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines 167 to 174
if isinstance(obj, (xr.Variable, xr.DataArray, xr.Dataset)):
obj = obj.data

try:
return obj.units
except AttributeError:
return None
for unit_lib in unit_libs:
if isinstance(obj, unit_lib.quantity_type):
return unit_lib.get_unit(obj)

return None
Copy link
Collaborator

@keewis keewis May 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would probably make sense to follow the implementation in pint-xarray... It's been quite a while since I wrote this (and didn't really look at it ever since), so there are a few subtle bugs here (for example: xr.Dataset has no attribute .data, and if it has that variable it will not give you the underlying array)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@keewis keewis May 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

close, but I meant the higher-level functions attach_units, convert_units, extract_units, and strip_units (and also the *_unit_attributes equivalent)

Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the ping (and sorry for the delay), I had a few comments pending but forgot to actually post them.

Comment on lines +121 to +125
unit_libs = [PintInfo] # + [AstropyInfo]
known_quantity_types = tuple(lib.quantity_type for lib in unit_libs)
known_unit_types = tuple(lib.unit_type for lib in unit_libs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably go a bit further and use is instead of isinstance:

Suggested change
unit_libs = [PintInfo] # + [AstropyInfo]
known_quantity_types = tuple(lib.quantity_type for lib in unit_libs)
known_unit_types = tuple(lib.unit_type for lib in unit_libs)
unit_libs = [PintInfo] # + [AstropyInfo]
known_quantity_types = {lib.quantity_type: lib for lib in unit_libs}
known_unit_types = {lib.unit_type: lib for lib in unit_libs}

I don't think we need to care about subclasses here, as any subclass library should just add its own UnitInfo class.

You'd use that like this:

lib = units_libs.get(type(obj), no_quantity)

where no_quantity is the "no quantity" info object I'm suggesting below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this comment - where are you suggesting to use is instead of isinstance?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my mistake, referring to is was wrong, what I meant was == and __hash__ which I think is what dict lookups use. The point, however, is that we don't catch subclasses with that, as we would with isinstance.

xarray/tests/test_units.py Outdated Show resolved Hide resolved
@@ -42,6 +74,63 @@
]


class PintInfo(UnitInfo):
unit = unit_registry.m
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we were to create an alias? That way, we could incrementally transition tests to use the UnitInfo registry, and once every test has been migrated make the alias the only instance. Something like

Suggested change
unit = unit_registry.m
ureg = unit_registry
unit = ureg.m

maybe?

xarray/tests/test_units.py Outdated Show resolved Hide resolved
Comment on lines +179 to 184
if lib is not None:
return lib.get_unit(obj)
else:
return None
Copy link
Collaborator

@keewis keewis May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we were to define a "no unit library" UnitInfo class that has lib.get_unit always return None? That would allow using dict.get's default value to special-case non-quantities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to implement a get_unit() method on a "no unit library" class, and I don't think it would make sense to sub-class UnitInfo for something that doesn't actually have units.

Copy link
Collaborator

@keewis keewis Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suggesting to use a dummy class that defines its get_unit method to always return None:

class not_a_quantity:
    def get_unit(self, obj):
        return None

unit_libs = {...}  # does not contain `no_quantity`

a = 1
b = ureg.Quantity(1, "m")

def get_unit(obj):
    lib = units_libs.get(type(obj), not_a_quantity)
    return lib.get_unit(obj)

get_unit(b)  # Unit("m")
get_unit(a)  # None

In other words, by making use of the dict.get default we can avoid the additional comparison with None, where None is the default value for dict.get. Not sure how much of a difference that makes, though, and I certainly won't force you to use that trick.

@github-actions github-actions bot added topic-backends topic-indexing topic-CF conventions topic-groupby topic-dask topic-performance topic-zarr Related to zarr storage library topic-arrays related to flexible array support CI Continuous Integration tools topic-rolling topic-combine combine/concat/merge dependencies Pull requests that update a dependency file Automation Github bots, testing workflows, release automation topic-typing io labels Jun 28, 2023
@github-actions github-actions bot removed topic-backends topic-indexing topic-CF conventions topic-groupby topic-dask topic-performance topic-zarr Related to zarr storage library topic-arrays related to flexible array support CI Continuous Integration tools topic-rolling topic-combine combine/concat/merge dependencies Pull requests that update a dependency file Automation Github bots, testing workflows, release automation topic-typing io labels Jun 28, 2023
@dstansby
Copy link
Contributor Author

Closing since I don't have time to work on this any more

@dstansby dstansby closed this Jan 11, 2024
@tien-vo tien-vo mentioned this pull request Nov 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants