Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query module: reduce equivalent encapsulated conditions #430

Open
sadielbartholomew opened this issue Aug 3, 2022 · 1 comment
Open
Labels
dask Relating to the use of Dask enhancement New feature or request question General question

Comments

@sadielbartholomew
Copy link
Member

Whilst investigating #427 I noticed that trivial equal and/or equivalent conditions in a complex condition are listed and considered separately. E.g, using a clean as-is lama-to-dask branch (hence including this in Dask/post-Dask scope of work), I see:

>>> import cf
>>> # Generating a trivially-redundant query condition:
>>> q = cf.lt(9)
>>> c = q & q & q
>>> c
<CF Query: [[(lt 9) & (lt 9)] & (lt 9)]>
>>> # Same goes with different objects but equal queries:
>>> p = cf.lt(9)
>>> q.equals(p)
True
>>> c = q & p
>>> c
<CF Query: [(lt 9) & (lt 9)]>
>>> # Now with unit considerations to generate some equivalency:
>>> r = cf.lt(9000)
>>> q.set_condition_units('km')
>>> r.set_condition_units('m')
>>> q
<CF Query: (lt 9.0 km)>
>>> r
<CF Query: (lt 9000 m)>
>>> q.equals(r)
False
>>> q.equivalent(r)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sadie/cf-python/cf/query.py", line 822, in equivalent
    _DEPRECATION_ERROR_FUNCTION(self, "equivalent")
  File "/home/sadie/cf-python/cf/functions.py", line 3205, in _DEPRECATION_ERROR_FUNCTION
    raise DeprecationError(
cf.functions.DeprecationError: Function <CF Query: (lt 9.0 km)> has been deprecated at version 3.0.0 and is no longer available and will be removed at version 4.0.0. equivalent
>>> # Not sure why equivalent has been deprecated here? What can I use instead?
>>> s = q & r
>>> s
<CF Query: [(lt 9.0 km) & (lt 9000 m)]>

(the same applies on the master branch, essentially, but set_condition_units is a method new to the lama-to-dask branch so the snippet won't work in that case and in that sense). @davidhassell note also the question in the final comment!

Whilst it is unlikely that a user would directly end up having equivalent conditions inside their query, perhaps through accumulation of a combination of complex conditions it might occur and not be obvious, so may not be too uncommon - regardless, I think we should, ideally, simplify such conditions as they are built up, e.g. taking the above example c = q & q & q would become c = q via c = (q & q) & q -> c = q & q, with more complex expressions being reduced piece-by-piece in the same way.

Note I am not suggesting we pre-evaluate the queries themselves, only checking against the logical equivalency of the conditions encapsulated, as the query is built.

Is this a good idea, conceptually? Also I imagine it is possible without much if any boilerplate code, at least, but if it does require a lot then, even if conceptually desired, maybe it should not be applied.

(This is not an urgent or ground-breaking enhancement, but something that can be improved somewhat.)

@sadielbartholomew sadielbartholomew added enhancement New feature or request question General question dask Relating to the use of Dask labels Aug 3, 2022
@sadielbartholomew
Copy link
Member Author

sadielbartholomew commented Aug 10, 2022

Here's a brain dump to record some further considerations for this raised by, or prompted by questions from, @davidhassell during a quick discussion (feel free to amend or extend David!):

  • the underlying issue here can be motivated not just by making the overall condition simpler to digest on inspection, it will also influence the performance, e.g. <CF Query: [[(lt 9) & (lt 9)] & (lt 9)]> would perform three checks, whereas it only needs one, which could reduce performance unnecessarily and significantly when the data is large;
  • how to simplify non-equal but trivial reductive conditions such as <CF Query: [(lt 9) & (lt 10)]> (i.e. there's more to consider than just something like <CF Query: [(lt 9) & (lt 9)]>)?
  • how to define equality and equivalence for conditions?
    • could and should we have an _equals internal method to strictly define some sensible precise form of condition equality, which can be configured differently if desired by the user-facing equals or equivalent methods?
  • how does laziness have influence here, and how can it be handled?
  • can we enable programmatic generation and simplification of conditions in this way?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dask Relating to the use of Dask enhancement New feature or request question General question
Projects
None yet
Development

No branches or pull requests

1 participant