Query module: reduce equivalent encapsulated conditions #430

sadielbartholomew · 2022-08-03T15:50:13Z

Whilst investigating #427 I noticed that trivial equal and/or equivalent conditions in a complex condition are listed and considered separately. E.g, using a clean as-is lama-to-dask branch (hence including this in Dask/post-Dask scope of work), I see:

>>> import cf
>>> # Generating a trivially-redundant query condition:
>>> q = cf.lt(9)
>>> c = q & q & q
>>> c
<CF Query: [[(lt 9) & (lt 9)] & (lt 9)]>
>>> # Same goes with different objects but equal queries:
>>> p = cf.lt(9)
>>> q.equals(p)
True
>>> c = q & p
>>> c
<CF Query: [(lt 9) & (lt 9)]>
>>> # Now with unit considerations to generate some equivalency:
>>> r = cf.lt(9000)
>>> q.set_condition_units('km')
>>> r.set_condition_units('m')
>>> q
<CF Query: (lt 9.0 km)>
>>> r
<CF Query: (lt 9000 m)>
>>> q.equals(r)
False
>>> q.equivalent(r)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sadie/cf-python/cf/query.py", line 822, in equivalent
    _DEPRECATION_ERROR_FUNCTION(self, "equivalent")
  File "/home/sadie/cf-python/cf/functions.py", line 3205, in _DEPRECATION_ERROR_FUNCTION
    raise DeprecationError(
cf.functions.DeprecationError: Function <CF Query: (lt 9.0 km)> has been deprecated at version 3.0.0 and is no longer available and will be removed at version 4.0.0. equivalent
>>> # Not sure why equivalent has been deprecated here? What can I use instead?
>>> s = q & r
>>> s
<CF Query: [(lt 9.0 km) & (lt 9000 m)]>

(the same applies on the master branch, essentially, but set_condition_units is a method new to the lama-to-dask branch so the snippet won't work in that case and in that sense). @davidhassell note also the question in the final comment!

Whilst it is unlikely that a user would directly end up having equivalent conditions inside their query, perhaps through accumulation of a combination of complex conditions it might occur and not be obvious, so may not be too uncommon - regardless, I think we should, ideally, simplify such conditions as they are built up, e.g. taking the above example c = q & q & q would become c = q via c = (q & q) & q -> c = q & q, with more complex expressions being reduced piece-by-piece in the same way.

Note I am not suggesting we pre-evaluate the queries themselves, only checking against the logical equivalency of the conditions encapsulated, as the query is built.

Is this a good idea, conceptually? Also I imagine it is possible without much if any boilerplate code, at least, but if it does require a lot then, even if conceptually desired, maybe it should not be applied.

(This is not an urgent or ground-breaking enhancement, but something that can be improved somewhat.)

The text was updated successfully, but these errors were encountered:

sadielbartholomew · 2022-08-10T11:23:19Z

Here's a brain dump to record some further considerations for this raised by, or prompted by questions from, @davidhassell during a quick discussion (feel free to amend or extend David!):

the underlying issue here can be motivated not just by making the overall condition simpler to digest on inspection, it will also influence the performance, e.g. <CF Query: [[(lt 9) & (lt 9)] & (lt 9)]> would perform three checks, whereas it only needs one, which could reduce performance unnecessarily and significantly when the data is large;
how to simplify non-equal but trivial reductive conditions such as <CF Query: [(lt 9) & (lt 10)]> (i.e. there's more to consider than just something like <CF Query: [(lt 9) & (lt 9)]>)?
how to define equality and equivalence for conditions?
- could and should we have an _equals internal method to strictly define some sensible precise form of condition equality, which can be configured differently if desired by the user-facing equals or equivalent methods?
how does laziness have influence here, and how can it be handled?
can we enable programmatic generation and simplification of conditions in this way?

sadielbartholomew added enhancement New feature or request question General question dask Relating to the use of Dask labels Aug 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Query module: reduce equivalent encapsulated conditions #430

Query module: reduce equivalent encapsulated conditions #430

sadielbartholomew commented Aug 3, 2022

sadielbartholomew commented Aug 10, 2022 •

edited

Loading

Query module: reduce equivalent encapsulated conditions #430

Query module: reduce equivalent encapsulated conditions #430

Comments

sadielbartholomew commented Aug 3, 2022

sadielbartholomew commented Aug 10, 2022 • edited Loading

sadielbartholomew commented Aug 10, 2022 •

edited

Loading