LAMA to Dask: fixes to Query & Data.digitize #427
@davidhassell I have made updates in the two latest commits in line with our discussion from last week, notably your suggestion regarding part (4) (see my opening comment). As far as I am concerned this is ready to go, so please review when you are back from leave and have time. Thanks.
Hi Sadie - the digitize and where changes look good, but I have a question on the Query changes ...
cf/query.py
    # Value has no units
    value = Data(value, units=units)
else:
    if value_units is not None:
Not sure about this - it doesn't work if value does not have a Units attribute:
>>> q = cf.lt(6)
>>> q._value
6
>>> q.set_condition_units('m')
>>> q._value
6 # No units here!
The original code does:
>>> q._value
6
>>> q.set_condition_units('m')
>>> q._value
<CF Data(): 6 m> # Seems right to me ...
How was the original code causing problems?
Thanks David, I'm updating this in line with our discussion in the video call today, which supersedes your comment here but concerns the same line change and underlying issue.
Back onto this, with apologies for letting it fall off my radar! @davidhassell, to continue the conversation where we left off after the first round of feedback, namely regarding the outstanding piece of feedback #427 (comment)... The last progress made towards this PR was back in August after your initial review, when we had a brief external pair-programming-type session to investigate the
Notes from discussion
diff --git a/cf/query.py b/cf/query.py
index 67932f9f8..9a38cbffe 100644
--- a/cf/query.py
+++ b/cf/query.py
@@ -788,14 +788,19 @@ class Query:
return
value_units = getattr(value, "Units", None)
+ print("\n%%%%% 0 VALUE UNITS IS", value, value_units)
if value_units is None:
# Value has no units
+ print("\n%%%%% 1.0 IS\n", value, value_units)
value = Data(value, units=units)
+ print("\n%%%%% 1.1 IS\n", value, value_units, value.shape)
else:
# Value already has units
try:
+ print("\n%%%%% 2 IS", value, value_units, value.shape)
value.Units = units
except ValueError:
+ print("\n%%%%% 3 IS", value, value_units, value.shape)
raise ValueError(
f"Units {units!r} are not equivalent to "
f"query condition units {value_units!r}"
then we see, by running test_Data_digitize (__main__.DataTest) ...
%%%%% 0 VALUE UNITS IS [<CF Data(1, 1, 1): [[[0]]]>, <CF Data(1, 1, 1): [[[5]]]>] None
%%%%% 1.0 IS
[<CF Data(1, 1, 1): [[[0]]]>, <CF Data(1, 1, 1): [[[5]]]>] None
%%%%% 1.1 IS
[[[[0, 5]]]] None (2, 1, 1, 1)
ERROR
which shows that the
Note it passes, but only if the current
Going forward
Let me know if you agree with the above summary of the underlying issue and the suggested solution; notably, see the two new commits (one just to revert the previous changes touching the
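The debug output above shows a list of two Data objects, each of shape (1, 1, 1), being turned into a single Data object of shape (2, 1, 1, 1) by Data(value, units=units). As a rough analogy, assuming cf.Data follows NumPy's array-construction semantics here (not verified against the cf-python source), NumPy does the same thing when an array is built from a list of arrays:

```python
import numpy as np

# Two values already shaped (1, 1, 1), as in the test_Data_digitize output
low = np.array([[[0]]])
high = np.array([[[5]]])

# Wrapping the *list* in a new array stacks the elements along a new
# leading axis instead of keeping them as two separate values
stacked = np.array([low, high])
print(stacked.shape)  # (2, 1, 1, 1)

# Keeping the sequence as a plain list preserves each element's own shape,
# which is what a range-style condition with two bounds needs
kept = [low, high]
print([v.shape for v in kept])  # [(1, 1, 1), (1, 1, 1)]
```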
cf/query.py
@@ -790,7 +791,11 @@ def set_condition_units(self, units):
value_units = getattr(value, "Units", None)
if value_units is None:
    # Value has no units
    value = Data(value, units=units)
    if isinstance(value, Iterable):  # may be a sequence of Data
Edit: Hi Sadie, I think what I wrote here may well be nonsense! Please ignore it and the code snippet, and I'll have another think ...
Hi Sadie, not sure this will work in all cases:
- Data objects are Iterable.
- When value is a sequence, we want it to remain a sequence, not convert it into a Data object, and Data.concatenate doesn't work for non-Data objects.
This code might make sense, but it's not very pretty!
IGNORE
value_units = getattr(value, "Units", None)
if value_units is None:
# Value has no units
if isinstance(value, Iterable) and not isinstance(value, str):
new = []
for v in value:
value_units = getattr(v, "Units", None)
if value_units is None:
# Value has no units
v = Data(v, units=units)
else:
# Value already has units
try:
v.Units = units
except ValueError:
raise ValueError(
f"Units {units!r} are not equivalent to "
f"query condition units {value_units!r}"
)
new.append(v)
value = new
else:
value = Data(value, units=units)
else:
# Value already has units
try:
value.Units = units
except ValueError:
raise ValueError(
f"Units {units!r} are not equivalent to "
f"query condition units {value_units!r}"
)
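The first point in the comment above can be checked directly: isinstance(value, Iterable) is true for far more than sequences of condition values, which is also why the snippet above has to exclude str explicitly. A small sketch, using NumPy arrays as a stand-in for Data objects (an assumption; the comment only states that Data objects are Iterable):

```python
from collections.abc import Iterable

import numpy as np

candidates = {
    "list of values": [1, 2],
    "string": "abc",
    "1-d array": np.array([1, 2]),
    "0-d array": np.array(5),
    "plain int": 5,
}

# isinstance(..., Iterable) only asks whether the type defines __iter__
results = {name: isinstance(obj, Iterable) for name, obj in candidates.items()}
for name, flag in results.items():
    print(f"{name:15} iterable={flag}")

# ...so it is True for a 0-d array even though iterating over it fails
try:
    list(candidates["0-d array"])
    zero_d_iterates = True
except TypeError:
    zero_d_iterates = False
print("0-d array actually iterates:", zero_d_iterates)
```

Checking which query operator is in use (as in the later suggestion keyed on "wi", "wo" and "set") sidesteps this ambiguity, because the operator, not the type, says whether value is meant to be a sequence.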
OK - I think I worked out what's going on. The key is that the contents of a "set", "wi" or "wo" iterable are not necessarily Data objects (or rather, objects with Units). How about this? There's scope for a bit of code reuse here, but I can't quite see how yet.
value_units = getattr(value, "Units", None)
if value_units is None:
# Value has no units
if self.operator in ("wi", "wo", "set"):
# value is a sequence of things that may or may not
# already have units
new = []
for v in value:
v_units = getattr(v, "Units", None)
if v_units is None:
v = Data(v, units=units)
else:
try:
v = v.copy()
v.Units = units
except ValueError:
raise ValueError(
f"Units {units!r} are not equivalent to "
f"query condition units {v_units!r}"
)
new.append(v)
value = new
else:
value = Data(value, units=units)
else:
# Value already has units
try:
value = value.copy()
value.Units = units
except ValueError:
raise ValueError(
f"Units {units!r} are not equivalent to "
f"query condition units {value_units!r}"
)
self._value = value
We could do with some more units tests for this, too ...
I've just added your code (putting you as a formal co-author since you wrote it, though I don't think I can specify you as 'full' author since I'm committing it!) and then consolidated it with an inner helper function. I'll add some new testing first thing tomorrow.
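A minimal sketch of what consolidating the suggested block with an inner helper function might look like. This is not the committed cf-python code: ToyData and its units handling are hypothetical stand-ins for cf.Data (units are plain strings, only units sharing a made-up "dimension" count as equivalent, and no numerical conversion is done), and set_condition_units is written here as a free function taking the operator explicitly rather than as a Query method.

```python
import copy

# Hypothetical unit "dimensions": units sharing a dimension are treated as
# equivalent. No numerical conversion is done, unlike real cf units.
_DIMENSION = {"m": "length", "km": "length", "s": "time"}


class ToyData:
    """Hypothetical stand-in for cf.Data with a settable Units attribute."""

    def __init__(self, value, units=None):
        self.value = value
        self._units = units

    @property
    def Units(self):
        return self._units

    @Units.setter
    def Units(self, units):
        # Reject non-equivalent units, loosely mimicking cf.Data
        if self._units is not None and (
            _DIMENSION.get(self._units) != _DIMENSION.get(units)
        ):
            raise ValueError(f"{units!r} is not equivalent to {self._units!r}")
        self._units = units

    def copy(self):
        return copy.deepcopy(self)


def set_condition_units(operator, value, units):
    """Return value with units applied (sketch of the consolidated logic)."""

    def _apply(v):
        # Inner helper shared by the single-value and per-element cases
        v_units = getattr(v, "Units", None)
        if v_units is None:
            # No units yet: wrap the bare value
            return ToyData(v, units=units)

        v = v.copy()  # never modify the caller's object in place
        try:
            v.Units = units
        except ValueError:
            raise ValueError(
                f"Units {units!r} are not equivalent to "
                f"query condition units {v_units!r}"
            )
        return v

    if getattr(value, "Units", None) is None and operator in ("wi", "wo", "set"):
        # value is a sequence whose elements may or may not have units
        return [_apply(v) for v in value]

    return _apply(value)


# Usage: a scalar gains units; a "wi" range keeps its sequence shape
lt_value = set_condition_units("lt", 6, "m")
wi_value = set_condition_units("wi", [1, ToyData(2, "km")], "m")
print(lt_value.Units, [v.Units for v in wi_value])  # m ['m', 'm']
```

The helper removes the duplicated try/except blocks, and the copy() keeps a caller's Data object from having its units silently changed.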
Hi Sadie - I think we have a bit of an issue here, as some of this code has already been modified in #464 - could you merge
Hi David, no worries. I've done as you asked, but it seems to have merged cleanly, and upon investigation it looks like the one commit where I touched
Overall, with my new merge commit, the
in case changes are necessary with regard to that. Thanks!
Thanks, Sadie. I'm now going to stop dipping into this piecemeal and do a proper review later today/tomorrow. Sorry for any extra confusion I have wrought!
Hi Sadie, I've made a suggestion for the changes to Query (and a request for some more tests), otherwise all good. A small amount of code caused a lot of thought here!
Co-authored-by: David Hassell <davidhassell@users.noreply.github.com>
Aha, that makes so much sense now you have worked it out and summarised it! And the change block you suggest follows on sensibly from that. I'll get the equivalent code updated in line with your suggestion. Thanks!
Agreed. Right-o, I'll increase the coverage in this respect as part of this PR. I'll update it shortly and tag you to let you know we're ready for a re-review.
Sorry @davidhassell, before I went on leave I think I forgot to confirm that this is ready for re-review, now that I have updated it in line with your suggestion and made some consolidations too. Please let me know what you think.
Hi Sadie - A nice solution, thanks. I've made a couple of comments, but go ahead and merge when you're ready.
Out of date now; later comments were made and an approval was raised.
Thanks for all of your very useful feedback on this @davidhassell. The final round of feedback is addressed in my latest commit, so I'll merge as advised and open one Issue as discussed.
Reinstates a commented-out test case, now ready since its prerequisites have been satisfied, that calls where and tests digitize, and in doing so reveals a bug in the migrated state of both (:flushed:), which this PR also fixes. See below for the breakdown. (The fixes applied may not be the most appropriate course of action, or may need tweaking, so I've applied them in self-contained commits which can be reverted if necessary. The most important thing here is my outline of the issues arising below, which should be addressed by some means.)
Details
Overall this PR:
1. Reinstates the test case in question, as-was, in f86923f.
2. Fixes the first issue encountered when reinstating, namely that NoneType errors were being hit:
ultimately due to a rogue assignment of the result of set_condition_units, which returns nothing. It seems highly likely that calling it without assigning was the intended action, so a fix has been applied as such in e450a40.
3. Fixes the next issue revealed in the stack trace, which occurs because Query.set_condition_units was introducing an extra dimension for unitless values, manifesting as:
with a sensible fix applied in 1aec661 (though something else may be required in the unitless case, so I'm not sure whether straight-up removing the line is suitable as the general fix; this is to be discussed...).
4. Fixes the final issue revealed by the reinstated test, in 5bb5766, namely that the core assertion checking equality between the intended and actual results was failing:
because (after much investigation into what was actually going wrong once that key assertion was finally reached!) the Data.digitize method itself created a list called delete_bins that was never used later in the code to affect the outcome, when it was clearly designed to do something... I cross-referenced the migration PR (dask: Dask.digitize #312) and the old master code, after which it seemed that the following block had not been incorporated:
cf-python/cf/data/data.py
Lines 2647 to 2652 in 07cb04f
so I introduced it and adapted it to the new way, and this change did indeed (finally!) get the test case to pass, without causing side effects elsewhere.
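The second fix in the list above is a common pitfall with methods that modify an object in place and return None. A self-contained sketch of the failure and the fix, using a hypothetical ToyQuery class rather than cf.Query itself:

```python
class ToyQuery:
    """Hypothetical stand-in for a query whose units are set in place."""

    def __init__(self, value):
        self.value = value
        self.units = None

    def set_condition_units(self, units):
        # Modifies self and, like many in-place methods, returns None
        self.units = units


# Buggy pattern: assigning the result replaces the query with None
q = ToyQuery(6)
q = q.set_condition_units("m")
print(q)  # None

# Fixed pattern: call for its side effect and keep the original object
q = ToyQuery(6)
q.set_condition_units("m")
print(q.units)  # m
```

Any later attribute access on the None (for example, evaluating the query) is what surfaces as a NoneType error, some distance from the assignment that caused it.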
So I think (4) concludes the multi-fix, but let me know what you think, @davidhassell! Thanks.