Handle land mask #40
Nice work @paigem. This will be really useful! I had a few nitpicks, and I think you should add some wording to the docstrings, but after that this seems good to go for me.

It might be helpful if @rabernat could look at the 'shrink'/'unshrink' logic and see whether there are performance improvements we have not considered. Maybe we should also profile the memory use of this?
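For context, the 'shrink'/'unshrink' step can be thought of as compressing the valid (ocean) points into a dense 1-D array before the expensive computation, then scattering the results back onto the original grid. A minimal numpy sketch of that round trip (the `shrink`/`unshrink` names and structure here are illustrative assumptions, not the package's actual implementation):

```python
import numpy as np

def shrink(arr, mask):
    # Keep only the valid (ocean) points as a dense 1-D array.
    return arr[mask]

def unshrink(values, mask, fill=np.nan):
    # Scatter computed values back onto the original grid,
    # filling masked (land) points with NaN.
    out = np.full(mask.shape, fill, dtype=values.dtype)
    out[mask] = values
    return out

data = np.array([[1.0, np.nan], [3.0, 4.0]])
mask = ~np.isnan(data)              # True over ocean points
dense = shrink(data, mask)          # dense 1-D ocean-only array
result = unshrink(dense * 2, mask)  # doubled values, NaN over land
```

The payoff is that the Fortran routine only ever sees the dense array, so land points cost nothing.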
I did some quick profiling to compare runtime with and without the mask. This preliminary comparison shows a speedup of about 20% from masking land values. It is not a complete comparison; the following assumptions are made:
**Snakeviz visualizations**

No mask, CM2.6-sized test data:

```python
data = create_data((3600, 2700, 2), chunks=None)
%snakeviz out_data = noskin_nomask(*data, 'ecmwf', 2, 10, 6)
```

[snakeviz figure] We can see that the numpy wrapper …

With mask, CM2.6-sized test data:

```python
data = create_data((3600, 2700, 2), chunks=None, land_mask=True)
%snakeviz out_data = noskin(*data, 'ecmwf', 2, 10, 6)
```

[snakeviz figure] The small rectangles in the bottom right are the extra work needed to shrink and unshrink the arrays. Even with these extra computations, the runtime is noticeably shorter.

**Runtime comparison**

```python
# Take average runtime across 3 runs
# No mask
nomask_avg = np.mean((60.467731952667236, 61.46433997154236, 61.295777797698975))
# Mask
mask_avg = np.mean((49.637826919555664, 46.95607113838196, 48.179039001464844))
total_percent_faster = (nomask_avg - mask_avg) / nomask_avg
total_percent_faster
```

Result: `0.2098748237283278` --> ~20% faster! As @jbusecke stated above, we may also want to profile memory usage.
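As a follow-up on the memory question: one lightweight way to compare peak memory of the two code paths is Python's built-in `tracemalloc`, which numpy allocations report into. This is only a sketch with a stand-in workload, not the package's actual functions or data sizes:

```python
import tracemalloc
import numpy as np

def peak_mb(func, *args):
    # Measure peak traced memory (in MB) while running func(*args).
    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 1e6

# Stand-in workload: masking shrinks the working set before computing.
data = np.random.rand(500, 500)
mask = data > 0.7  # pretend ~30% of the grid is ocean

full_mb = peak_mb(lambda d: np.exp(d) * 2, data)
shrunk_mb = peak_mb(lambda d: np.exp(d[mask]) * 2, data)
print(f"full: {full_mb:.2f} MB, shrunk: {shrunk_mb:.2f} MB")
```

With this toy workload the shrunk path peaks well below the full path, since every temporary array is proportionally smaller.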
That's amazing @paigem! Thanks for the nice comparison.
I had one more thing that might be useful to change here. I believe that, now that we are converting the input to a 1-D array in any case, we can get rid of this wrapper entirely. To check that this is correct, I would write a test that puts arrays of various dimensions (1-4D?) through the xarray wrappers and makes sure that this does not lead to crashes.
@jbusecke Take a look at the new test I wrote in this commit to verify that our xarray wrapper takes arrays of 2 dimensions and greater (i.e. no longer just 3-D!). It seems to work fine, but I wasn't sure how to write a test for that...
```python
tuple(func(*d, "coare3p0", 2, 10, 6) for d in data)
assert (
    1 == 1
)  # This line is always true, but verifies that the above line doesn't crash the Fortran code
```
I am not sure I get this. If the Fortran code crashes, this will never be called, so I would remove it. We just need to make sure that we check the CI actions carefully.
Ok cool, I was under the impression we needed some sort of "check" (e.g. an assert statement). I can remove the unnecessary assert line.
```python
def test_all_input_array_sizes_valid(skin_correction):
    shapes = (
        (3, 4),
        (2, 3, 4),
        (2, 3, 4, 5),
    )  # create_data() only allows for inputs of 2 or more dimensions
    data = (create_data(s, skin_correction=skin_correction) for s in shapes)
    if skin_correction:
        func = skin
    else:
        func = noskin
    tuple(func(*d, "coare3p0", 2, 10, 6) for d in data)
    assert (
        1 == 1
    )  # This line is always true, but verifies that the above line doesn't crash the Fortran code
```
Suggested change:

```python
@pytest.mark.parametrize('shape', [(3, 4), (2, 3, 4), (2, 3, 4, 5)])
def test_all_input_array_sizes_valid(skin_correction, shape):
    # create_data() only allows for inputs of 2 or more dimensions
    data = create_data(shape, skin_correction=skin_correction)
    if skin_correction:
        func = skin
    else:
        func = noskin
    func(*data, "coare3p0", 2, 10, 6)
```
What I did here is factor out the shape as a parametrized input, so that each shape gets its own test. This enables more fine-grained control: if, for example, the 4-D case fails but the others pass, we will see this immediately in the test report.

I made a few minor suggestions. If you agree with those, you can commit and merge once the tests pass (make sure to double-check that the % line goes to 100 in the CI log).
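For readers unfamiliar with this pytest pattern: parametrizing generates one independent test per value, so each case gets its own pass/fail entry in the report. A minimal standalone illustration (not the project's actual test suite):

```python
import pytest

@pytest.mark.parametrize("shape", [(3, 4), (2, 3, 4), (2, 3, 4, 5)])
def test_shape_is_at_least_2d(shape):
    # pytest runs this three times, once per shape, reported as
    # separate test cases (e.g. test_shape_is_at_least_2d[shape0]).
    assert len(shape) >= 2
```

This is why a failure in one shape shows up on its own line in the test report instead of aborting a single combined test.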
Co-authored-by: Julius Busecke <julius@ldeo.columbia.edu>
for more information, see https://pre-commit.ci
This PR does the following:

- Handles land-masked input (`NaN` values)

Additionally:

- `threadsafe` for xarray `skin` computations (see "Attempt to fix `threadsafe` on `skin`" #39 for more discussion on ways around this)
- Moves `create_data()` (from "Bugfix for test data" #37) out of `test_flux_np.py` into `create_test_data.py`

Closes #31