Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More support for pop data #240

Merged
merged 5 commits into from
Mar 4, 2021
Merged

More support for pop data #240

merged 5 commits into from
Mar 4, 2021

Conversation

allibco
Copy link
Collaborator

@allibco allibco commented Mar 1, 2021

This fixes a number of issues for POP data. Now we are very close to having everything working!

@andersy005 - could you help me with one bit that is broken? It is demonstrated in the last cell in the POPData notebook. Basically the issue is when I grab POP data with a fixed lat and lon, then the TLAT and TLON coords no longer have dimensions (nlat, nlon). This happens in the subset_data() in util.py - I added a comment at the end of that function. I spent several hours on Friday trying to figure out how to add the dimensions back to these coordinates, but to no avail :). Any help is appreciated. (Note that the CAM data is unaffected as the lat and lon are one dimensional, so I can grab them independently with isel(). But for POP I had to figure out which grid point was closet from the 2D lat and lon data)

@allibco allibco requested review from pinarda and andersy005 March 1, 2021 22:52
@andersy005
Copy link
Contributor

is happens in the subset_data() in util.py - I added a comment at the end of that function. I spent several hours on Friday trying to figure out how to add the dimensions back to these coordinates,

I don't know to address this issue for (2D) dimension coordinates unfortunately. Ccing @dcherian who may have ideas/recommendations

Copy link

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: ds_subset = ds_subset.where(ds[time_dim_name].dt.season == 'SON', drop=True)

it's a lot more efficient to do ds_subset.cf.sel(time=ds.cf["time"].dt.season == "SON")

mlon = ds_subset[lon_coord_name].compute()
# euclidean dist for now....
di = np.sqrt(np.square(ad_lon - mlon) + np.square(lat - mlat))
index = np.where(di == np.min(di))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
index = np.where(di == np.min(di))
index = np.argmin(di)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to know - thank you!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I tried this suggestion, but don't understand the error. Using the simple example from below again:

import numpy as np
import pandas as pd
import xarray as xr
mylat = np.array([[1., 2., 3., 4.], [1.1, 2.1, 3.1, 3.9], [1, 1.9, 2.9, 3.9]])
mylon = np.array([[3., 4., 5., 6.], [3.1, 4.1, 5.1, 5.9], [3, 3.9, 4.9, 5.9]])
mytimes = pd.date_range('2000-01-01', periods=5)
pop_data = np.random.uniform(low=0.0, high=100.0, size=(3,4,5))
ds_subset = xr.DataArray(
    pop_data, 
     coords={
        "time": mytimes,
        "TLAT": (("nlat", "nlon"), mylat, {'standard_name': 'latitude', 'units': 'degrees_north'}),
        "TLON": (("nlat", "nlon"), mylon, {'standard_name': 'longitude', 'units': 'degrees_east'})
    },
    dims=['nlat', 'nlon', 'time'] , attrs={'long_name': 'Surface Potential'}
)

Then I do this calculation:

mlat = ds_subset["TLAT"].compute()
mlon = ds_subset["TLON"].compute()
ad_lat = 3.1
ad_lon = 4
di = np.sqrt(np.square(ad_lon - mlon) + np.square(ad_lat - mlat))
index = np.argmin(di)

Then I get this error:
"ValueError: dimensions ('nlat', 'nlon') must have the same length as the number of data dimensions, ndim=0 "
(doing 'index = np.where(di == np.min(di))' does not give an error)

@dcherian
Copy link

dcherian commented Mar 2, 2021

I can't tell; can you add a simple example please?

@allibco
Copy link
Collaborator Author

allibco commented Mar 3, 2021

I can't tell; can you add a simple example please?

Yes - I will make simple example tomorrow - thanks for your help.

@allibco
Copy link
Collaborator Author

allibco commented Mar 3, 2021

@dcherian Here is my simple example:

import numpy as np
import pandas as pd
import xarray as xr
mylat = np.array([[1., 2., 3., 4.], [1.1, 2.1, 3.1, 3.9], [1, 1.9, 2.9, 3.9]])
mylon = np.array([[3., 4., 5., 6.], [3.1, 4.1, 5.1, 5.9], [3, 3.9, 4.9, 5.9]])
mytimes = pd.date_range('2000-01-01', periods=5)
pop_data = np.random.uniform(low=0.0, high=100.0, size=(3,4,5))
ds_subset = xr.DataArray(
    pop_data, 
     coords={
        "time": mytimes,
        "TLAT": (("nlat", "nlon"), mylat, {'standard_name': 'latitude', 'units': 'degrees_north'}),
        "TLON": (("nlat", "nlon"), mylon, {'standard_name': 'longitude', 'units': 'degrees_east'})
    },
    dims=['nlat', 'nlon', 'time'] , attrs={'long_name': 'Surface Potential'}
)
ds_subset

pop0

So in the above, the coords TLAT and TLON have dims (nlat, nlon). Now I do some calculations and want a the data just a a single grid point:

dd = ds_subset.cf['latitude'].dims
lat_dim_name = dd[0]
lon_dim_name = dd[1]
xmin = 1
ymin = 0
ds_singlept = ds_subset.isel({lat_dim_name: xmin, lon_dim_name: ymin})
ds_singlept

pop

Now the dims (nlat, nlon) are missing from the TLAT and TLON coordinates. My question is how can I put them back (I need them for a further calculation).

@dcherian
Copy link

dcherian commented Mar 3, 2021

Use
ds_singlept = ds_subset.isel({lat_dim_name: [xmin], lon_dim_name:[ymin]})

The list preserves the dimension.

@allibco
Copy link
Collaborator Author

allibco commented Mar 3, 2021

Thanks so much. I can't tell you how long I spent trying to figure that out....

lon_dim_name = ds_subset.cf['longitude'].dims[0]
elif latdim == 2:
lat_dim_name = dd[0]
lon_dim_name = dd[1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@dcherian
Copy link

dcherian commented Mar 3, 2021

Thanks so much. I can't tell you how long I spent trying to figure that out....

:((((( it isn't even documented, we need some kind of indexing cheatsheet.

@allibco
Copy link
Collaborator Author

allibco commented Mar 3, 2021

I think I have all the POP modifications accounted for now except for adding testing specific for pop data and fixing the contrast variance to work with pop (issues #230 and #231). - I'll tackle these soon :)

@allibco allibco merged commit e16a76f into main Mar 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants