Silently loads incorrect data when variables only differ by cell_methods #252

Closed
aidanheerdegen opened this issue Jul 23, 2021 · 28 comments · Fixed by #255

@aidanheerdegen
Collaborator

aidanheerdegen commented Jul 23, 2021

When there are two variables with the same name and frequency but a different cell_methods attribute (e.g. a different averaging method), the getvar function silently loads quasi-random sections from the two different variables.

As an example:

import cosima_cookbook as cc
import xarray as xr
import matplotlib.colors as colors

session = cc.database.create_session()
expt = '01deg_jra55v140_iaf_cycle2'
lat_slice = slice(-80, -59)
start_time = '1990-01-01'
end_time   = '1990-12-31'
u_ds = cc.querying.getvar(expt, 
                       'u', 
                       session, 
                       start_time=start_time, 
                       end_time=end_time,
                       frequency='1 monthly')
u = u_ds.sel(yu_ocean=lat_slice, time=slice(start_time,end_time)).sel(st_ocean=200, method='nearest')
u.plot(col='time', col_wrap=4, vmin=-0.2, vmax=0.2, cmap='RdBu_r')

creates the following facet plot.

[Figure: facet plot of u near 200 m depth, one panel per month]

Clearly April to June have much lower velocities at 200 m depth. This is because data for the same variable, but with a different averaging method (pow02), is being picked up for those 3 months. You can see this in the list of files returned:

files = cc.querying._ncfiles_for_variable(expt, 'u', session, start_time=start_time, end_time=end_time, frequency='1 monthly')
[str(f.NCFile.ncfile_path) for f in files]

returns

['/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle2/output371/ocean/ocean-3d-u-1-monthly-mean-ym_1989_10.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle2/output371/ocean/ocean-3d-u-1-monthly-pow02-ym_1989_10.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle2/output372/ocean/ocean-3d-u-1-monthly-pow02-ym_1990_01.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle2/output372/ocean/ocean-3d-u-1-monthly-mean-ym_1990_01.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle2/output373/ocean/ocean-3d-u-1-monthly-mean-ym_1990_04.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle2/output373/ocean/ocean-3d-u-1-monthly-pow02-ym_1990_04.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle2/output374/ocean/ocean-3d-u-1-monthly-pow02-ym_1990_07.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle2/output374/ocean/ocean-3d-u-1-monthly-mean-ym_1990_07.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle2/output375/ocean/ocean-3d-u-1-monthly-pow02-ym_1990_10.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle2/output375/ocean/ocean-3d-u-1-monthly-mean-ym_1990_10.nc']
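
A quick way to spot this kind of mix in a returned file list is to group the paths by the averaging label embedded in the file name. This is only a rough sketch: the naming pattern is inferred from the paths listed above, `group_by_label` is a hypothetical helper rather than part of the Cookbook, and the paths are shortened tails of the ones in the listing.

```python
import re
from collections import defaultdict

# Assumed naming scheme, inferred from the listing above:
#   ...-monthly-<label>-ym_<year>_<month>.nc
LABEL_RE = re.compile(
    r"-monthly-(?P<label>[^-]+)-ym_(?P<year>\d{4})_(?P<month>\d{2})\.nc$"
)

def group_by_label(paths):
    """Map each averaging label (e.g. 'mean', 'pow02') to its (year, month) stamps."""
    groups = defaultdict(list)
    for path in paths:
        match = LABEL_RE.search(path)
        if match is None:
            # File name does not follow the assumed scheme
            groups["unknown"].append(None)
        else:
            groups[match.group("label")].append(
                (match.group("year"), match.group("month"))
            )
    return dict(groups)

# Tails of some of the paths listed above
paths = [
    "output372/ocean/ocean-3d-u-1-monthly-pow02-ym_1990_01.nc",
    "output372/ocean/ocean-3d-u-1-monthly-mean-ym_1990_01.nc",
    "output373/ocean/ocean-3d-u-1-monthly-mean-ym_1990_04.nc",
    "output373/ocean/ocean-3d-u-1-monthly-pow02-ym_1990_04.nc",
]

groups = group_by_label(paths)
if len(groups) > 1:
    print("Mixed averaging labels in file list:", sorted(groups))
```

More than one label for the same variable and frequency is the warning sign here.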

The immediate solution is to add an ncfile option to the getvar query:

u_ds = cc.querying.getvar(expt, 
                       'u', 
                       session, 
                       start_time=start_time, 
                       end_time=end_time,
                       frequency='1 monthly',
                       ncfile='%monthly-mean-%')

then the code above generates a correct plot:
[Figure: facet plot of u near 200 m depth using only the monthly-mean files]
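
To illustrate why the `%monthly-mean-%` pattern excludes the pow02 files, here is a small stand-alone sketch. It assumes the `ncfile` argument is matched with SQL `LIKE` semantics, which the `%` wildcards suggest but which is not confirmed in this thread. The `ncfiles` table and its single column are hypothetical stand-ins, not the Cookbook's actual schema.

```python
import sqlite3

# In-memory database with a hypothetical one-column table of file names
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ncfiles (ncfile TEXT)")
conn.executemany(
    "INSERT INTO ncfiles VALUES (?)",
    [
        ("ocean-3d-u-1-monthly-mean-ym_1990_01.nc",),
        ("ocean-3d-u-1-monthly-pow02-ym_1990_01.nc",),
        ("ocean-3d-u-1-monthly-mean-ym_1990_04.nc",),
        ("ocean-3d-u-1-monthly-pow02-ym_1990_04.nc",),
    ],
)

# '%' matches any run of characters, so only names containing
# 'monthly-mean-' survive the filter
rows = conn.execute(
    "SELECT ncfile FROM ncfiles WHERE ncfile LIKE ? ORDER BY ncfile",
    ("%monthly-mean-%",),
).fetchall()
matched = [row[0] for row in rows]
print(matched)
conn.close()
```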

This is a serious bug because it fails silently: users can end up loading incorrect data without any error or warning from the Cookbook.

@aekiss
Collaborator

aekiss commented Jul 23, 2021

ooh, that's a nasty one - thanks for spotting it!

@aekiss
Collaborator

aekiss commented Jul 23, 2021

should getvar have a default of attrs={'cell_methods': 'time: mean'} ?

@aekiss
Collaborator

aekiss commented Jul 23, 2021

'cell_methods': 'time: mean' is what users expect to be getting unless they specify something else, and it would also avoid some of the disambiguation workarounds people currently need to use.

@aekiss
Collaborator

aekiss commented Jul 23, 2021

It might also be wise to assert that all attrs are the same, so it will fail noisily.
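
A rough sketch of what such a noisy failure could look like, written against plain Python data rather than the Cookbook's database objects. `AmbiguousCellMethodsError` and `check_unique_cell_methods` are hypothetical names, and the second cell_methods value below is a placeholder, not the real attribute of the pow02 files.

```python
class AmbiguousCellMethodsError(ValueError):
    """Hypothetical exception; not part of the Cookbook."""

def check_unique_cell_methods(records):
    """Return the single cell_methods value shared by all records.

    records is an iterable of (path, attrs_dict) pairs. Files without a
    cell_methods attribute count as None. Raises if the values differ.
    """
    seen = {}
    for path, attrs in records:
        seen.setdefault(attrs.get("cell_methods"), []).append(path)
    if len(seen) > 1:
        summary = ", ".join(
            f"{value!r}: {len(paths)} file(s)" for value, paths in seen.items()
        )
        raise AmbiguousCellMethodsError(
            f"files disagree on cell_methods ({summary})"
        )
    return next(iter(seen), None)

uniform = [
    ("a.nc", {"cell_methods": "time: mean"}),
    ("b.nc", {"cell_methods": "time: mean"}),
]
mixed = uniform + [
    ("c.nc", {"cell_methods": "time: placeholder_other_method"}),  # placeholder value
]

print(check_unique_cell_methods(uniform))
try:
    check_unique_cell_methods(mixed)
except AmbiguousCellMethodsError as err:
    print("refusing to load:", err)
```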

@aidanheerdegen
Collaborator Author

Yes to all of these things, but currently we haven't added attributes to the querying AFAIK. I too was alarmed that it failed to fail.

@aidanheerdegen
Collaborator Author

I didn't spot it, Wilma did.

@aekiss
Collaborator

aekiss commented Jul 23, 2021

On a quick glance it looks like attrs is implemented in getvar.

Not every variable has a cell_methods attribute (e.g. static variables), so something more nuanced than a default of attrs={'cell_methods': 'time: mean'} might be needed.

@navidcy
Collaborator

navidcy commented Jul 23, 2021

I think I was running into something like that... I don't remember exactly now, but I think #212 was related. Just bringing it up in case it helps... if not, ignore.

@navidcy
Collaborator

navidcy commented Jul 23, 2021

Or was it @dhruvbhagtani who was running into something like that... Yes, that's right!

@dhruvbhagtani, I remember you were loading the velocity fields and in some months they were all positive because the cookbook was grabbing the pow2 versions of those... Did you ever submit an issue?

@aekiss
Collaborator

aekiss commented Jul 23, 2021

This one's also relevant COSIMA/access-om2#234
It looks like the same bug, which I guess I'd assumed was fixed with PR #214, though the issue is still open, so maybe nobody checked...

@navidcy
Collaborator

navidcy commented Jul 23, 2021

Thanks @aekiss, that's what I had in mind! This was what @dhruvbhagtani was banging his head against for some days.

@aidanheerdegen
Collaborator Author

Sorry, I knew we hadn't actually implemented the querying part of it, but I wasn't aware it was causing so many headaches with silent loading of incorrect data.

@aekiss
Collaborator

aekiss commented Jul 23, 2021

Querying with attrs apparently has been implemented in getvar (see here), so

 cc.querying.getvar(..., attrs={'cell_methods': 'time: mean'})

fixes the problem, e.g.:

import cosima_cookbook as cc
import xarray as xr
import matplotlib.colors as colors

session = cc.database.create_session()
expt = '01deg_jra55v140_iaf_cycle2'
lat_slice = slice(-80, -59)
start_time = '1990-01-01'
end_time   = '1990-12-31'
u_ds = cc.querying.getvar(expt, 
                       'u', 
                       session, 
                       start_time=start_time, 
                       end_time=end_time,
                       frequency='1 monthly',
                       attrs={'cell_methods': 'time: mean'})
u = u_ds.sel(yu_ocean=lat_slice, time=slice(start_time,end_time)).sel(st_ocean=200, method='nearest')
u.plot(col='time', col_wrap=4, vmin=-0.2, vmax=0.2, cmap='RdBu_r')

[Figure: screenshot of the resulting facet plot]

@navidcy
Collaborator

navidcy commented Jul 23, 2021

Can we make the Explorer suggest that the user add attrs={'cell_methods': 'time: mean'} when appropriate? Or at least emit a warning that there may be some ambiguity...?

@dhruvbhagtani
Member

Yes, I faced this issue some time back. I then explicitly passed the ncfile argument when using getvar().

@aekiss
Collaborator

aekiss commented Jul 23, 2021

Adding attrs={'cell_methods': 'time: mean'} doesn't work for static fields (or presumably anything lacking cell_methods).

For example, this takes an enormously long time (I guess it's looking through lots of files)

ds = cc.querying.getvar(expt, 'dxu', session, n=1,
                        attrs={'cell_methods': 'time: mean'} )

and then fails with

VariableNotFoundError: No files were found containing 'dxu' in the '01deg_jra55v140_iaf_cycle2' experiment

but if I remove attrs={'cell_methods': 'time: mean'} it works fine.

So the solution is not as simple as having attrs={'cell_methods': 'time: mean'} as a default argument to getvar.

Instead I think the internal logic in _ncfiles_for_variable should include something like this pseudo-python at line 290:

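    # pseudo-python: apply the default only when the variable has cell_methods
    # and the caller has not asked for a specific value
    if ("cell_methods" in v.ncvar_attrs) and ("cell_methods" not in attrs):
        q = q.filter(v.ncvar_attrs.any(name="cell_methods",
                                       value="time: mean"))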

(I expect 'cell_methods' in v.ncvar_attrs is bad syntax, but you know what I mean)
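
The intended logic can be sketched without the database layer, using plain (path, attrs) pairs. This is only an illustration of the rule proposed above, not the Cookbook's query code; `select_files` and `DEFAULT_CELL_METHODS` are hypothetical names, and the non-mean cell_methods value is a placeholder.

```python
DEFAULT_CELL_METHODS = "time: mean"

def select_files(records, attrs=None):
    """Filter (path, file_attrs) pairs by the requested attributes.

    The time-mean default is applied only if at least one file carries a
    cell_methods attribute and the caller did not request one explicitly,
    so static variables without cell_methods are still found.
    """
    requested = dict(attrs or {})
    has_cell_methods = any("cell_methods" in fa for _, fa in records)
    if has_cell_methods and "cell_methods" not in requested:
        requested["cell_methods"] = DEFAULT_CELL_METHODS
    return [
        path
        for path, file_attrs in records
        if all(file_attrs.get(key) == value for key, value in requested.items())
    ]

static = [("grid.nc", {"units": "m"})]
mixed = [
    ("mean.nc", {"cell_methods": "time: mean"}),
    ("other.nc", {"cell_methods": "time: placeholder_other_method"}),  # placeholder value
]

print(select_files(static))   # static field: no default applied
print(select_files(mixed))    # default picks the time-mean file
print(select_files(mixed, {"cell_methods": "time: placeholder_other_method"}))
```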

@aidanheerdegen, @angus-g what do you think?

@aekiss
Collaborator

aekiss commented Jul 23, 2021

Also see discussion here: #214 (comment)

@navidcy
Collaborator

navidcy commented Jul 23, 2021

@aekiss, if this was a response to my message, just to clarify: I wasn't suggesting that attrs={'cell_methods'... be given by default in getvar. I was only asking whether the DataExplorer could somehow sniff out when problems like that would arise and either warn the user or, even better, add the appropriate attrs={'cell_methods'.. to the code snippet it suggests for loading the variable.

Anything that would somehow inform users so that they are aware that they won't be getting what they ask for! If we can make sure the users know then this is good enough! Then the users could figure out ways around the issue. Solutions for how the loading could be automatically achieved via the cookbook could come at a later point.

(btw I'm cc-ing @wghuneke on this issue)

@adele-morrison

Ouch! I wonder how many existing analyses this may have affected that we didn't know about. A quick check of which IAF diagnostics have multiple cell_methods:
Looks like daily bottom_temp, monthly mld, daily sea_level, daily surface_temp, monthly surface_temp, monthly u, monthly v are the ones we should go back and check if they've been used for analysis.

@navidcy
Collaborator

navidcy commented Jul 23, 2021

Ouch! I wonder how many existing analyses this may have affected that we didn't know about. A quick check of which IAF diagnostics have multiple cell_methods:

Looks like daily bottom_temp, monthly mld, daily sea_level, daily surface_temp, monthly surface_temp, monthly u, monthly v are the ones we should go back and check if they've been used for analysis.

Yep, wouldn't be a bad idea. Hopefully no papers need retraction.

@aekiss
Collaborator

aekiss commented Jul 24, 2021

@navidcy, I wasn't responding to your post, but to my own suggestion here.

Warnings etc in the explorer would be helpful but I think it would be safer to have defaults and/or warnings/failsafe in getvar, since people won't necessarily use the explorer for every new variable (e.g. I often just reuse getvar calls and change the variable).

@navidcy
Collaborator

navidcy commented Jul 24, 2021

Warnings in both! Definitely.

@aekiss
Collaborator

aekiss commented Jul 28, 2021

another link to a related discussion: #137

@aekiss
Collaborator

aekiss commented Aug 16, 2021

Hi @aidanheerdegen, this doesn't seem completely fixed.

When I do this with analysis3-21.07

import cosima_cookbook as cc
session = cc.database.create_session()
info = cc.querying._ncfiles_for_variable('01deg_jra55v140_iaf', 'sea_level', session, frequency='1 monthly')
files = [str(f.NCFile.ncfile_path) for f in info]

I get a mix of time-means and snapshots, and no warnings:

files = 
['/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output000/ocean/ocean-2d-sea_level-1-monthly-mean-ym_1958_01.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output001/ocean/ocean-2d-sea_level-1-monthly-mean-ym_1958_04.nc',
...
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output216/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2012_01.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output216/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2012_02.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output217/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2012_04.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output217/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2012_05.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output218/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2012_07.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output218/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2012_08.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output219/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2012_10.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output219/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2012_11.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output220/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2013_01.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output220/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2013_02.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output221/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2013_04.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output221/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2013_05.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output222/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2013_07.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output222/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2013_08.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output223/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2013_10.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output223/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2013_11.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output224/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2014_01.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output224/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2014_02.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output225/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2014_04.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output225/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2014_05.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output226/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2014_07.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output226/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2014_08.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output227/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2014_10.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output227/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2014_11.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output228/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2015_01.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output228/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2015_02.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output229/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2015_04.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output229/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2015_05.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output230/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2015_07.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output230/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2015_08.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output231/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2015_10.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output231/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2015_11.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output232/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2016_01.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output232/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2016_02.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output233/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2016_04.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output233/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2016_05.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output234/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2016_07.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output234/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2016_08.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output235/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2016_10.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output235/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2016_11.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output236/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2017_01.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output236/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2017_02.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output237/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2017_04.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output237/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2017_05.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output238/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2017_07.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output238/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2017_08.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output239/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2017_10.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output239/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2017_11.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output240/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2018_01.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output240/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2018_02.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output241/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2018_04.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output241/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2018_05.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output242/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2018_07.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output242/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2018_08.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output243/ocean/ocean-2d-sea_level-1-monthly-mean-ym_2018_10.nc',
 '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf/output243/ocean/ocean-2d-sea_level-1-monthly-snap-ym_2018_11.nc']

and I need to use

info = cc.querying._ncfiles_for_variable('01deg_jra55v140_iaf', 'sea_level', session, frequency='1 monthly', attrs={'cell_methods': 'time: mean'})

to get only the time-means.

Should this attrs_unique initialisation in cc.querying._ncfiles_for_variable
https://github.com/COSIMA/cosima-cookbook/blob/554931a7/cosima_cookbook/querying.py#L296-L297

    if attrs_unique is None:
        attrs_unique = {}

be replaced with

    if attrs_unique is None:
        attrs_unique = {"cell_methods": "time: mean"}

as in cc.querying.getvar?

@aekiss aekiss reopened this Aug 16, 2021
@aidanheerdegen
Collaborator Author

aidanheerdegen commented Aug 16, 2021

Hi Andrew. It's an internal routine (begins with _), so no care is taken in this case as the default is set in the function that calls it.

Am I correct in assuming getvar does the right thing?

Is there a reason you need to use _ncfiles_for_variable directly?

@aekiss
Collaborator

aekiss commented Aug 16, 2021

I was just using it to verify the files that are opened, following your code from the first post of this issue.

How would I check getvar does the right thing without using _ncfiles_for_variable?

@aidanheerdegen
Collaborator Author

You've verified that it does the right thing because when you pass the default argument it returns the correct file list.

You can call getvar with the same arguments:

In [1]: import cosima_cookbook as cc
   ...: session = cc.database.create_session()

In [2]: cc.querying.getvar('01deg_jra55v140_iaf', 'sea_level', session, frequency='1 monthly')
Out[2]: 
<xarray.DataArray 'sea_level' (time: 732, yt_ocean: 2700, xt_ocean: 3600)>
dask.array<concatenate, shape=(732, 2700, 3600), dtype=float32, chunksize=(1, 540, 720), chunktype=numpy.ndarray>
Coordinates:
  * xt_ocean  (xt_ocean) float64 -279.9 -279.8 -279.7 ... 79.75 79.85 79.95
  * yt_ocean  (yt_ocean) float64 -81.11 -81.07 -81.02 ... 89.89 89.94 89.98
  * time      (time) datetime64[ns] 1958-01-16T12:00:00 ... 2018-12-16T12:00:00
Attributes:
    long_name:      effective sea level (eta_t + patm/(rho0*g)) on T cells
    units:          meter
    valid_range:    [-1000.  1000.]
    cell_methods:   time: mean
    time_avg_info:  average_T1,average_T2,average_DT
    coordinates:    geolon_t geolat_t
    standard_name:  sea_surface_height_above_geoid
    time_bounds:    <xarray.DataArray 'time_bounds' (time: 732, nv: 2)>\ndask...

In [3]: 

but I take your point that it is difficult to determine that this has done the right thing without checking out the fields. Equally the test cases do check that this throws an error as you would expect, so sometimes you either have to dig right in to satisfy yourself that it is correct, or just trust that it is working as expected.

I have thought for a while that it would be good to return a list of files as an attribute to make it clear where the data came from.
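
As a rough idea of how that could work: xarray objects carry a dict-like `attrs`, so the paths could be recorded there after loading. The helper and the `ncfiles` key below are hypothetical, not existing Cookbook behaviour, and the demo uses a simple stand-in object with an `attrs` dict so that it runs without any data files.

```python
from types import SimpleNamespace

def attach_file_list(data, paths, key="ncfiles"):
    """Record the source file paths on the object's attrs and return it.

    `data` only needs a mutable, dict-like `attrs` attribute, as an
    xarray DataArray or Dataset has.
    """
    data.attrs[key] = [str(p) for p in paths]
    return data

# Stand-in for a loaded DataArray (assumption: only .attrs is used)
loaded = SimpleNamespace(attrs={"cell_methods": "time: mean"})
attach_file_list(
    loaded,
    ["output000/ocean/ocean-2d-sea_level-1-monthly-mean-ym_1958_01.nc"],
)
print(loaded.attrs["ncfiles"])
```

A user could then inspect that attribute to confirm which files were opened, without calling _ncfiles_for_variable directly.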

@aekiss
Collaborator

aekiss commented Aug 16, 2021

Yeah good point re. verification.
I like the idea of returning the file list as an attribute.
