Add flexibility to model groups #64
Comments
Thanks for these suggestions. I wanted to make sure I understand the problem you are seeing and am not as familiar with atmosphere data. The issue is that sometimes you search for a variable and it is replicated (why oh why did we do that?) in several tables? For example, I see
In [3]: cat.search(experiment_id="historical",variable_id="hfls",frequency="mon",source_id="CESM2",member_id="r1i1p1f1")
Searching indices: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████|2/2 [ 1.92index/s]
Out[3]:
Summary information for 3 results:
institution_id [NCAR]
experiment_id [historical]
member_id [r1i1p1f1]
mip_era [CMIP6]
grid_label [gn]
table_id [ImonGre, Amon, ImonAnt]
activity_drs [CMIP]
variable_id [hfls]
source_id [CESM2]
project [CMIP6]
If that isn't the issue, can you put together a search for me which explains the issue more clearly?
One clear example of what I am concerned about:
from intake_esgf import ESGFCatalog
cat = ESGFCatalog()
cat.search(experiment_id="historical", frequency="mon", variable_id=["tos", "tas"], source_id="CIESM", member_id="r1i1p1f1")
cat.model_groups()
The
In the example above you might also run into an issue. You might want to search for data within one table (e.g.,
Perfect! Thanks, having these clear stories helps immensely. I designed the tooling for the analysis that I was used to but it needs to address everyone's problems. I will give this some thought and ping you again when I have a better idea.
Is your feature request related to a problem? Please describe.
Model groups currently group by source_id, member_id, and grid_label by default. Users might want different groupings. For example, if you wanted to analyze both tas and tos (e.g., to create a blended 2m temperature and SST dataset), these variables are often placed into a gn and gr group (resulting in two separate groupings for these variables). This would make it hard to apply built-in tools (e.g., .remove_incomplete) to check if a model realization has both tas and tos. A user might also want to group by other sets (e.g., source_id, member_id, and experiment_id).
Describe the solution you'd like
One possible solution would be to allow the user to define groups of interest when calling model_groups, e.g., cat.model_groups(groupby=['source_id', 'member_id']).
Describe alternatives you've considered
This issue arose from a preview of intake_esgf so I am not yet a user and do not know what workarounds might be available.
Additional context
Another related issue is what happens if you have multiple datasets of the same variable in a group (e.g., ta for a given model in the AERmonZ table and the CMIP table [39 versus 19 vertical levels, I think]). Can you drop datasets within a group? Or is this de-duplicated on the initial search? If you can't drop duplicate datasets within a group, an alternate approach might be to take two catalog searches, merge them, and then group them by user-defined groupings. I'm not sure if this makes technical sense, but I foresee a general challenge in trying to get groupings to work across some facets.
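To make the request concrete, here is a minimal pure-Python sketch of the behavior being asked for: merge two sets of search results, group them by user-chosen facets, keep only groups that contain every required variable, and drop duplicate datasets of a variable within a group. It does not use intake_esgf at all. The records, the model names (ModelA, ModelB), and the helper names group_records, complete_groups, and drop_duplicates are all hypothetical and only illustrate the idea; this is not how the library implements model_groups.

```python
from collections import defaultdict

# Hypothetical stand-ins for two catalog searches (one dict of facets per
# dataset). These are illustrative records, not real ESGF search output.
search_ocean = [
    {"source_id": "ModelA", "member_id": "r1i1p1f1", "grid_label": "gn",
     "table_id": "Omon", "variable_id": "tos"},
]
search_atmos = [
    {"source_id": "ModelA", "member_id": "r1i1p1f1", "grid_label": "gr",
     "table_id": "Amon", "variable_id": "tas"},
    {"source_id": "ModelB", "member_id": "r1i1p1f1", "grid_label": "gn",
     "table_id": "Amon", "variable_id": "tas"},
]

def group_records(records, groupby):
    """Group records by the tuple of values of the facets named in groupby."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[facet] for facet in groupby)].append(rec)
    return dict(groups)

def complete_groups(groups, required_variables):
    """Keep only the groups that contain every required variable."""
    required = set(required_variables)
    return {
        key: recs
        for key, recs in groups.items()
        if required <= {rec["variable_id"] for rec in recs}
    }

def drop_duplicates(group, table_priority):
    """Keep one record per variable, preferring tables listed earlier."""
    rank = {table: i for i, table in enumerate(table_priority)}
    best = {}
    for rec in group:
        r = rank.get(rec["table_id"], len(rank))  # unlisted tables rank last
        var = rec["variable_id"]
        if var not in best or r < best[var][0]:
            best[var] = (r, rec)
    return [rec for _, rec in best.values()]

# Merge the two searches, then compare the default-style grouping with a
# user-defined one that ignores grid_label.
merged = search_ocean + search_atmos
by_grid = complete_groups(
    group_records(merged, ["source_id", "member_id", "grid_label"]),
    ["tas", "tos"],
)
by_model = complete_groups(
    group_records(merged, ["source_id", "member_id"]),
    ["tas", "tos"],
)
```

With grid_label in the grouping, tas and tos for ModelA land in different groups, so no group is complete. Grouping by source_id and member_id alone keeps ModelA and drops ModelB, which has no tos. drop_duplicates shows one way a user-supplied table preference could resolve the case where a variable appears in more than one table within a group.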