[Feature]: Have NCO detect all errors on first run #535

forsyth2 · 2023-11-29T17:39:25Z

How will this affect the next version number?

Small improvement (increment PATCH version)

Is your feature request related to a problem?

When running the climo task on multiple variables, only the first error is reported. I.e., in #533, it took 16 runs to find out 14 variables weren't supported on climo whereas it only took 1 run to find out 10 variables weren't supported on ts.

Describe the solution you'd like

The climo task should report all errors in variables rather than simply stopping after encountering one error.

Describe alternatives you've considered

No response

Additional context

No response

The text was updated successfully, but these errors were encountered:

chengzhuzhang · 2023-11-29T18:42:19Z

One potential solution is to use the fault tolerant variable list, suggested by @czender
as mentioned here #533 (comment).

Or maybe @czender can advice if there are alternatives here.

forsyth2 · 2023-11-29T18:55:16Z

Yes the ^var$ solution may be best. But my understanding is that then the climo and ts tasks will silently fail -- i.e. it will look like they will pass but they're really missing variables. If that's ok, I suppose I could change the code to automatically add those prefix and suffix character to every variable in the list.

czender · 2023-11-29T19:40:47Z

I am coming up to speed on what you are discussing. After reading #533, my impression is that timeseries generation is working as desired (i.e., failing quickly when non-present variables are requested), and that climo generation is not because it does not fail quickly enough when non-present variables are requested. Is this the essence of the problem? If not, please elaborate.

Also, why are these tasks being invoked with non-present variable names in the first place?

Once I understand both of these issues, I can be more helpful :)

forsyth2 · 2023-11-29T20:20:48Z

Is this the essence of the problem?

Yes, that's correct.

why are these tasks being invoked with non-present variable names in the first place?

zppy was largely developed with Water Cycle in mind. We are now trying to provide Land support. This requires a completely different set of variables than what we have typically developed/tested zppy with.

czender · 2023-11-29T20:46:26Z

OK. Say ncclimo, in climo mode, is invoked with -v present1,notpresent1,present2,notpresent2,.... What is the desired behavior? to fail when processing (e.g., averaging) notpresent1? or to fail before processing notpresent1? or...? And did you know that invoking ncclimo without -v varlist causes it to climatologize all variables in the file, thus obviating the need to manage ever-changing lists of variables in zppy? Is that a potential solution? (It can also be invoked to exclude a list of specified variables with -x -v varlist)

forsyth2 · 2023-11-29T20:53:33Z

What is the desired behavior?

I'm thinking it would process all available variables and then fail listing "notpresent1, notpresent2 not found". Currently, it would fail with "notpresent1 not found" and the user would have no idea that notpresent2 is also not present.

climatologize all variables in the file

I believe the reason zppy supports a specific vars list is that users usually don't want or need every single variable from a file.

czender · 2023-11-29T21:10:46Z

It is possible to implement this desired behavior with a change to ncclimo. I could interrogate the first file to see whether the varlist contains any non-present variables, and if it does, to remove those from the list, then proceed with the climatology generation with only the present variables, then fail at the end (after climatologizing the present variables), with a list of the non-present variables. Is that what y'all want? Or perhaps you want it to succeed and print only an informational message that some of the requested variables were not present? I would be happy to implement it either way.

Note that comparing the requested varlist to the contents of the first file and making adjustments to varlist based on the results could be done in zppy instead of ncclimo. Have you considered whether this functionality belongs there instead?

chengzhuzhang · 2023-11-29T21:41:18Z

I think for monthly climatology, it is not a bad idea to just include all variables (as default), especially for v3, the output variables are much reduced.

forsyth2 · 2023-11-29T22:18:28Z

not a bad idea to just include all variables (as default)

Alright, we can do that as a default. But what do we want the behavior/implementation to be when users specifically set vars?

czender · 2023-11-29T22:30:07Z

I leave the desired behavior up to you two. FWIW, my "philosophy" is that it's more dangerous to fix user input errors than it is to just fail so that users are forced to learn what's in their datasets. Otherwise they'll specify non-present variables and then be surprised when those variables to not appear in successful climatologies. I think we all want to avoid that situation, that Ryan called silently ignoring bad input. This is a grey area, though, in that I think Ryan's original point was that it's currently too difficult for users to learn what's in the dataset. So I agree that some form of automated varlist checker may be useful. Just LMK if you want me to alter ncclimo to do that.

chengzhuzhang · 2023-11-29T22:43:01Z

not a bad idea to just include all variables (as default)

Alright, we can do that as a default. But what do we want the behavior/implementation to be when users specifically set vars?

Good question, if a user wants to get the specific set of vars instead of all the variables, I would assume they knows the dataset well...In any case i think a varlist checker will be useful. And it would be good to have it in zppy so that it can be used as needed in different places.

rljacob · 2023-11-30T03:01:33Z

I agree the logic to check what variables zppy hands ncclimo should live in zppy.

forsyth2 · 2023-12-01T16:50:05Z

I could interrogate the first file to see whether the varlist contains any non-present variables

@czender What would be the command to do this? If that's something zppy can call externally, then of course no change would be needed for ncclimo itself.

it's more dangerous to fix user input errors than it is to just fail so that users are forced to learn what's in their datasets.

I agree. That's why I think the command should still fail, but with a list of all failures instead of just the first one encountered.

I agree the logic to check what variables zppy hands ncclimo should live in zppy.

Yes, I think that's fine. I just need to know how to check that without actually attempting to generate the climatologies first. (See first point in this post).

czender · 2023-12-01T18:54:58Z

When users do not provide an explicit varlist, ncclimo creates one internally using an NCO call to return the names of all variables (except coordinates) with rank >= 2. In my experience, this is sufficient since rank == 0,1 variables are not geospatial. zppy could produce that list and then fail, informatively, when the user-specified list is not a subset of the rank >= 2 list. Something like this is probably optimal because it only requires opening the input file once. Alternatively, zppy could probe the file once for each variable in the user varlist and track the variables for which that fails, then print a list.

Examples:

zender@spectral:~$ ncks --lst_rnk_ge2 /Users/zender/data/bm/elmv2_ne30pg2l15.nc
BCDEP,BTRAN,BUILDHEAT,DEFICIT,DSTDEP,DSTFLXT,DWB,EFLX_DYNBAL,EFLX_GRND_LAKE,EFLX_LH_TOT,EFLX_LH_TOT_R,EFLX_LH_TOT_U,ELAI,ERRH2O,ERRH2OSNO,ERRSEB,ERRSOI,ERRSOL,ESAI,FCEV,FCOV,FCTR,FGEV,FGR,FGR12,FGR_R,FGR_U,FH2OSFC,FIRA,FIRA_R,FIRA_U,FIRE,FIRE_R,FIRE_U,FLDS,FPSN,FPSN_WC,FPSN_WJ,FPSN_WP,FROST_TABLE,FSA,FSAT,FSA_R,FSA_U,FSDS,FSDSND,FSDSNDLN,FSDSNI,FSDSVD,FSDSVDLN,FSDSVI,FSDSVILN,FSH,FSH_G,FSH_NODYNLNDUSE,FSH_R,FSH_U,FSH_V,FSM,FSM_R,FSM_U,FSNO,FSNO_EFF,FSR,FSRND,FSRNDLN,FSRNI,FSRVD,FSRVDLN,FSRVI,GC_HEAT1,GC_ICE1,GC_LIQ1,H2OCAN,H2OSFC,H2OSNO,H2OSNO_TOP,H2OSOI,HC,HCSOI,HEAT_FROM_AC,INT_SNOW,LAISHA,LAISUN,LAKEICEFRAC,LAKEICETHICK,OCDEP,PARVEGLN,PBOT,PCO2,PCT_LANDUNIT,PCT_NAT_PFT,Q2M,QBOT,QCHARGE,QDRAI,QDRAI_PERCH,QDRAI_XS,QDRIP,QFLOOD,QFLX_ICE_DYNBAL,QFLX_LIQ_DYNBAL,QH2OSFC,QINFL,QINTR,QIRRIG_GRND,QIRRIG_ORIG,QIRRIG_REAL,QIRRIG_SURF,QIRRIG_WM,QOVER,QRGWL,QRUNOFF,QRUNOFF_NODYNLNDUSE,QRUNOFF_R,QRUNOFF_U,QSNOMELT,QSNWCPICE,QSNWCPICE_NODYNLNDUSE,QSOIL,QVEGE,QVEGT,RAIN,RH2M,RH2M_R,RH2M_U,SABG,SABG_PEN,SABV,SNOBCMCL,SNOBCMSL,SNODSTMCL,SNODSTMSL,SNOINTABS,SNOOCMCL,SNOOCMSL,SNOW,SNOWDP,SNOWICE,SNOWLIQ,SNOW_DEPTH,SNOW_SINKS,SNOW_SOURCES,SOILICE,SOILICE_ICE,SOILLIQ,SOILLIQ_ICE,SOILWATER_10CM,SUPPLY,SoilAlpha,SoilAlpha_U,TAUX,TAUY,TBOT,TBUILD,TG,TG_R,TG_U,TH2OSFC,THBOT,TKE1,TLAI,TLAKE,TREFMNAV,TREFMNAV_R,TREFMNAV_U,TREFMXAV,TREFMXAV_R,TREFMXAV_U,TSA,TSAI,TSA_R,TSA_U,TSOI,TSOI_10CM,TSOI_ICE,TV,TWS,TWS_MONTH_BEGIN,TWS_MONTH_END,U10,URBAN_AC,URBAN_HEAT,VOLR,VOLRMCH,WA,WASTEHEAT,WIND,ZBOT,ZWT,ZWT_PERCH
zender@spectral:~$ ncks -m -v foobar /Users/zender/data/bm/elmv2_ne30pg2l15.nc
ncks: ERROR nco_xtr_mk() reports user-supplied variable name or regular expression 'foobar' is not in and/or does not match contents of input file
Abort trap: 6
zender@spectral:~$ echo $?
134
zender@spectral:~$

rljacob · 2023-12-01T19:49:53Z

Isn't zppy written in python? Just open the file with python netcdf lib and query for the variables. Don't rely on external programs from NCO.

xylar · 2023-12-01T20:41:52Z

I couldn't agree more. This is something for xarray.

czender · 2023-12-01T20:59:45Z

Just remember that a valid varlist for ncclimo contains only variables that can be climatologized, i.e., rank >= 2, with at least one spatial and exactly one temporal dimension (i.e., no multi-dimensional auxiliary coordinate variables or bounds variables).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature]: Have NCO detect all errors on first run #535

[Feature]: Have NCO detect all errors on first run #535

forsyth2 commented Nov 29, 2023

chengzhuzhang commented Nov 29, 2023

forsyth2 commented Nov 29, 2023

czender commented Nov 29, 2023

forsyth2 commented Nov 29, 2023

czender commented Nov 29, 2023

forsyth2 commented Nov 29, 2023 •

edited

Loading

czender commented Nov 29, 2023

chengzhuzhang commented Nov 29, 2023 •

edited

Loading

forsyth2 commented Nov 29, 2023

czender commented Nov 29, 2023

chengzhuzhang commented Nov 29, 2023

rljacob commented Nov 30, 2023

forsyth2 commented Dec 1, 2023

czender commented Dec 1, 2023

rljacob commented Dec 1, 2023

xylar commented Dec 1, 2023

czender commented Dec 1, 2023

[Feature]: Have NCO detect all errors on first run #535

[Feature]: Have NCO detect all errors on first run #535

Comments

forsyth2 commented Nov 29, 2023

How will this affect the next version number?

Is your feature request related to a problem?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

chengzhuzhang commented Nov 29, 2023

forsyth2 commented Nov 29, 2023

czender commented Nov 29, 2023

forsyth2 commented Nov 29, 2023

czender commented Nov 29, 2023

forsyth2 commented Nov 29, 2023 • edited Loading

czender commented Nov 29, 2023

chengzhuzhang commented Nov 29, 2023 • edited Loading

forsyth2 commented Nov 29, 2023

czender commented Nov 29, 2023

chengzhuzhang commented Nov 29, 2023

rljacob commented Nov 30, 2023

forsyth2 commented Dec 1, 2023

czender commented Dec 1, 2023

rljacob commented Dec 1, 2023

xylar commented Dec 1, 2023

czender commented Dec 1, 2023

forsyth2 commented Nov 29, 2023 •

edited

Loading

chengzhuzhang commented Nov 29, 2023 •

edited

Loading