User defined function arguments #65

brianpm · 2021-11-15T18:58:04Z

brianpm
Nov 15, 2021
Collaborator

I am working on an example of adding a user-defined plot. To do it, I put a new script into scripts/plotting. Inside that script is the function that should get called. It should be able to use arguments as specified in the yaml file. This is sort of working, but I'm running into an issue now that the code has a few classes that seem to obscure the configuration (i.e., yaml) information. What is the right way to have a user specify arguments, and how should AdfDiag get them?

For example, in the function I am trying to implement I want to provide it with the case names, locations of the climo files, and the output plots directory. That is to say, this info from the yaml file:

diag_cam_climo[cam_case_name]
diag_cam_climo[cam_climo_loc]
diag_cam_baseline_climo[cam_case_name]
diag_cam_baseline_climo[cam_climo_loc]
diag_basic_info[cam_diag_plot_loc]

When I first wrote the code that did this, I could look in the locals() namespace, but now these seem to be buried in nested structures. Is there some class that is floating around in there where these are all still just simple attributes? Or is there a better way to get this info?

Answered by nusbaume

Nov 18, 2021

nusbaume · 2021-11-15T19:19:41Z

nusbaume
Nov 15, 2021
Maintainer

Hi @brianpm

I am not sure where exactly in the calling order you are, but if you have access to the AdfDiag object itself then you can access any YAML variable using the read_config_var method. For the example above of getting the diag_cam_climo[cam_case_name] variable, you could do the following (assuming self is the AdfDiag object):


diag_climo_dict = self.read_config_var('diag_cam_climo')

cam_case_name = self.read_config_var('cam_case_name', conf_dict=diag_climo_dict)

Also note that if you are inside an AdfDiag function then you can skip the first line and instead use the following internal variables for the conf_dict argument:

self.__basic_info
self.__cam_climo_info
self.__cam_bl_climo_info

which represent the diag_basic_info, diag_cam_climo, and diag_cam_baseline_climo dictionaries, respectively. You can see many examples of this in the adf_diag.py source file, if you are curious.

Finally, as an FYI, the reason this read_config_var method exists is because it automatically manages missing YAML variables without a Key Error. Thus if you request a variable that doesn't actually exist in the YAML file then it will simply return None, unless the optional required=True argument is passed, in which case the diagnostics will fail at that point with an error stating that the variable is missing.
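The missing-variable behavior described above can be sketched with a small stand-in. This is not the actual ADF implementation: the function body, the contents of the `config` dictionary, and the choice of `KeyError` are assumptions made for illustration. Only the method name and the `conf_dict` and `required` arguments come from the description above.

```python
def read_config_var(varname, conf_dict, required=False):
    """Return conf_dict[varname], or None if it is absent (stand-in, not ADF code)."""
    value = conf_dict.get(varname)
    if value is None and required:
        # Assumed error type; the real ADF may report the failure differently.
        raise KeyError(f"'{varname}' is missing from the config file")
    return value


# Hypothetical config contents, shaped like the YAML sections named above.
config = {"diag_cam_climo": {"cam_case_name": "test_case"}}

climo = read_config_var("diag_cam_climo", config)
case_name = read_config_var("cam_case_name", climo)
missing = read_config_var("cam_climo_loc", climo)  # key is not in this dict
```

Here `missing` ends up as None rather than raising, while passing `required=True` for the same key would raise instead.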

Hope that helps, and let me know if you run into any problems!

1 reply

brianpm Nov 16, 2021
Collaborator Author

Thanks @nusbaume ... this does make sense, but I'm not sure if I'm totally getting it on a practical level.

The problem I'm hitting right now is that I'm supplying a plotting function like:

    - {plot_log_u_t: {'args':['cam_case_name', 'model_rgrid_loc', 'data_name', 'data_loc', 'self.__plot_loc'], 'module':marshian_zonal_plot}}

This isn't working because in __diag_scripts_caller I'm not able to use the locals() dictionary, so all the arguments end up being flagged: print("{} is not available".format(variable_to_check)). I tried to replace the names with the read_config_var calls, but that didn't seem to work.

I did get it to work when I modified __diag_scripts_caller like this:

            if has_opt:
                if 'args' in opt:
                    # RULES: it has to be a list of strings,
                    #        and then we will take whatever of those are in locals
                    assert isinstance(opt['args'], list), "Function arguments must be of type list."
                    emsg = "Function argument list elements must be of type string."
                    assert all(isinstance(item, str) for item in opt['args']), emsg
                    func_args = list()  # start over
                    for variable_to_check in opt['args']:
                        if variable_to_check in locals():
                            func_args.append(locals()[variable_to_check])
                        else:
                            # try to use AdfDiag:
                            if "/" in variable_to_check:
                                parts = variable_to_check.split("/")
                                config_dict_specified = self.read_config_var(parts[0])
                                config_entry_specified = self.read_config_var(parts[1], conf_dict=config_dict_specified)
                            else:
                                config_entry_specified = self.read_config_var(variable_to_check)
                            if config_entry_specified is not None:
                                func_args.append(config_entry_specified)
                            else:
                                print("{} is not available".format(variable_to_check))

In doing this, I had to make up some convention for how to get into the dictionaries, and I just used a slash. No thinking went into this beyond: is it easy to parse. Here is what the yaml line becomes:

plotting_scripts:
    - {plot_log_u_t: {'args':['diag_cam_climo/cam_case_name', 'diag_cam_climo/cam_climo_loc', 'diag_cam_baseline_climo/cam_case_name', 'diag_cam_baseline_climo/cam_climo_loc', 'diag_basic_info/cam_diag_plot_loc'], 'module':marshian_zonal_plot}}

This appears to work, which is good. Was this the right way to do it? Should we change __diag_scripts_caller to allow this kind of argument parsing?

I guess this reduces to:

1. We want to allow users to specify their own functions, but I think the method for providing non-default arguments is not clear.
2. Is the way I just modified adf_diag.py the "right" way to fix this? Maybe it's backward, and we should have users use some Adf-provided functions in their scripts to get their arguments instead of specifying them in the YAML file?

But I could be totally missing the target. If so, let me know!!

nusbaume · 2021-11-16T20:07:18Z

nusbaume
Nov 16, 2021
Maintainer

Hi @brianpm ,

Ahh, now I understand what your issue is. I think your fix is fine, and your code modifications look good to me! The only thing I might change would be to see if the locals check is necessary at all now. I also might replace the / delimiter with possibly a colon (:) as it sorta matches the YAML syntax a little more closely, and likely will avoid a possible confusion with file paths.
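As a rough sketch of the colon-delimited convention, the lookup could work along these lines. The `resolve_arg` helper and the `config` dictionary are hypothetical and stand in for whatever __diag_scripts_caller and read_config_var actually do; the point is only to show how a 'section:key' string maps onto the nested YAML structure.

```python
def resolve_arg(spec, config, delimiter=":"):
    """Look up 'section:key' (or a bare top-level 'key') in a nested dict."""
    section, sep, key = spec.partition(delimiter)
    if not sep:
        # No delimiter present: treat the whole string as a top-level name.
        return config.get(spec)
    section_dict = config.get(section)
    if not isinstance(section_dict, dict):
        return None
    return section_dict.get(key)


# Hypothetical config contents for illustration only.
config = {
    "diag_basic_info": {"cam_diag_plot_loc": "/path/to/plots"},
    "diag_cam_climo": {"cam_case_name": "test_case"},
}
args = ["diag_cam_climo:cam_case_name", "diag_basic_info:cam_diag_plot_loc"]
func_args = [resolve_arg(arg, config) for arg in args]
```

Because the split is on ":" rather than "/", an argument string that happens to contain a file path is not mistaken for a section lookup.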

In terms of a long-term plan to provide arguments to various plotting scripts, I have wondered if instead of passing a bunch of arguments we just passed the ADF object itself (along with optional kwargs that are plotting script specific), and then it would be up to the plotting scripts themselves to use read_config_var to extract the info that they want. Obviously this would make developing the plotting scripts more difficult, but it would significantly clean-up a lot of the adf_diag.py code in __diag_scripts_caller and elsewhere.

As someone who has been on the plotting-scripts side what do you think of this? Would it help or hinder your scripts development?

0 replies

brianpm · 2021-11-16T20:13:25Z

brianpm
Nov 16, 2021
Collaborator Author

I agree on both. The locals check is probably not needed, and using a : would probably be better here.

After working through this example, I also started thinking that giving users access to the ADF object might be the better way to go. As it stands now, the user is likely to have to write an "interface" function that calls their plotting function. Having a clean way to get the information from ADF might make this easier overall. It could make it a little harder for beginners, but this might save work in the long run because that ADF object could potentially bring a lot of useful methods with it (working with catalogs, etc.).

0 replies

andrewgettelman · 2021-11-16T20:58:59Z

andrewgettelman
Nov 16, 2021
Collaborator

I'm getting a bit concerned watching this thread and looking at the code that this is a steep learning curve for developers: I couldn't find the actual plotting code when I looked, and most of the discussion is well beyond my python knowledge. I recognize there are new standards here, but we need to consider how user-modifiable and user-developable we want this to be. And as a possible developer I don't even see where the plotting calls actually are in the scripts. Are we setting this bar too high for developers? How can we make it easier?

2 replies

brianpm Nov 16, 2021
Collaborator Author

Hi @andrewgettelman -- I think this is exactly the question we are asking. From the point of view of an end user who wants to put in their own script, what is the easiest way to let them do it? Right now I think we are pretty close to having a good solution, which is that the user puts their script in the scripts/plotting directory, and adds that script to the plotting scripts section of their YAML file. Now we need to figure out how the user needs to modify their script to be able to access the information that ADF has, either from the YAML file or possibly generated along the way.

I think what @nusbaume is suggesting is that it may be possible to let the user's script have access to an "ADF object" that can then go and get whatever info is needed. Then the user might be able to have a small function in their script that gets called from ADF that has access to the ADF info and then calls the user's plotting script. That little function might look something like:

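import xarray as xr

def user_adf_script(ADFObj):
    # get_climo_file is a placeholder name; the actual ADF method may differ
    case1_climo_file = ADFObj.get_climo_file("diag_cam_climo")
    case2_climo_file = ADFObj.get_climo_file("diag_cam_baseline_climo")
    case1_dataset = xr.open_dataset(case1_climo_file)
    case2_dataset = xr.open_dataset(case2_climo_file)
    # user_plot_function is the user's existing plotting routine
    user_plot_function(case1_dataset, case2_dataset)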

Jesse can correct me if this is not what he's thinking. I think this might be easier for users, but requires that ADF object to be well documented to let users know what they can get from it.

andrewgettelman Nov 16, 2021
Collaborator

Objects in python are self documenting right? I.e., can we query the object to find this info (I'm thinking in terms of doing it in an IDE like jupyter lab). It's also a way to get jupyter integration: if you can run the script to load the object in jupyter, then you can do interactive development of a plot.....

Also, are we thinking what happens if we are multi-case (i.e., everything would be a list)?

nusbaume · 2021-11-17T18:22:09Z

nusbaume
Nov 17, 2021
Maintainer

@brianpm yes, that is exactly what I was thinking! I am not sure what the specific ADF object functions will be yet (it could take a couple of iterations with DTF/AMP to get it set up in a way that makes everyone happy), but the general idea shouldn't change.

Since I need to modify some of the plotting scripts anyway in order to enable Intake-ESM and the observations catalog, I can see about implementing a test version of this object-passing interface there.

@andrewgettelman part of the underlying design for the ADF is to allow you to call sections of the ADF in python/Jupyter without having to run the entire ADF itself, with the goal being exactly what you outlined (loading ADF to get the object to run a plotting script, but avoiding all of the re-gridding, time series generation, etc.).

The one downside to python as a language is that there is no easy way to query the object in python itself to know which functions are available in the object. To alleviate this, I hope to have only a few generic functions where you can query any variable you want, and then to just provide good documentation on the Github wiki or elsewhere describing how to use it. I suspect for most plotting scripts the requested variables will generally be the same, so hopefully the learning curve will be pretty shallow.

0 replies

nusbaume · 2021-11-18T19:05:39Z

nusbaume
Nov 18, 2021
Maintainer

@andrewgettelman @brianpm I have made a working example of what this object-passing interface for a script would look like for the regrid_example.py script. Specifically, compare the version on glade here (with the ADF object interface):

/glade/work/nusbaume/SE_projects/model_diagnostics/ADF_fork/scripts/regridding/regrid_example_new.py

with the original version here:

/glade/work/nusbaume/SE_projects/model_diagnostics/ADF_fork/scripts/regridding/regrid_example.py

Obviously the new version requires extra lines to extract the necessary variables, but the actual function interface is much simplified (as well as the code in adf_diag.py). Plus this method gives the script access to the ADF debug logger (notice the print statements that were replaced with debug_log), which means one can add special print statements to their scripts that are useful for debugging, but that shouldn't usually be output in a normal ADF run.

Anyways, let me know if you have any strong thoughts or opinions. I should note that script-specific keyword arguments (**kwargs) will still be passed like before, as there really isn't a reason for the ADF to store that info.
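For readers without access to the glade paths, the general shape of the object-passing interface might look like the sketch below. `AdfStub`, `my_plot_script`, and the `log_scale` option are all hypothetical names invented for this illustration; the real ADF class and the regrid_example_new.py script will differ in detail.

```python
class AdfStub:
    """Hypothetical stand-in for the ADF object; not the real class."""

    def __init__(self, config):
        self._config = config

    def read_config_var(self, varname, conf_dict=None):
        # Search the given sub-dictionary, or the top-level config by default.
        source = self._config if conf_dict is None else conf_dict
        return source.get(varname)


def my_plot_script(adf, **kwargs):
    """Script entry point: gets the object plus script-specific kwargs."""
    climo = adf.read_config_var("diag_cam_climo")
    case_name = adf.read_config_var("cam_case_name", conf_dict=climo)
    log_scale = kwargs.get("log_scale", False)  # hypothetical script option
    return f"{case_name} (log_scale={log_scale})"


adf = AdfStub({"diag_cam_climo": {"cam_case_name": "test_case"}})
summary = my_plot_script(adf, log_scale=True)
```

The script pulls what it needs from the object, so the caller's signature stays the same no matter which config variables a given script uses.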

Thanks!

2 replies

brianpm Nov 18, 2021
Collaborator Author

@nusbaume -- This looks really good to me. This is so much cleaner, especially for cases that try to do things that we haven't thought of yet. For example, if a script needs to derive a variable that doesn't make sense for the timeseries/climatology files, it can go back to the original data without too much hassle.

In terms of Andrew's question about looking inside the ADF object, what if a user were to do dir(adf)? It wouldn't be pretty, but it would give a list of all the methods and attributes of the object, at least?
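A small sketch of what that kind of introspection gives, using only the standard library. The `Example` class is a made-up placeholder rather than the ADF object, but `dir`, `inspect.getmembers`, and `inspect.getdoc` behave the same way on any Python object.

```python
import inspect


class Example:
    """Hypothetical object standing in for an ADF instance."""

    def __init__(self):
        self.case_name = "test_case"

    def read_config_var(self, varname):
        """Return a config value (placeholder)."""
        return None


obj = Example()

# dir() lists everything; dropping underscore names hides the internals.
public_names = [name for name in dir(obj) if not name.startswith("_")]

# inspect.getmembers with a predicate narrows the list to bound methods.
public_methods = [
    name
    for name, _ in inspect.getmembers(obj, inspect.ismethod)
    if not name.startswith("_")
]

# Docstrings can be pulled per method, which is what help(obj) prints.
doc = inspect.getdoc(obj.read_config_var)
```

In a Jupyter session, `help(obj)` or `obj.read_config_var?` would show the same docstrings interactively.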

brianpm Nov 30, 2021
Collaborator Author

Follow up: I tested the new method of using the ADF instance as the argument to my script. It works great.

We might want to add some additional convenience methods down the road... most obvious examples for me would be something like get_climo(case, variable) , get_timeseries(case, variable), etc. We can see what kinds of needs arise, I guess.

andrewgettelman · 2021-11-18T21:37:31Z

andrewgettelman
Nov 18, 2021
Collaborator

Hi Jesse,

I think that looks okay to me. Is there a way to query the adf object to see what's in it? I gather the principle is to pass all this information and let each diagnostic grab the files in their own way?

Just to note that this code seems to be hardcoded to do a baseline or a cam case. Shouldn't it be more general (any case, especially if we go multi-case)?

Also, why does it care what case name it will get? How would it select a case name with the ADF object if there are multiple cases? That might require the old version and an interface.

I'd like to see everything in ADF be extensible to multiple cases since it's high on our development list.

0 replies

nusbaume · 2021-11-18T23:01:37Z

nusbaume
Nov 18, 2021
Maintainer

Hi @andrewgettelman ,

Right now the only variables that will be available are all contained within the config YAML file, plus a couple extra "convenience" variables that I will document in a wiki page that hopefully will describe how to interface a script with the ADF. Also all of the adf functions can behave like query functions, in that if one requests a variable and it isn't found then the function simply returns None, unless the required=True optional variable is also provided to the adf function (in which case the ADF run will die if the variable isn't found).

Finally, the ADF object maintains the variable type as contained within the YAML file. Thus if one converts cam_case_name from a string to a list in the config YAML file (which is what we will want to do for multi-case runs), then the adf function will simply pass the script that same list. So, at least from the ADF infrastructure side there is nothing really preventing multi-case runs now.
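Since the value comes back with whatever type the YAML file gave it, a script that wants to handle both single-case and multi-case configs could normalize the value first. The `as_case_list` helper below is a hypothetical convenience, not something the ADF provides.

```python
def as_case_list(value):
    """Return case names as a list, whether the config held a string or a list."""
    if value is None:
        return []
    if isinstance(value, str):
        # A bare string is a single case; wrap it so callers can always loop.
        return [value]
    return list(value)


single = as_case_list("test_case")
multi = as_case_list(["case_a", "case_b"])

for case_name in multi:
    pass  # per-case work (open files, regrid, plot) would go here
```

The isinstance check on str matters because a string is itself iterable, and calling list() on it directly would split it into characters.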

That being said, some of the scripts (like the re-gridding example I showed) are hard-coded such that they expect only one test case (hence why that code looks like it does). It shouldn't take too much work to modify the scripts to expect multiple cases, but it still has to be done.

0 replies

nusbaume · 2021-11-29T18:17:32Z

nusbaume
Nov 29, 2021
Maintainer

@brianpm in response to your original issue, the ADF has been updated such that now every script should have the ADF object passed to it, from which you can access any config file variable. Instructions for using the ADF object in a script can be found here:

https://github.com/NCAR/ADF/wiki/ADF-python-API

Obviously that is a first pass at those instructions, so if there is any section that is confusing or needs to be worded better please let me know.

Finally, if this new interface works for you then it would be great if you could mark this particular post as "the answer" for this discussion thread. That way if another user looks at this discussion they'll more likely see the URL to the script interface wiki page shown above. Thanks!

1 reply

brianpm Nov 29, 2021
Collaborator Author

Thanks, @nusbaume !! This looks great. I'll try to test it and then mark this as the answer (bug me soon if I don't do it).

andrewgettelman · 2021-11-29T18:26:03Z

andrewgettelman
Nov 29, 2021
Collaborator

@nusbaume This is great. Thanks for doing this and for making the super-helpful wiki page. I very much like the part about how to run your diagnostic outside of the ADF (would be good to have a working example in a notebook, maybe even for the hackathon).

I was a bit concerned with the several 'Please Note' comments on the wiki: I think structure is fine, but can see some of those 'please notes' (about what scripts can call plotting, variables in the configuration file being buried under sub-heads) adding too much complexity. Just another plug for flat and simple if we can.

0 replies

andrewgettelman · 2021-12-07T22:17:47Z

andrewgettelman
Dec 7, 2021
Collaborator

@nusbaume and @cecilehannay , Where is a good place to put a list of variables we should have defaults for? Somewhere we can make a list and people can check it off. Maybe it's just a google sheet we can link to in an issue? Not sure if github discussions or the wiki would work.

We now have a functional way to modify variable defaults for ranges and contour intervals, easy to customize, but it would be great to have a list of variables we want to attack. I started with a few of them I will soon put into a PR (issue #87)

Ideas?

0 replies

nusbaume · 2021-12-09T00:21:24Z

nusbaume
Dec 9, 2021
Maintainer

@andrewgettelman I would probably vote to make a new GitHub discussion with a link to a google sheet where people can add the variables they want, and discuss it on the new discussion thread itself. Then once it looks like all of the relevant folks are happy with the variable list we can convert the discussion thread into an issue and start the actual coding.

Of course if you need any help with any of that just let me know!

0 replies

andrewgettelman · 2021-12-09T02:32:05Z

andrewgettelman
Dec 9, 2021
Collaborator

Done: discussion #90 . Hopefully people will see this.... I mentioned you and Cecile, I'll flag @juliecaron, @JulioTBacmeister and @bitterbark here as well so they may go to the discussion of variables and the document for listing priorities....

0 replies
