Per #3066, add Data-IO section about file lists and reference it in all tools that use them.
JohnHalleyGotway committed Jan 28, 2025
1 parent 349600f commit e83f873
Showing 13 changed files with 58 additions and 30 deletions.
8 changes: 5 additions & 3 deletions docs/Users_Guide/appendixF.rst
@@ -358,6 +358,8 @@ The first argument for the Plot-Data-Plane tool is the gridded data file to be r
'grid': { ... } }
DEBUG 1: Creating postscript file: fcst.ps
.. _met-python-input-arg:

Special Case for Gen-Ens-Prod, Ensemble-Stat, Series-Analysis, and MTD
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -384,13 +386,13 @@ In the event the user requires command line arguments to their Python script, th
gen_ens_prod ens1.nc,arg1,arg2 ens2.nc,arg1,arg2 ens3.nc,arg1,arg2 ens4.nc,arg1,arg2 \
-out ens_prod.nc -config GenEnsProd_config
In this case, the user's Python script will receive "ens1.nc,arg1,arg2" as a single command line argument for each execution of the Python script (i.e. 1 time per file). The user must parse this argument inside their Python script to obtain **arg1** and **arg2** as separate arguments. The list of input files and optionally, any command line arguments can be written to a single file called **file_list** that is substituted for the file names and command line arguments. For example:
In this case, the user's Python script will receive "ens1.nc,arg1,arg2" as a single command line argument for each execution of the Python script (i.e., once per file). The user must parse this argument inside their Python script to obtain **arg1** and **arg2** as separate arguments. The list of input files and, optionally, any command line arguments can be written to a single file (called **python_input_list** in the example below) that is substituted for the file names and command line arguments. ASCII file list elements are white-space separated (space-separated in the example below), as described in :numref:`ascii_file_lists`. For example:

.. code-block::
:caption: Gen-Ens-Prod File List
echo "ens1.nc,arg1,arg2 ens2.nc,arg1,arg2 ens3.nc,arg1,arg2 ens4.nc,arg1,arg2" > file_list
gen_ens_prod file_list -out ens_prod.nc -config GenEnsProd_config
echo "file_list ens1.nc,arg1,arg2 ens2.nc,arg1,arg2 ens3.nc,arg1,arg2 ens4.nc,arg1,arg2" > python_input_list
gen_ens_prod python_input_list -out ens_prod.nc -config GenEnsProd_config
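The per-file argument handling described above could look like the following inside a user's Python script. This is a minimal sketch and not part of MET: the helper names ``parse_embedding_arg`` and ``write_python_input_list`` are hypothetical, and the sketch assumes that neither the file names nor the extra arguments contain commas or white space.

```python
import os
import tempfile


def parse_embedding_arg(arg):
    """Split one argument such as "ens1.nc,arg1,arg2" into (file_name, extra_args)."""
    file_name, *extra_args = arg.split(",")
    return file_name, extra_args


def write_python_input_list(path, entries):
    """Write entries to an ASCII file list, one per line, led by the file_list keyword."""
    with open(path, "w") as handle:
        handle.write("file_list\n")
        for entry in entries:
            handle.write(entry + "\n")


if __name__ == "__main__":
    # Hypothetical entries matching the gen_ens_prod example above.
    entries = ["ens%d.nc,arg1,arg2" % i for i in range(1, 5)]
    list_path = os.path.join(tempfile.mkdtemp(), "python_input_list")
    write_python_input_list(list_path, entries)
    for entry in entries:
        print(parse_embedding_arg(entry))
```

In a real embedding script, the string passed to ``parse_embedding_arg`` would typically come from ``sys.argv[1]``. Writing one entry per line is a choice made here for readability; any white-space separator is described as valid in :numref:`ascii_file_lists`.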
Finally, the above tools do not require data files to be present on a local disk. If the user wishes, their Python script can obtain data from other sources based upon only the command line arguments to their Python script. For example:

25 changes: 25 additions & 0 deletions docs/Users_Guide/data_io.rst
@@ -19,6 +19,31 @@ Input point observation files in PrepBUFR format are available through NCEP. The

Tropical cyclone forecasts and observations are typically provided in a specific ATCF (Automated Tropical Cyclone Forecasting) ASCII format, in A-deck, B-deck, and E-deck files.

.. _ascii_file_lists:

ASCII File Lists
----------------

Several MET tools read multiple gridded input data files in a single run. For example, multiple gridded input files can define a time series of data for Series-Analysis and MTD or multiple ensemble members for Gen-Ens-Prod and Ensemble-Stat. Generally, users have two options for defining that list of input files:

1. Directly on the command line, for a relatively small number of files.
2. In a single ASCII file containing a list of file names of arbitrary length.

ASCII file lists consist of a white-space separated list of paths to the input files. While relative paths may work, users are encouraged to provide full paths for more consistent performance. Note that environment variables can also be used in the file lists.

The file list elements can be separated by spaces, tabs, or newlines, but not commas. Users are encouraged to add the optional **file_list** keyword as the first element of each list to clearly identify it as such. When **file_list** is the first item, no validation logic is applied to the file names. When **file_list** is not present, MET checks whether each input file actually exists on the file system and errors out if it encounters too many missing input files.

While this validation logic is useful for standard input file formats, it can cause problems when providing a list of arguments for a Python embedding script since those arguments may not actually be the names of files on the file system. Please see the description of :ref:`MET_PYTHON_INPUT_ARG <met-python-input-arg>` for additional details.

Here is an example ASCII file list for three input files, each listed on a separate line:

.. code-block::

   file_list
   /path/to/file1
   /path/to/file2
   /path/to/file3

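The rules above can be illustrated with a short reader for such a list. This is a sketch under stated assumptions, not MET's implementation: the function names ``read_file_list`` and ``find_missing`` are hypothetical, environment variables are assumed to use the ``$VAR`` or ``${VAR}`` forms, and no threshold for "too many" missing files is modeled because the text does not state one.

```python
import os


def read_file_list(path):
    """Return (has_keyword, entries) for a white-space separated ASCII file list."""
    with open(path) as handle:
        tokens = handle.read().split()  # splits on spaces, tabs, and newlines
    has_keyword = bool(tokens) and tokens[0] == "file_list"
    if has_keyword:
        tokens = tokens[1:]
    # Assumption: environment variables are written as $VAR or ${VAR}.
    entries = [os.path.expandvars(token) for token in tokens]
    return has_keyword, entries


def find_missing(entries):
    """Return the entries that do not exist on the file system."""
    return [entry for entry in entries if not os.path.exists(entry)]
```

A caller mirroring the described behavior would run ``find_missing`` only when ``has_keyword`` is false, since entries following the **file_list** keyword are not validated and may be Python embedding arguments rather than file names.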
Requirements for CF Compliant NetCDF
------------------------------------

2 changes: 1 addition & 1 deletion docs/Users_Guide/ensemble-stat.rst
@@ -109,7 +109,7 @@ Required Arguments ensemble_stat

1. The **n_ens ens_file_1 ... ens_file_n** is the number of ensemble members followed by a list of ensemble member file names. This argument is not required when ensemble files are specified in the **ens_file_list**, detailed below.

2. The **ens_file_list** is an ASCII file containing a list of ensemble member file names. This is not required when a file list is included on the command line, as described above.
2. The **ens_file_list** is an ASCII file containing a list of ensemble member file names, as described in :numref:`ascii_file_lists`. This is not required when a file list is included on the command line, as described above.

3. The **config_file** is an **EnsembleStatConfig** file containing the desired configuration settings.

6 changes: 3 additions & 3 deletions docs/Users_Guide/gen-ens-prod.rst
@@ -47,7 +47,7 @@ The usage statement for the Ensemble Stat tool is shown below:
.. code-block:: none
Usage: gen_ens_prod
-ens file_1 ... file_n | ens_file_list
-ens file_1 ... file_n | file_list
-out file
-config file
[-ctrl file]
@@ -59,9 +59,9 @@ gen_ens_prod has three required arguments and accepts several optional ones.
Required Arguments gen_ens_prod
-------------------------------

1. The **-ens file_1 ... file_n** option specifies the ensemble member file names. This argument is not required when ensemble files are specified in the **ens_file_list**, detailed below.
1. The **-ens file_1 ... file_n** option specifies the ensemble member file names. This argument is not required when ensemble files are specified in the **file_list**, detailed below.

2. The **ens_file_list** option is an ASCII file containing a list of ensemble member file names. This is not required when a file list is included on the command line, as described above.
2. The **file_list** option is an ASCII file containing a list of ensemble member file names, as described in :numref:`ascii_file_lists`. This is not required when a file list is included on the command line, as described above.

3. The **-out file** option specifies the NetCDF output file name to be written.

4 changes: 2 additions & 2 deletions docs/Users_Guide/grid-diag.rst
@@ -20,7 +20,7 @@ The following sections describe the usage statement, required arguments, and opt
.. code-block:: none
Usage: grid_diag
-data file_1 ... file_n | data_file_list
-data file_1 ... file_n | file_list
-out file
-config file
[-log file]
@@ -36,7 +36,7 @@ grid_diag has required arguments and can accept several optional arguments.
Required Arguments for grid_diag
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. The **-data file_1 ... file_n | data_file_list** options specify the gridded data files or an ASCII file containing a list of file names to be used.
1. The **-data file_1 ... file_n | file_list** options specify the gridded data files or an ASCII file containing a list of file names to be used, as described in :numref:`ascii_file_lists`.

When **-data** is used once, all fields are read from each input file. When used multiple times, it must match the number of fields to be processed.
In this case the first field in the config data field list is read from the files designated by the first **-data**, the second field in the field list is read from files designated by the second **-data**, and so forth. All files within each set must be of the same file type, but the file types of each set may differ.
2 changes: 1 addition & 1 deletion docs/Users_Guide/gsi-tools.rst
@@ -311,7 +311,7 @@ Required Arguments for gsidens2orank

1. The **ens_file_1 ... ens_file_n** argument is a list of ensemble binary GSI diagnostic files to be reformatted.

2. The **ens_file_list** argument is an ASCII file containing a list of ensemble GSI diagnostic files.
2. The **ens_file_list** argument is an ASCII file containing a list of ensemble GSI diagnostic files, as described in :numref:`ascii_file_lists`.

3. The **-out path** argument specifies the name of the output **.stat** file.

6 changes: 3 additions & 3 deletions docs/Users_Guide/mode-td.rst
@@ -208,16 +208,16 @@ The MODE-TD tool has three required arguments and can accept several optional ar
Required Arguments for mtd
^^^^^^^^^^^^^^^^^^^^^^^^^^

1. **-fcst file\_list** gives a list of forecast 2D data files to be processed by MTD. The files should have equally-spaced intervals of valid time.
1. The **-fcst file_1 ... file_n | file_list** option specifies the gridded forecast files or ASCII file list of file names to be used, as described in :numref:`ascii_file_lists`. The files should have equally-spaced intervals of valid time.

2. **-obs file\_list** gives a list of observation 2D data files to be processed by MTD. As with the {\cb -fcst} option, the files should have equally-spaced intervals of valid time. This valid time spacing should be the same as for the forecast files.
2. The **-obs file_1 ... file_n | file_list** option specifies the gridded observation files or ASCII file list of file names to be used, as described in :numref:`ascii_file_lists`. The files should have equally-spaced intervals of valid time. This valid time spacing should be the same as for the forecast files.

3. **-config config\_file** gives the path to a local configuration file that is specific to this particular run of MTD. The default MTD configuration file will be read first, followed by this one. Thus, only configuration options that are different from the default settings need be specified. Options set in this file will override any corresponding options set in the default configuration file.
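The equal-spacing requirement on valid times in items 1 and 2 can be checked before running MTD. The following is a minimal sketch, assuming the user has already obtained the valid times (for example, by parsing them from file names). The function ``has_equal_spacing`` is hypothetical and not part of MET, and requiring a strictly positive step is an assumption of this sketch.

```python
from datetime import datetime, timedelta


def has_equal_spacing(valid_times):
    """Return True if consecutive valid times differ by one constant, positive step."""
    if len(valid_times) < 2:
        return True
    # Collect the distinct gaps between neighboring valid times.
    steps = {later - earlier for earlier, later in zip(valid_times, valid_times[1:])}
    return len(steps) == 1 and steps.pop() > timedelta(0)


if __name__ == "__main__":
    start = datetime(2025, 1, 28, 0)
    six_hourly = [start + timedelta(hours=h) for h in (0, 6, 12, 18)]
    print(has_equal_spacing(six_hourly))
```

Running the same check on both the forecast and observation valid times, and comparing the resulting steps, would cover the additional requirement that the two spacings match.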

Optional Arguments for mtd
^^^^^^^^^^^^^^^^^^^^^^^^^^

4. **-single file\_list** command line option may be used instead of the **-fcst** and **-obs** command line options to define objects in a single field.
4. The **-single file_1 ... file_n | file_list** option may be used instead of the **-fcst** and **-obs** command line options to define objects in a single field.

.. note:: When the **-single** command line option is used, data specified in the **fcst** configuration file entry is read from those input files.

3 changes: 2 additions & 1 deletion docs/Users_Guide/pair-stat.rst
@@ -52,7 +52,8 @@ Required Arguments for pair_stat

1. The **-pairs** argument defines one or more input files containing forecast/observation pairs.
May be set as a list of file names (**file_1 ... file_n**) or as an ASCII file containing
a list file names (**file_list**). May be used multiple times (required).
a list of file names (**file_list**), as described in :numref:`ascii_file_lists`.
May be used multiple times (required).

2. The **format type** argument defines the input pairs file format and may be set to
"mpr" or "ioda" (required).
2 changes: 1 addition & 1 deletion docs/Users_Guide/reformat_grid.rst
@@ -122,7 +122,7 @@ The input files for the add, subtract, and derive command can be specified in on

2. Use **file_1 ... file_n** to specify the list of input files to be processed on the command line. Rather than specifying a separate configuration string for each input file, the "-field" command line option is required to specify the data to be processed.

3. Use **input_file_list** to specify the name of an ASCII file which contains the paths for the gridded data files to be processed. As in the previous option, the "-field" command line option is required to specify the data to be processed.
3. Use **input_file_list** to specify the name of an ASCII file which contains the paths for the gridded data files to be processed, as described in :numref:`ascii_file_lists`. As in the previous option, the "-field" command line option is required to specify the data to be processed.

An example of the pcp_combine calling sequence is presented below:

2 changes: 1 addition & 1 deletion docs/Users_Guide/rmw-analysis.rst
@@ -33,7 +33,7 @@ _______________________
Required Arguments for rmw_analysis
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. The **-data file_1 ... file_n | data_file_list** argument is the NetCDF output of TC-RMW to be processed or an ASCII file containing a list of files.
1. The **-data file_1 ... file_n | data_file_list** argument is the NetCDF output of TC-RMW to be processed or an ASCII file containing a list of files, as described in :numref:`ascii_file_lists`.

2. The **-config file** argument is the **RMWAnalysisConfig** to be used. The contents of the configuration file are discussed below.

24 changes: 12 additions & 12 deletions docs/Users_Guide/series-analysis.rst
@@ -30,9 +30,9 @@ The usage statement for the Series-Analysis tool is shown below:
.. code-block:: none
Usage: series_analysis
-fcst file_1 ... file_n | fcst_file_list
-obs file_1 ... file_n | obs_file_list
[-both file_1 ... file_n | both_file_list]
-fcst file_1 ... file_n | file_list
-obs file_1 ... file_n | file_list
[-both file_1 ... file_n | file_list]
[-aggr file]
[-paired]
-out file
@@ -46,9 +46,9 @@ series_analysis has four required arguments and accepts several optional ones.
Required Arguments series_stat
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. The **-fcst file_1 ... file_n | fcst_file_list** options specify the gridded forecast files or ASCII files containing lists of file names to be used.
1. The **-fcst file_1 ... file_n | file_list** option specifies the gridded forecast files or ASCII file list of file names to be used, as described in :numref:`ascii_file_lists`.

2. The **-obs file_1 ... file_n | obs_file_list** are the gridded observation files or ASCII files containing lists of file names to be used.
2. The **-obs file_1 ... file_n | file_list** option specifies the gridded observation files or ASCII file list of file names to be used, as described in :numref:`ascii_file_lists`.

3. The **-out file** is the NetCDF output file containing computed statistics.

@@ -57,19 +57,19 @@ Required Arguments series_stat
Optional Arguments for series_analysis
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

5. To set both the forecast and observations to the same set of files, use the optional -both file_1 ... file_n | both_file_list option to the same set of files. This is useful when reading the NetCDF matched pair output of the Grid-Stat tool which contains both forecast and observation data.
5. To set both the forecast and observations to the same set of files, use the optional **-both file_1 ... file_n | file_list** option. This is useful when reading the NetCDF matched pair output of the Grid-Stat tool which contains both forecast and observation data.

6. The -aggr option specifies the path to an existing Series-Analysis output file. When computing statistics for the input forecast and observation data, Series-Analysis aggregates the partial sums (SL1L2, SAL1L2 line types) and contingency table counts (CTC, MCTC, and PCT line types) with data provided in the aggregate file. This option enables Series-Analysis to run iteratively and update existing partial sums, counts, and statistics with new data.
6. The **-aggr** option specifies the path to an existing Series-Analysis output file. When computing statistics for the input forecast and observation data, Series-Analysis aggregates the partial sums (SL1L2, SAL1L2 line types) and contingency table counts (CTC, MCTC, and PCT line types) with data provided in the aggregate file. This option enables Series-Analysis to run iteratively and update existing partial sums, counts, and statistics with new data.

.. note:: When the -aggr option is used, only statistics that are derivable from partial sums and contingency table counts can be requested. Runtimes are generally much slower when aggregating data since it requires many additional NetCDF variables containing the scalar partial sums and contingency table counts to be read and written.
.. note:: When the **-aggr** option is used, only statistics that are derivable from partial sums and contingency table counts can be requested. Runtimes are generally much slower when aggregating data since it requires many additional NetCDF variables containing the scalar partial sums and contingency table counts to be read and written.

7. The -paired option indicates that the -fcst and -obs file lists are already paired, meaning there is a one-to-one correspondence between the files in those lists. This option affects how missing data is handled. When -paired is not used, missing or incomplete files result in a runtime error with no output file being created. When -paired is used, missing or incomplete files result in a warning with output being created using the available data.
7. The **-paired** option indicates that the **-fcst** and **-obs** file lists are already paired, meaning there is a one-to-one correspondence between the files in those lists. This option affects how missing data is handled. When **-paired** is not used, missing or incomplete files result in a runtime error with no output file being created. When **-paired** is used, missing or incomplete files result in a warning with output being created using the available data.

8. The -log file outputs log messages to the specified file.
8. The **-log file** option outputs log messages to the specified file.

9. The -v level overrides the default level of logging (2).
9. The **-v level** option overrides the default level of logging (2).

10. The -compress level option indicates the desired level of compression (deflate level) for NetCDF variables. The valid level is between 0 and 9. The value of "level" will override the default setting of 0 from the configuration file or the environment variable MET_NC_COMPRESS. Setting the compression level to 0 will make no compression for the NetCDF output. Lower number is for fast compression and higher number is for better compression.
10. The **-compress level** option indicates the desired level of compression (deflate level) for NetCDF variables. The valid level is between 0 and 9. The value of "level" will override the default setting of 0 from the configuration file or the environment variable MET_NC_COMPRESS. Setting the compression level to 0 disables compression of the NetCDF output. Lower values give faster compression and higher values give better compression.

An example of the series_analysis calling sequence is shown below:

2 changes: 1 addition & 1 deletion docs/Users_Guide/tc-diag.rst
@@ -48,7 +48,7 @@ tc_diag has required arguments and can accept several optional arguments.
Required Arguments for tc_diag
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. The **-data domain tech_id_list [ file_1 ... file_n | data_file_list ]** option specifies a domain name, a comma-separated list of ATCF tech ID's, and a list of gridded data files or an ASCII file containing a list of files to be used. Specify **-data** one for each gridded data source.
1. The **-data domain tech_id_list [ file_1 ... file_n | data_file_list ]** option specifies a domain name, a comma-separated list of ATCF tech IDs, and a list of gridded data files or an ASCII file containing a list of files to be used, as described in :numref:`ascii_file_lists`. Specify **-data** once for each gridded data source.

2. The **-deck source** option is the ATCF format track data source.

2 changes: 1 addition & 1 deletion docs/Users_Guide/tc-rmw.rst
@@ -32,7 +32,7 @@ tc_rmw has required arguments and can accept several optional arguments.
Required Arguments for tc_rmw
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. The **-data file_1 ... file_n | data_file_list** options specify the gridded data files or an ASCII file containing a list of files to be used.
1. The **-data file_1 ... file_n | data_file_list** options specify the gridded data files or an ASCII file containing a list of files to be used, as described in :numref:`ascii_file_lists`.

2. The **-deck source** argument is the ATCF format data source.
