Commit 3877f63

committed
initial draft
1 parent c8e986f commit 3877f63

3 files changed

+250
-0
lines changed

README.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# CF Attributes Processor

This repository contains a Python class, `CF_Attributes`, that reads Climate and Forecast (CF) convention metadata attributes from a CSV file and serves them as Python dictionaries grouped by category, such as global attributes, coordinate variable attributes, and data variable attributes. It also combines the variable-related categories into a single dictionary, `variable_attributes`.

## Features

- **File Parsing**: Reads and processes a CSV file containing CF attributes into Python dictionaries.
- **Attribute Categorisation**: Automatically categorises attributes into:
  - global attributes
  - coordinate variable attributes
  - data variable attributes
  - boundary variable attributes
  - geometry container variable attributes
  - quantization container variable attributes
  - all variable attributes
  - group attributes

18+
## Installation
19+
20+
1. Clone the repository:
21+
```bash
22+
git clone <repository-url>
23+
```
24+
2. Navigate to the repository:
25+
```bash
26+
cd <repository-folder>
27+
```
28+
3. Ensure Python 3.8 or higher is installed.
29+
30+
## Usage

1. By default the class reads `cf_attributes.csv` from the current working directory. To use a different location, update the file path in the `CF_Attributes` class:
   ```python
   self.file_path = '/path/to/cf_attributes.csv'
   ```

2. Initialise the `CF_Attributes` class:
   ```python
   from process_cf_attributes import CF_Attributes

   cf_attributes = CF_Attributes()
   ```

3. Access specific categories of attributes:
   ```python
   cf_attributes.global_attributes
   cf_attributes.variable_attributes
   cf_attributes.coordinate_variable_attributes
   cf_attributes.data_variable_attributes
   cf_attributes.boundary_variable_attributes
   cf_attributes.geometry_container_variable_attributes
   cf_attributes.quantization_container_variable_attributes
   cf_attributes.group_attributes
   ```

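As a minimal sketch of what step 3 returns: each category is a plain dictionary keyed by attribute name. The `global_attributes` dictionary below is a hand-written stand-in with two entries copied from `cf_attributes.csv`, and `describe` is a hypothetical helper for illustration, not part of the class.

```python
# Stand-in for cf_attributes.global_attributes (assumption: same structure as
# the dictionaries built by CF_Attributes, with two entries from the CSV file).
global_attributes = {
    "title": {
        "Type": "S",
        "Use": ["G", "Gr"],
        "Description": "Short description of the file contents.",
    },
    "Conventions": {
        "Type": "S",
        "Use": ["G"],
        "Description": "Name of the conventions followed by the dataset.",
    },
}


def describe(attributes, name):
    """Return a one-line summary of an attribute, or None if it is absent."""
    entry = attributes.get(name)
    if entry is None:
        return None
    return f"{name} ({entry['Type']}): {entry['Description']}"


print(describe(global_attributes, "title"))
```
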
## CSV File

The CSV file, `cf_attributes.csv`, is currently stored in this repository and uses `|` as the field delimiter. If the CF conventions later host the attribute table as a standalone CSV file, I plan to use that as the source instead.

## Keys

Each attribute has the following keys:

- **Attribute**: The name of the attribute. In the dictionaries built by `CF_Attributes`, this is the dictionary key.
- **Type**: The type of the attribute:
  - `S`: String
  - `N`: Numeric
  - `D`: Same type as the data variable
- **Use**: A comma-separated list indicating where the attribute is used (stored as a list of codes by `CF_Attributes`):
  - `G`: Global attributes
  - `C`: Coordinate variables
  - `D`: Data variables
  - `Do`: Domain variables
  - `BI`, `BO`: Boundary variables
  - `M`: Geometry container variables
  - `Q`: Quantization container variables
  - `Gr`: Group attributes
- **Description**: A brief explanation of the attribute.

The CSV file also has a **Links** column, which `CF_Attributes` does not currently store.

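A rough sketch of decoding the `Use` codes described in this section. `USE_LABELS`, `parse_use` and `label_use` are hypothetical names used only for illustration; any code missing from the table is passed through unchanged.

```python
# Hypothetical lookup table built from the Use codes documented in this README.
USE_LABELS = {
    "G": "global attributes",
    "C": "coordinate variables",
    "D": "data variables",
    "BI": "boundary variables",
    "M": "geometry container variables",
    "Q": "quantization container variables",
    "Gr": "group attributes",
}


def parse_use(raw):
    """Split a raw Use cell such as 'C, D, BI' into a list of codes."""
    return [code.strip() for code in raw.split(",") if code.strip()]


def label_use(raw):
    """Translate each code into its label; unknown codes are returned as-is."""
    return [USE_LABELS.get(code, code) for code in parse_use(raw)]


print(label_use("G, Gr"))
```

Splitting the cell into a list before matching matters here: a substring test on the raw string would confuse `G` with `Gr` and `D` with `Do`.
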
## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Commit your changes.
4. Push to your branch and create a pull request.

## Acknowledgements

- The Climate and Forecast (CF) conventions, for providing guidance on metadata standards: https://cfconventions.org/

cf_attributes.csv

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
Attribute|Type|Use|Links|Description
**`actual_range`**|N|C, D, BO|<<missing-data>>|The smallest and the largest valid non-missing values occurring in the variable.
**`add_offset`**|N|C, D, BO|link:$$https://www.unidata.ucar.edu/software/netcdf/docs/attribute_conventions.html$$[NUG Appendix A, "Attribute Conventions"], and <<packed-data>>|If present for a variable, this number is to be added to the data after it is read by an application. If both **`scale_factor`** and **`add_offset`** attributes are present, the data are first scaled before the offset is added. In cases where there is a strong constraint on dataset size, it is allowed to pack the coordinate variables (using add_offset and/or scale_factor), but this is not recommended in general.
**`algorithm`**|S|Q|<<quantization-variables>>, and <<quantization-algorithms-description>>|Name of the quantization algorithm employed.
**`ancillary_variables`**|S|D|<<ancillary-data>>|Identifies a variable that contains closely associated data, e.g., the measurement uncertainties of instrument data.
**`axis`**|S|C, BI|<<coordinate-types>>|Identifies latitude, longitude, vertical, or time axes.
**`bounds`**|S|C|<<cell-boundaries>>|Identifies a boundary variable.
**`calendar`**|S|C, BI|<<calendar>>|Calendar used for encoding time axes.
**`cell_measures`**|S|D, Do|<<cell-measures>>|Identifies variables that contain cell areas or volumes.
**`cell_methods`**|S|D|<<cell-methods>>, <<climatological-statistics>>|Records the method used to derive data that represents cell values.
**`cf_role`**|S|C, BI|<<coordinates-metadata>>|Identifies the roles of variables that identify features in discrete sampling geometries. Identifies the roles of mesh topology and location index set variables (see <<appendix-mesh-topology-attributes>>).
**`climatology`**|S|C|<<climatological-statistics>>|Identifies a climatology variable.
**`comment`**|S|G, C, D|<<description-of-file-contents>>|Miscellaneous information about the data or methods used to produce it.
**`compress`**|S|C|<<compression-by-gathering>>, <<reduced-horizontal-grid>>|Records dimensions which have been compressed by gathering.
**`computed_standard_name`**|S|C, BI|<<parametric-vertical-coordinate>>|Indicates the standard name, from the standard name table, of the computed vertical coordinate values, computed according to the formula in the definition.
**`Conventions`**|S|G|link:$$https://www.unidata.ucar.edu/software/netcdf/docs/attribute_conventions.html$$[NUG Appendix A, "Attribute Conventions"]|Name of the conventions followed by the dataset.
**`coordinate_interpolation`**|S|D, Do|<<compression-by-coordinate-subsampling>>|Indicates that coordinates have been compressed by sampling and identifies the tie point coordinate variables and their associated interpolation variables.
**`coordinates`**|S|D, M, Do|<<coordinate-system>>, <<labels>>, <<alternative-coordinates>>|Identifies auxiliary coordinate variables, label variables, and alternate coordinate variables.
**`dimensions`**|S|Do|<<domain-variables>>|Identifies the dimensions that define a domain variable.
**`external_variables`**|S|G|<<external-variables>>, <<cell-measures>>|Identifies variables which are named by **`cell_measures`** attributes in the file but which are not present in the file.
**`_FillValue`**|D|C, D, BO|link:$$https://www.unidata.ucar.edu/software/netcdf/docs/attribute_conventions.html$$[NUG Appendix A, "Attribute Conventions"], and <<missing-data>>, and <<ch9-missing-data>>.|A value used to represent missing or undefined data. Allowed for auxiliary coordinate variables but not allowed for coordinate variables.
**`featureType`**|S|G|<<featureType>>|Specifies the type of discrete sampling geometry to which the data in the scope of this attribute belongs, and implies that all data variables in the scope of this attribute contain collections of features of that type.
**`flag_masks`**|D|D|<<flags>>|Provides a list of bit fields expressing Boolean or enumerated flags.
**`flag_meanings`**|S|D|<<flags>>|Use in conjunction with **`flag_values`** to provide descriptive words or phrases for each flag value. If multi-word phrases are used to describe the flag values, then the words within a phrase should be connected with underscores.
**`flag_values`**|D|D|<<flags>>|Provides a list of the flag values. Use in conjunction with **`flag_meanings`**.
**`formula_terms`**|S|C, BO|<<parametric-vertical-coordinate>>|Identifies variables that correspond to the terms in a formula.
**`geometry`**|S|C, D, Do|<<geometries>>|Identifies a variable that defines geometry.
**`geometry_type`**|S|M|<<geometries>>|Indicates the type of geometry present.
**`grid_mapping`**|S|D, M, Do|<<grid-mappings-and-projections>>|Identifies a variable that defines a grid mapping.
**`history`**|S|G, Gr|link:$$https://www.unidata.ucar.edu/software/netcdf/docs/attribute_conventions.html$$[NUG Appendix A, "Attribute Conventions"]|List of the applications that have modified the original data.
**`implementation`**|S|Q|<<quantization-variables>>, and <<quantization-algorithms-description>>|The name and version of the library or client software that performed the quantization with **`algorithm`**.
**`instance_dimension`**|S|-|<<representations-features>>|An attribute which identifies an index variable and names the instance dimension to which it applies. The index variable indicates that the indexed ragged array representation is being used for a collection of features.
**`institution`**|S|G, D|<<description-of-file-contents>>|Where the original data was produced.
**`interior_ring`**|S|M|<<geometries>>|Identifies a variable that indicates if polygon parts are interior rings (i.e., holes) or not.
**`leap_month`**|N|C, BI|<<calendar>>|Specifies which month is lengthened by a day in leap years for a user defined calendar.
**`leap_year`**|N|C, BI|<<calendar>>|Provides an example of a leap year for a user defined calendar. It is assumed that all years that differ from this year by a multiple of four are also leap years.
**`location`**|S|D, Do|<<mesh-topology-variables>>, and <<appendix-mesh-topology-attributes>>|Specifies the location type within the mesh topology at which the variable is defined.
**`location_index_set`**|S|D, Do|<<mesh-topology-variables>>, and <<appendix-mesh-topology-attributes>>|Specifies a variable that defines the subset of locations of a mesh topology at which the variable is defined.
**`long_name`**|S|C, D, Do, BI|link:$$https://www.unidata.ucar.edu/software/netcdf/docs/attribute_conventions.html$$[NUG Appendix A, "Attribute Conventions"], and <<long-name>>|A descriptive name that indicates a variable's content. This name is not standardized.
**`mesh`**|S|D, Do|<<mesh-topology-variables>>, and <<appendix-mesh-topology-attributes>>|Specifies a variable that defines a mesh topology.
**`missing_value`**|D|C, D, BO|<<missing-data>>, and <<ch9-missing-data>>|A value or values used to represent missing or undefined data. Allowed for auxiliary coordinate variables but not allowed for coordinate variables.
**`month_lengths`**|N|C, BI|<<calendar>>|Specifies the length of each month in a non-leap year for a user defined calendar.
**`node_coordinates`**|S|M|<<geometries>>|Identifies variables that contain geometry node coordinates.
**`node_count`**|S|M|<<geometries>>|Identifies a variable indicating the count of nodes per geometry.
**`nodes`**|S|C|<<geometries>>|Identifies a coordinate node variable.
**`part_node_count`**|S|M|<<geometries>>|Identifies a variable providing the count of nodes per geometry part.
**`positive`**|S|C, BI|<<COARDS>>|Direction of increasing vertical coordinate value.
**`quantization`**|S|D|<<quantization-variables>>|Identifies a variable that defines a quantization algorithm and its provenance.
**`quantization_nsb`**|N|D|<<per-variable-quantization-attributes>>, and <<quantization-algorithms-description>>|Specifies the number of significant bits retained in the IEEE mantissa of data quantized with the BitRound algorithm. Use in conjunction with **`quantization`**.
**`quantization_nsd`**|N|D|<<per-variable-quantization-attributes>>, and <<quantization-algorithms-description>>|Specifies the number of significant base-10 digits retained in the IEEE mantissa of data quantized with base-10 quantization algorithms. Use in conjunction with **`quantization`**.
**`references`**|S|G, D|<<description-of-file-contents>>|References that describe the data or methods used to produce it.
**`sample_dimension`**|S|-|<<representations-features>>|An attribute which identifies a count variable and names the sample dimension to which it applies. The count variable indicates that the contiguous ragged array representation is being used for a collection of features.
**`scale_factor`**|N|C, D, BO|link:$$https://www.unidata.ucar.edu/software/netcdf/docs/attribute_conventions.html$$[NUG Appendix A, "Attribute Conventions"], and <<packed-data>>|If present for a variable, the data are to be multiplied by this factor after the data are read by an application. See also the **`add_offset`** attribute. In cases where there is a strong constraint on dataset size, it is allowed to pack the coordinate variables (using add_offset and/or scale_factor), but this is not recommended in general.
**`source`**|S|G, D|<<description-of-file-contents>>|Method of production of the original data.
**`standard_error_multiplier`**|N|D|<<standard-name-modifiers>>|If a data variable with a standard_name modifier of standard_error has this attribute, it indicates that the values are the stated multiple of one standard error.
**`standard_name`**|S|C, D, BI|<<standard-name>>|A standard name that references a description of a variable's content in the standard name table.
**`title`**|S|G, Gr|link:$$https://www.unidata.ucar.edu/software/netcdf/docs/attribute_conventions.html$$[NUG Appendix A, "Attribute Conventions"]|Short description of the file contents.
**`units`**|S|C, D, BI|link:$$https://www.unidata.ucar.edu/software/netcdf/docs/attribute_conventions.html$$[NUG Appendix A, "Attribute Conventions"], and <<units>>|Units of a variable's content.
**`units_metadata`**|S|C, D, BI|<<units>>, and <<time-coordinate>>|Specifies the interpretation of a unit of measure appearing in the **`units`** attribute.
**`valid_max`**|N|C, D, BO|link:$$https://www.unidata.ucar.edu/software/netcdf/docs/attribute_conventions.html$$[NUG Appendix A, "Attribute Conventions"]|Largest valid value of a variable.
**`valid_min`**|N|C, D, BO|link:$$https://www.unidata.ucar.edu/software/netcdf/docs/attribute_conventions.html$$[NUG Appendix A, "Attribute Conventions"]|Smallest valid value of a variable.
**`valid_range`**|N|C, D, BO|link:$$https://www.unidata.ucar.edu/software/netcdf/docs/attribute_conventions.html$$[NUG Appendix A, "Attribute Conventions"]|Smallest and largest valid values of a variable.
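
For illustration, rows in this format can be read with Python's `csv` module using `|` as the delimiter. The sketch below parses two rows copied from the file out of an in-memory string. Passing `quoting=csv.QUOTE_NONE` is an assumption made here so that the double quotes inside the Links column are kept literally; `sample` and `rows` are hypothetical names.

```python
import csv
import io

# Header plus two rows copied from cf_attributes.csv.
sample = (
    "Attribute|Type|Use|Links|Description\n"
    "**`bounds`**|S|C|<<cell-boundaries>>|Identifies a boundary variable.\n"
    '**`title`**|S|G, Gr|link:$$https://www.unidata.ucar.edu/software/netcdf/docs/'
    'attribute_conventions.html$$[NUG Appendix A, "Attribute Conventions"]|'
    "Short description of the file contents.\n"
)

reader = csv.DictReader(io.StringIO(sample), delimiter="|", quoting=csv.QUOTE_NONE)

rows = {}
for row in reader:
    # Remove the surrounding ** and ` markup from the attribute name.
    name = row["Attribute"].strip("*`")
    rows[name] = {
        "Type": row["Type"],
        "Use": row["Use"].split(", "),
        "Links": row["Links"],
        "Description": row["Description"],
    }

print(sorted(rows))
```
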

process_cf_attributes.py

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
import csv


class CF_Attributes:
    """Read CF attributes from a CSV file and sort them into category dictionaries."""

    def __init__(self):
        self.file_path = 'cf_attributes.csv'
        self._to_dict()
        self._global_attributes()
        self._coordinate_variable_attributes()
        self._data_variable_attributes()
        self._boundary_variable_attributes()
        self._geometry_container_variable_attributes()
        self._quantization_container_variable_attributes()
        self._group_attributes()
        self._variable_attributes()

    def _to_dict(self):
        # Dictionary to store the result
        self.all_attributes = {}

        # Read the pipe-delimited CSV file and process each row
        with open(self.file_path, "r", newline="") as file:
            reader = csv.DictReader(file, delimiter="|")
            for row in reader:
                # Clean the Attribute column by removing the ** and ` markup
                attribute = row["Attribute"].strip().strip("*`")
                # Add to dictionary, splitting Use into a list of codes
                self.all_attributes[attribute] = {
                    "Type": row["Type"].strip(),
                    "Use": row["Use"].strip().split(', '),
                    "Description": row["Description"].strip()
                }

    def _coordinate_variable_attributes(self):
        self.coordinate_variable_attributes = {}
        for key, val in self.all_attributes.items():
            if 'C' in val['Use']:
                self.coordinate_variable_attributes[key] = val

    def _data_variable_attributes(self):
        self.data_variable_attributes = {}
        for key, val in self.all_attributes.items():
            if 'D' in val['Use']:
                self.data_variable_attributes[key] = val

    def _global_attributes(self):
        self.global_attributes = {}
        for key, val in self.all_attributes.items():
            if 'G' in val['Use']:
                self.global_attributes[key] = val

    def _group_attributes(self):
        self.group_attributes = {}
        for key, val in self.all_attributes.items():
            if 'Gr' in val['Use']:
                self.group_attributes[key] = val

    def _geometry_container_variable_attributes(self):
        self.geometry_container_variable_attributes = {}
        for key, val in self.all_attributes.items():
            if 'M' in val['Use']:
                self.geometry_container_variable_attributes[key] = val

    def _boundary_variable_attributes(self):
        self.boundary_variable_attributes = {}
        for key, val in self.all_attributes.items():
            # The CSV file marks boundary variable attributes with either BI or BO
            if 'BI' in val['Use'] or 'BO' in val['Use']:
                self.boundary_variable_attributes[key] = val

    def _quantization_container_variable_attributes(self):
        self.quantization_container_variable_attributes = {}
        for key, val in self.all_attributes.items():
            if 'Q' in val['Use']:
                self.quantization_container_variable_attributes[key] = val

    def _variable_attributes(self):
        # Combine all variable-related dictionaries
        self.variable_attributes = {}
        variable_dictionaries = [
            self.coordinate_variable_attributes,
            self.data_variable_attributes,
            self.boundary_variable_attributes,
            self.geometry_container_variable_attributes,
            self.quantization_container_variable_attributes
        ]

        for var_dict in variable_dictionaries:
            for key, value in var_dict.items():
                if key not in self.variable_attributes:
                    self.variable_attributes[key] = value


def main():
    # Create an instance of CF_Attributes
    cf_attributes = CF_Attributes()
    # Print each global attribute with its description
    for name, details in cf_attributes.global_attributes.items():
        print(f"{name}: {details['Description']}")


if __name__ == "__main__":
    main()
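
One possible simplification, shown only as a sketch: the per-category methods in this file all apply the same kind of filter to `all_attributes`, so a single helper could cover them. `filter_by_use` is a hypothetical function that is not part of this commit, and the sample dictionary holds three entries whose `Use` values are copied from `cf_attributes.csv`.

```python
def filter_by_use(all_attributes, *codes):
    """Return the attributes whose Use list contains any of the given codes."""
    wanted = set(codes)
    return {
        name: entry
        for name, entry in all_attributes.items()
        if wanted.intersection(entry["Use"])
    }


# Stand-in for CF_Attributes.all_attributes (assumption: same structure).
all_attributes = {
    "axis": {"Type": "S", "Use": ["C", "BI"]},
    "add_offset": {"Type": "N", "Use": ["C", "D", "BO"]},
    "history": {"Type": "S", "Use": ["G", "Gr"]},
}

boundary = filter_by_use(all_attributes, "BI", "BO")
group = filter_by_use(all_attributes, "Gr")
print(list(boundary), list(group))
```

Because the codes are compared as whole list items, `Gr` does not match `G` and `Do` does not match `D`.
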
