Explanation of how to use the aggregate.ncl script
For Bill to generate T1; also send to Ping Yang
I wrote an ncl script to perform aggregation of NARCCAP data. This
example shows how to use aggregate.ncl to condense 3-hourly data into
daily data. This is the process I used to generate tasmin & tasmax
from tas for the RCM3 runs when we discovered that the values
generated by RCM3 itself were no good. There are some fiddly details
relating to getting the aggregation period to match up with the
0600-0600 GMT "day" specified for NARCCAP Table 1 data, but the usage
to generate monthly & seasonal averages and climatologies is pretty
similar.
1) Concatenate input files together using the NCO command 'ncrcat'.
This is necessary because the file boundaries on the 5-year files may
not coincide exactly with the boundaries of the periods you're
aggregating over. If you aren't careful with the boundaries, you can
end up with a period at the edge of the range where the value for a
large period is based on just one or two timesteps. So you definitely
want to get this right.
For going from 3-hourly to daily, we also throw out the very first
timestep, which is at 0300 on the first day. If it's not excluded, it
either results in an extra day at the beginning or in a day with 9
timesteps contributing instead of 8, and either way it messes things
up.
> ncrcat -d time,1, [input files] [output file]
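To see the boundary problem concretely, here is a small Python sketch
(an illustration only, not part of the workflow) that counts how many
3-hourly timesteps land in each 0600-0600 GMT day when the series
starts at 0300:

```python
import math
from collections import Counter

# 3-hourly timesteps in fractional days, starting at 0300 on day 0
# (0300 = 0.125, 0600 = 0.25, ...), spanning two full days plus the
# stray leading step.
steps = [0.125 + 0.125 * i for i in range(17)]

# Shift by -0.25 so 0600-0600 GMT days align with integer boundaries,
# then floor to get each timestep's day index.
counts = Counter(math.floor(t - 0.25) for t in steps)
print(sorted(counts.items()))  # [(-1, 1), (0, 8), (1, 8)]
```

Day index -1 is the partial day containing only the lone 0300
timestep; dropping that timestep with 'ncrcat -d time,1,' leaves
exactly 8 timesteps contributing to every day.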
2) Aggregate data using NCL script
> ncl -Q -n aggregate.ncl infile=\"tas.nc\" outfile=\"tasmax.nc\" interval=\"day\" varname=\"tas\" method=\"max\" check=True offset=-0.25 taint=True outtime=\"start\"
We pass command-line arguments to the NCL script using variable
definition statements on the command line. For string-valued
variables, NCL needs the quote-marks, which means you need to escape
them with backslashes so the shell doesn't interpret them instead of
passing them on to NCL. You could hardwire these values in the script
if you needed to.
In addition to the required command-line input to define the names of
the input file, output file, name of the variable, and period of
aggregation, there are a number of different options you can give
aggregate.ncl to control its behavior. The options used here:
method: allowed values are "mean", "min", or "max". Determines what
function is used to aggregate over the period. Switch to "min" to
generate tasmin.
check: if True, prints a bunch of debugging information at the end so
you can double-check that the output really is what you think it is
and came from where it was supposed to. It's good practice to use
this and look at the output afterwards. (I typically redirect it
to a file in a subdirectory named "check".)
offset: a shift to the time coordinate. Used to adjust when the day
starts when doing daily aggregations. Using -0.25 makes the day run
from 0600 GMT to 0600 GMT.
taint: if True, any missing_value timesteps in the input cause the
corresponding output value to be missing as well.
outtime: determines which point in the input interval should be used
as the time coordinate for the output.
There are also options for making climatological averages across
years and for printing progress indicators for large datasets that
take a long time to process.
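Conceptually, the options combine as in this Python sketch — an
illustration of the behavior described above, not the actual NCL
implementation; times are assumed to be in fractional days:

```python
import math

def aggregate_daily(times, values, method="max", offset=-0.25, taint=True):
    """Illustrative stand-in for aggregate.ncl's daily aggregation.

    times:  time coordinates in fractional days (e.g., 3-hourly spacing)
    values: one value per timestep; None marks a missing value
    """
    buckets = {}
    for t, v in zip(times, values):
        day = math.floor(t + offset)   # offset=-0.25 -> 0600-0600 GMT days
        buckets.setdefault(day, []).append(v)

    funcs = {"mean": lambda xs: sum(xs) / len(xs), "min": min, "max": max}
    out = {}
    for day, vals in sorted(buckets.items()):
        if taint and None in vals:
            out[day] = None            # taint: missing input poisons the day
        else:
            out[day] = funcs[method]([v for v in vals if v is not None])
    return out

# Two full 0600-0600 GMT days of 3-hourly values:
times = [0.25 + 0.125 * i for i in range(16)]
values = list(range(16))
print(aggregate_daily(times, values, method="max"))  # {0: 7, 1: 15}
```

Switching method to "min" here mirrors the tasmin/tasmax split in the
actual workflow.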
3) Rename variables to reflect new contents
If we were averaging the variable, we'd probably want to leave it with
the same name, but since we're generating a maximum temperature
variable from an average temperature variable, we need to rename the
data variable accordingly.
> ncrename -v tas,tasmax tasmax.nc
4) Update metadata
The tas variable is in Table 2, while tasmax is in Table 1, so we need
to change the global attribute named "table_id". We also need to update
the long_name attribute of tasmax to reflect the new variable. And,
for a minimum or maximum value, we need to add an appropriate
cell_methods attribute. All of these updates can be done with a
single use of ncatted. Note that we use the -h flag to prevent
ncatted from adding a history entry for this operation because the
results of the action are plainly obvious in the metadata, and the
very long entries typical of editing metadata really clutter up the
history and make it hard to read.
> ncatted -h -a table_id,global,m,c,"Table 1" -a long_name,tasmax,m,c,"Maximum Daily Surface Air Temperature" -a cell_methods,tasmax,m,c,"time: maximum(interval: 1 days)" tasmax.nc
5) Split files back into 5-year chunks using ncks
For NARCCAP publication, we have everything split into 5-year chunks
to keep the file sizes below 2 GB. If NCO has been installed with
udunits support, we can subset the data along the time dimension using
dates, which is a big plus for understanding what happened to the data
later on. There's no good programmatic way to split the files
according to the NARCCAP spec, so we just specify all the start and
end dates by hand. For Table 1 data, we can leave the time of day
unspecified. This sets it to 00:00 hours, and since the coordinates
for daily values are at 06:00 hours, the bounds as specified below
will split things properly. (The situation would be more complicated
for splitting 3-hourly data.) Happily, going from Jan-01 to Jan-01
also lets you ignore differences in the calendar.
The little shell loop does this for both tasmax and tasmin, and
propagates whatever other filename components may be in place.
NCEP data:
foreach f (tasm*.nc)
set g = `basename $f .nc`
ncks -O -d time,"1979-01-01","1981-01-01" $f ${g}_1979010106.nc
ncks -O -d time,"1981-01-01","1986-01-01" $f ${g}_1981010106.nc
ncks -O -d time,"1986-01-01","1991-01-01" $f ${g}_1986010106.nc
ncks -O -d time,"1991-01-01","1996-01-01" $f ${g}_1991010106.nc
ncks -O -d time,"1996-01-01","2001-01-01" $f ${g}_1996010106.nc
ncks -O -d time,"2001-01-01", $f ${g}_2001010106.nc
end
Current-period data:
foreach f (tasm*.nc)
set g = `basename $f .nc`
ncks -O -d time,"1968-01-01","1971-01-01" $f ${g}_1968010106.nc
ncks -O -d time,"1971-01-01","1976-01-01" $f ${g}_1971010106.nc
ncks -O -d time,"1976-01-01","1981-01-01" $f ${g}_1976010106.nc
ncks -O -d time,"1981-01-01","1986-01-01" $f ${g}_1981010106.nc
ncks -O -d time,"1986-01-01","1991-01-01" $f ${g}_1986010106.nc
ncks -O -d time,"1991-01-01","1996-01-01" $f ${g}_1991010106.nc
ncks -O -d time,"1996-01-01", $f ${g}_1996010106.nc
end
Future-period data:
foreach f (tasm*.nc)
set g = `basename $f .nc`
ncks -O -d time,"2038-01-01","2041-01-01" $f ${g}_2038010106.nc
ncks -O -d time,"2041-01-01","2046-01-01" $f ${g}_2041010106.nc
ncks -O -d time,"2046-01-01","2051-01-01" $f ${g}_2046010106.nc
ncks -O -d time,"2051-01-01","2056-01-01" $f ${g}_2051010106.nc
ncks -O -d time,"2056-01-01","2061-01-01" $f ${g}_2056010106.nc
ncks -O -d time,"2061-01-01","2066-01-01" $f ${g}_2061010106.nc
ncks -O -d time,"2066-01-01", $f ${g}_2066010106.nc
end
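The boundary arithmetic above can be sanity-checked with a Python
sketch (illustration only; the real files carry "days since ..." time
units that udunits interprets):

```python
from datetime import datetime, timedelta

# Hypothetical daily time coordinates at 0600 GMT covering 1979-1985
# (2557 days: 1979-01-01 through 1985-12-31, with 1980 and 1984 leap).
t0 = datetime(1979, 1, 1, 6)
coords = [t0 + timedelta(days=i) for i in range(2557)]

# Two of the ncks -d time,start,end windows, inclusive on both ends:
windows = [
    (datetime(1979, 1, 1), datetime(1981, 1, 1)),
    (datetime(1981, 1, 1), datetime(1986, 1, 1)),
]

# Every coordinate sits at 06:00 and every boundary at 00:00, so no
# coordinate ever equals a boundary, and each coordinate falls into
# exactly one window despite the inclusive endpoints.
hits = [sum(lo <= c <= hi for lo, hi in windows) for c in coords]
print(all(h == 1 for h in hits))  # True
```

This is why the shared dates at the window seams do not duplicate any
daily timesteps across the output chunks.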
6) Double-check results
Always check that the end result makes sense. I wrote a little script
in NCL that uses the cd_calendar() function to print the date and time
of the first and last timestep in a file, and that plus the number of
timesteps in each file and the debugging output from the aggregate
script should indicate whether everything did what it was supposed to.
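The check script itself is NCL (using cd_calendar), but the idea is
simple. Here is a rough Python equivalent, assuming a standard
calendar and "days since" time units; NARCCAP models may use 360-day
or noleap calendars, which cd_calendar handles and this sketch does
not:

```python
from datetime import datetime, timedelta

def first_last(times, units_epoch):
    """Print the date/time of the first and last timestep plus the
    timestep count, for time values given in days since units_epoch.
    (Rough stand-in for the NCL cd_calendar() check script.)"""
    first = units_epoch + timedelta(days=times[0])
    last = units_epoch + timedelta(days=times[-1])
    print(f"{len(times)} steps: {first:%Y-%m-%d %H:%M} .. {last:%Y-%m-%d %H:%M}")
    return first, last

# Three daily values at 0600 GMT, "days since 1979-01-01":
first_last([0.25, 1.25, 2.25], datetime(1979, 1, 1))
```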