Do scalar conserve interp gpu
The function do_scalar_conserve_interp_gpu remaps the field variables at each vertical level for each time point. Once the function is invoked from fregrid_gpu, arrays such as those for the output data are transferred to the GPU, and get_input_area_weight is called before interp_data_order1 or interp_data_order2. Next, do_scalar_conserve_interp_gpu assigns MISSING_VALUE in the output data where appropriate and computes the mean output data if cell_methods == mean. The data is then copied back to the CPU and written to the output NetCDF file at the end of fregrid_gpu. The functions get_input_area_weight and interp_data_order1 are described in detail below.
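The final step described above (assigning MISSING_VALUE and computing the mean) can be illustrated with a minimal sketch. This is not the FRE-NCTools implementation: the function name `finalize_output`, the constant `SKETCH_MISSING_VALUE`, the array names, and the rule "an output cell with no overlapping exchange grid area is missing" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the missing value used by the tool. */
#define SKETCH_MISSING_VALUE (-1.0e20)

/* Finalize the accumulated output values (illustrative only).
 * out_data[j] : total value accumulated into output cell j
 * out_area[j] : exchange grid area accumulated for output cell j
 * want_mean   : nonzero when the output should be a cell mean */
static void finalize_output(size_t n_out, double *out_data,
                            const double *out_area, int want_mean)
{
    assert(out_data != NULL && out_area != NULL);
    for (size_t j = 0; j < n_out; j++) {
        if (out_area[j] <= 0.0) {
            /* Assumption: no overlap with the input grid means missing. */
            out_data[j] = SKETCH_MISSING_VALUE;
        } else if (want_mean) {
            /* Convert the accumulated total into a mean over the cell. */
            out_data[j] /= out_area[j];
        }
    }
}
```

In this sketch the division happens only for cells that received data, so missing cells are never averaged.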
The function get_input_area_weight stores the denominator component of the remapping weights, where each weight, as explained in the Remapping section, is the ratio of the associated exchange grid cell area to the input grid cell area. In addition, get_input_area_weight handles the special case of cell_methods = MEAN for the input data. To best conserve the total global sum or global mean of the field, the input data should be the total cell value. To obtain the total cell value, the input data is un-averaged by multiplying the field data by the cell area that was used to compute the mean (which may differ from the input grid cell areas computed by fregrid). These input cell areas are retrieved if cell_measure = area and the associated file is specified in the global attributes section of the input NetCDF data file. If cell_measure is not specified, this special handling is not applied. See the FRE-NCTools documentation and the cell_measures description available at cfconventions.org.
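As a rough illustration of the two ideas in this paragraph, the sketch below un-averages a field (total = mean × area) and forms a weight as the ratio of exchange grid cell area to input grid cell area. The names `unaverage_field`, `remap_weight`, `cell_mean`, `mean_area`, and `cell_total` are hypothetical and do not come from the FRE-NCTools source.

```c
#include <assert.h>
#include <stddef.h>

/* Convert cell means to cell totals: total = mean * area.
 * mean_area[i] is the area that was used to form the mean of cell i,
 * which may differ from the area the regridding tool computes itself. */
static void unaverage_field(size_t n_cells, const double *cell_mean,
                            const double *mean_area, double *cell_total)
{
    assert(cell_mean != NULL && mean_area != NULL && cell_total != NULL);
    for (size_t i = 0; i < n_cells; i++) {
        cell_total[i] = cell_mean[i] * mean_area[i];
    }
}

/* Remapping weight for one exchange grid cell: the exchange grid cell
 * area divided by the input grid cell area (the stored denominator). */
static double remap_weight(double xgrid_area, double input_area)
{
    assert(input_area > 0.0);
    return xgrid_area / input_area;
}
```

For example, two cells with means 2.0 and 4.0 and areas 1.0 and 3.0 hold totals of 2.0 and 12.0, so the global sum to conserve is 14.0 rather than the sum of the means.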
The functions interp_data_order1 (first-order scheme) and interp_data_order2 (second-order scheme) then remap the data following the formulas specified in the Exchange grid section. Input data is transferred to the GPU before the loop over the exchange grid cells is executed. Output data is deleted from the GPU after it is copied back to the CPU.
Note that the OpenACC atomic update directive resolves the race condition that arises when more than one input grid cell overlaps the same output grid cell. If the resolution of the input grid is comparable to that of the output grid, only a small number of input cells overlap each output cell. In such cases, thread synchronization involves few threads and the performance loss is minimal.
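The role of the atomic update can be seen in a minimal first-order sketch of a loop over exchange grid cells. This is an illustration under stated assumptions, not the actual interp_data_order1 code: the function name `remap_order1_sketch`, all argument names, and the per-loop data clauses are hypothetical (the real code transfers data to the GPU ahead of the loop). Each exchange grid cell `k` links input cell `i_in[k]` to output cell `j_out[k]`, and the input values are assumed to be cell totals. Compiled without OpenACC support, the pragmas are ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* First-order remapping sketch: every exchange grid cell adds its share
 * of the input cell total to the output cell it overlaps. */
static void remap_order1_sketch(int n_xgrid, int n_in, int n_out,
                                const int *i_in, const int *j_out,
                                const double *xgrid_area,
                                const double *input_area,
                                const double *data_in, double *data_out)
{
    assert(n_xgrid >= 0 && n_in >= 0 && n_out >= 0);

    #pragma acc parallel loop \
        copyin(i_in[0:n_xgrid], j_out[0:n_xgrid], xgrid_area[0:n_xgrid]) \
        copyin(input_area[0:n_in], data_in[0:n_in]) \
        copy(data_out[0:n_out])
    for (int k = 0; k < n_xgrid; k++) {
        int i = i_in[k];
        int j = j_out[k];
        /* Weight = exchange grid cell area / input grid cell area. */
        double contribution = data_in[i] * xgrid_area[k] / input_area[i];

        /* Several k may map to the same j, so the read-modify-write on
         * data_out[j] must be atomic to avoid lost updates. */
        #pragma acc atomic update
        data_out[j] += contribution;
    }
}
```

Only iterations that target the same output cell contend for the atomic update, which is why few overlaps per output cell keep the synchronization cost low.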