Do scalar conserve interp gpu

MiKyung Lee edited this page Dec 13, 2024 · 1 revision

The function do_scalar_conserve_interp_gpu remaps the field variables for each vertical level and each time point. Once the function is invoked from fregrid_gpu, arrays such as those holding the output data are transferred to the GPU, and the function get_input_area_weight is called before interp_data_order1(2). Next, do_scalar_conserve_interp_gpu assigns MISSING_VALUE where appropriate in the output data and computes the mean output data if cell_methods == mean. The data are then copied back to the CPU and written to the output NetCDF file at the end of fregrid_gpu. The functions get_input_area_weight and interp_data_order1 are described in detail below.
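The post-interpolation step described above (assigning MISSING_VALUE and dividing by the accumulated area when cell_methods == mean) can be sketched as follows. This is a minimal illustration, not the actual fregrid_gpu source: the function name `finalize_output` and its signature are hypothetical, and the arrays stand in for the output data and accumulated output areas.

```c
#include <stddef.h>

#define MISSING_VALUE (-1e20)   /* placeholder; the real sentinel may differ */

/* Hypothetical sketch: finalize each output cell after the exchange-grid
   accumulation. out_data holds area-weighted sums; out_area holds the
   accumulated overlap areas for each output cell. */
static void finalize_output(double *out_data, const double *out_area,
                            int ncells, int cell_methods_is_mean)
{
  for (int i = 0; i < ncells; i++) {
    if (out_area[i] <= 0.0) {
      out_data[i] = MISSING_VALUE;   /* no input cell overlapped this cell */
    } else if (cell_methods_is_mean) {
      out_data[i] /= out_area[i];    /* convert weighted sum to a mean */
    }
  }
}
```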

$${\color{#fc6c85}\large{<><><>}}$$

The function get_input_area_weight stores the denominator component of the remapping weights, where the weights, as explained in the Remapping section, are the ratio of the associated exchange grid cell area to the input grid cell area. In addition, get_input_area_weight takes special considerations into account when cell_methods = mean for the input data. To best conserve the total global sum or global mean of the field, the input data should be the total cell value. To obtain the total cell value, the input data are un-averaged by multiplying the field data by the cell area that was used to compute the mean (which may differ from the input grid cell areas computed by fregrid). These input cell areas are retrieved if cell_measures = area and the associated file is specified in the global attributes section of the input NetCDF data file. If cell_measures is not specified, this special consideration is skipped. See the FRE-NCTools documentation and the cell_measures description available at cfconventions.org.
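A minimal sketch of this logic follows. The function name matches the one described above, but the signature and argument names are assumptions for illustration: each input cell's factor combines the denominator of the remapping weight (1 / input grid cell area) with the un-averaging area from cell_measures when cell_methods is "mean".

```c
#include <stddef.h>

/* Hypothetical sketch: per-input-cell factor used later when accumulating
   over the exchange grid. grid_area are the input grid cell areas computed
   by fregrid; cell_area (may be NULL) are the areas from cell_measures. */
static void get_input_area_weight(double *weight, const double *grid_area,
                                  const double *cell_area,
                                  int ncells, int cell_methods_is_mean)
{
  for (int i = 0; i < ncells; i++) {
    double w = 1.0 / grid_area[i];       /* denominator of the weight */
    if (cell_methods_is_mean && cell_area != NULL)
      w *= cell_area[i];                 /* un-average: mean * area = total */
    weight[i] = w;
  }
}
```

When cell_measures is absent, the second factor is skipped and the weight reduces to the plain reciprocal of the input grid cell area.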

$${\color{#fc6c85}\large{<><><>}}$$

The functions interp_data_order1 (first-order scheme) and interp_data_order2 (second-order scheme) perform the actual remapping, following the formulas specified in the Exchange grid section. The input data are transferred to the GPU before the loop over the exchange grid cells executes, and the output data are deleted on the GPU after being copied back to the CPU.
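The first-order accumulation over the exchange grid can be sketched as below. The function name matches interp_data_order1, but the signature and index arrays are assumptions: each exchange cell k links an input parent cell in_idx[k] to an output parent cell out_idx[k] with overlap area xarea[k], and in_weight is the per-input-cell factor described for get_input_area_weight.

```c
/* Hypothetical sketch of the first-order conservative remap: accumulate
   each exchange cell's area-weighted contribution into its output parent.
   A mean is later recovered by dividing out_data by out_area. */
static void interp_data_order1(const double *in_data, double *out_data,
                               double *out_area,
                               const int *in_idx, const int *out_idx,
                               const double *xarea, const double *in_weight,
                               int nxcells)
{
  for (int k = 0; k < nxcells; k++) {
    double w = xarea[k] * in_weight[in_idx[k]];   /* remapping weight */
    out_data[out_idx[k]] += w * in_data[in_idx[k]];
    out_area[out_idx[k]] += w;
  }
}
```

On the GPU this loop is parallelized over k, which is what makes the atomic update discussed next necessary.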

It should be noted that the OpenACC atomic update directive resolves the race condition that arises when more than one input grid cell overlaps the same output grid cell. If the resolution of the input grid is comparable to that of the output grid, only a small number of input cells overlap each output cell. In such cases, thread synchronization involves only a small number of threads and the performance loss is minimal.
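The pattern looks like the sketch below (the function name `accumulate_gpu` is hypothetical). Each loop iteration handles one exchange cell; when two exchange cells share the same output parent, the `acc atomic update` serializes just that one addition. Compiled without OpenACC, the pragmas are ignored and the loop runs serially with the same result.

```c
/* Hypothetical sketch: parallel accumulation into output cells with an
   OpenACC atomic guarding the read-modify-write on out_data. Without the
   atomic, two iterations with the same out_idx would race on the GPU. */
static void accumulate_gpu(double *out_data, const int *out_idx,
                           const double *contrib, int nxcells)
{
  #pragma acc parallel loop
  for (int k = 0; k < nxcells; k++) {
    #pragma acc atomic update
    out_data[out_idx[k]] += contrib[k];
  }
}
```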

$${\color{#fc6c85}\large{<><><>}}$$

$$\color{#fc6c85}\mathrm{\large{Home}}$$
$$\color{#fc6c85}\mathrm{\large{Guide \space to \space Grid \space Coupling \space in \space FMS}}$$
$$\color{#fc6c85}\mathrm{\large{Generating \space Stretched \space Input \space Grids}}$$
$$\color{#fc6c85}\mathrm{\large{Fregrid\_gpu}}$$