Permit DIR_TRANS with OpenMP offload #211

Open
wants to merge 48 commits into
base: develop

samhatfield
Collaborator

No description provided.

samhatfield and others added 30 commits February 3, 2025 18:39
These variables should be inherited from the parent scope and shared
between all threads. This was a bug.
Otherwise this will not compile with CCE and -acc.
We already put this in LEINV, so why is it needed? No idea.
Co-authored-by: Paul Mullowney <Paul.Mullowney@amd.com>
Co-authored-by: Sam Hatfield <samuel.hatfield@ecmwf.int>
Co-authored-by: Thomas Gibson <Thomas.Gibson@amd.com>
@samhatfield added the enhancement and gpu labels Feb 10, 2025
@samhatfield
Collaborator Author

I'm testing this PR with the following test program:

PROGRAM TEST_PROGRAM

USE PARKIND1, ONLY: JPIM, JPRB
USE MPL_MODULE

IMPLICIT NONE

! Spectral truncation
INTEGER(JPIM), PARAMETER :: TRUNC = 79
INTEGER(JPIM) :: verbosity = 0

! Arrays for storing our field in spectral space and grid point space
REAL(KIND=JPRB), ALLOCATABLE :: SPECTRAL_FIELD(:,:)
REAL(KIND=JPRB), ALLOCATABLE :: SPECTRAL_FIELD_2(:,:)
REAL(KIND=JPRB), ALLOCATABLE :: GRID_POINT_FIELD(:,:,:)

! Dimensions of our arrays in spectral space and grid point space
INTEGER(KIND=JPIM) :: NSPEC2
INTEGER(KIND=JPIM) :: NGPTOT

INTEGER(KIND=JPIM) :: SPECTRAL_INDICES(0:TRUNC)

#include "setup_trans0.h"
#include "setup_trans.h"
#include "trans_inq.h"
#include "inv_trans.h"
#include "dir_trans.h"
#include "trans_end.h"

CALL MPL_INIT(ldinfo=(verbosity>=1))

CALL DR_HOOK_INIT()

! Initialise ecTrans (resolution-agnostic aspects)
CALL SETUP_TRANS0(LDMPOFF=.TRUE., KPRINTLEV=VERBOSITY)

! Initialise ecTrans (resolution-specific aspects)
CALL SETUP_TRANS(KSMAX=TRUNC, KDGL=2 * (TRUNC + 1))

! Inquire about the dimensions in spectral space and grid point space
CALL TRANS_INQ(KSPEC2=NSPEC2, KGPTOT=NGPTOT, KASM0=SPECTRAL_INDICES)

! Allocate our work arrays
ALLOCATE(SPECTRAL_FIELD(1,NSPEC2))
ALLOCATE(SPECTRAL_FIELD_2(1,NSPEC2))
ALLOCATE(GRID_POINT_FIELD(NGPTOT,1,1))

! Initialise the input spectral field with a single non-zero coefficient
SPECTRAL_FIELD(:,:) = 0.0_JPRB
SPECTRAL_FIELD(1,SPECTRAL_INDICES(3) + 2 * 5 + 1) = 1.0_JPRB

! Perform an inverse transform
CALL INV_TRANS(PSPSCALAR=SPECTRAL_FIELD, PGP=GRID_POINT_FIELD)

WRITE(6,*) "GRID_POINT_FIELD = ", MINVAL(GRID_POINT_FIELD), MAXVAL(GRID_POINT_FIELD)
FLUSH(6)

! Perform a direct transform
CALL DIR_TRANS(PGP=GRID_POINT_FIELD, PSPSCALAR=SPECTRAL_FIELD_2)

WRITE(6,*) "TEST_PROGRAM 1"
FLUSH(6)

! Compute error between before and after fields
WRITE(6,*) "Error = ", NORM2(SPECTRAL_FIELD_2 - SPECTRAL_FIELD)
FLUSH(6)

CALL TRANS_END

CALL MPL_END(ldmeminfo=.false.)

END PROGRAM TEST_PROGRAM

Currently it gives the following output on CPU:

ecTrans at version: 1.5.1
commit: 8aaf383ff7a073286b126f7aab3186e20d3916a1

 GRID_POINT_FIELD =  -3.66075754,  3.66075754
 TEST_PROGRAM 1
 Error =  8.930736328E-8

On GPU with OpenACC:


ecTrans at version: 1.5.1
commit: 8aaf383ff7a073286b126f7aab3186e20d3916a1

 R%NTMAX= 79
 R%NSMAX= 79
 setup_trans: sizes1 NUMP= 80
 Using OpenACC
    FG%ZAS:       611840B
    FG%ZAA:       611840B
   FG%ZAS0:        26240B
   FG%ZAA0:        25600B
 FG%ZEPSNM:        26240B
 ===GPU arrays successfully allocated
 GRID_POINT_FIELD =  -3.66075754,  3.66075754
 Error =  2.615770995E-7

On GPU with OpenMP:

ecTrans at version: 1.5.1
commit: 8aaf383ff7a073286b126f7aab3186e20d3916a1

 R%NTMAX= 79
 R%NSMAX= 79
 setup_trans: sizes1 NUMP= 80
 Using OpenMP offloading
    FG%ZAS:       611840B
    FG%ZAA:       611840B
   FG%ZAS0:        26240B
   FG%ZAA0:        25600B
 FG%ZEPSNM:        26240B
 ===GPU arrays successfully allocated
 GRID_POINT_FIELD =  -3.66075754,  3.66075754
 Error =  1.
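
A note on reading that last figure, offered as an interpretation rather than something established in this thread: the input field holds a single coefficient equal to 1 and zeros everywhere else, so its 2-norm is 1. An error of exactly 1 is therefore what NORM2 would report if SPECTRAL_FIELD_2 came back entirely zero, i.e. if none of the DIR_TRANS output reached the host array. Writing s for SPECTRAL_FIELD and \hat{s} for SPECTRAL_FIELD_2 (symbols introduced here purely for illustration):

```latex
\[
  \lVert \hat{s} - s \rVert_2
  = \lVert 0 - s \rVert_2
  = \sqrt{\sum_k s_k^2}
  = \sqrt{1^2}
  = 1 .
\]
```

Other failure modes could produce the same number, so this only narrows the search.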

@samhatfield force-pushed the refresh_openmp_dir_trans branch from 8aaf383 to f9a2644 on February 10, 2025 16:59
@samhatfield
Collaborator Author

samhatfield commented Feb 10, 2025

Found a bug in TRLTOM_PACK_UNPACK. With the fix, the OpenMP error is now bit-identical to the OpenACC one:

ecTrans at version: 1.5.1
commit: f9a26443f98d66bd0e55908e634b886c9c5d8d0d

 R%NTMAX= 79
 R%NSMAX= 79
 setup_trans: sizes1 NUMP= 80
 Using OpenMP offloading
    FG%ZAS:       611840B
    FG%ZAA:       611840B
   FG%ZAS0:        26240B
   FG%ZAA0:        25600B
 FG%ZEPSNM:        26240B
 ===GPU arrays successfully allocated
 GRID_POINT_FIELD =  -3.66075754,  3.66075754
 TEST_PROGRAM 1
 Error =  2.615770995E-7

This is needed so that device-side arrays are correctly deallocated at
the end (especially important for CCE builds).
The first test will take a while but the subsequent tests should be
significantly faster.
@samhatfield force-pushed the refresh_openmp_dir_trans branch from 2f279fa to 20e6bd2 on February 12, 2025 13:53
Fix omp update directive in trgtol_mod (no MPI code path)
@samhatfield force-pushed the refresh_openmp_dir_trans branch from f7f3551 to 6e0f184 on February 13, 2025 14:12
@samhatfield mentioned this pull request Feb 13, 2025
26 tasks
@samhatfield marked this pull request as ready for review February 13, 2025 16:37
@samhatfield changed the title from "[DRAFT] Permit DIR_TRANS with OpenMP offload" to "Permit DIR_TRANS with OpenMP offload" Feb 13, 2025
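
Two of the commit messages above concern OpenMP device data management: one makes sure device-side arrays are deallocated at the end, the other fixes an omp update directive in trgtol_mod. The following is a minimal, self-contained sketch of the general pattern those messages refer to, not code taken from this PR. The program name, the array WORK and the size N are hypothetical and chosen only for illustration.

```fortran
PROGRAM DEVICE_ARRAY_LIFETIME
  ! Hypothetical stand-alone example; none of these names come from ecTrans.
  IMPLICIT NONE

  INTEGER, PARAMETER :: N = 1024
  REAL, ALLOCATABLE :: WORK(:)
  INTEGER :: I

  ALLOCATE(WORK(N))
  WORK(:) = 1.0

  ! Create a device copy of WORK and initialise it from the host values
  !$OMP TARGET ENTER DATA MAP(TO: WORK)

  ! Operate on the device copy (WORK is already present, so no implicit copy)
  !$OMP TARGET TEAMS DISTRIBUTE PARALLEL DO
  DO I = 1, N
    WORK(I) = 2.0 * WORK(I)
  END DO
  !$OMP END TARGET TEAMS DISTRIBUTE PARALLEL DO

  ! Explicitly copy the device values back to the host array
  !$OMP TARGET UPDATE FROM(WORK)

  WRITE(6,*) "SUM(WORK) = ", SUM(WORK)

  ! Release the device copy before the host array is deallocated
  !$OMP TARGET EXIT DATA MAP(DELETE: WORK)
  DEALLOCATE(WORK)

END PROGRAM DEVICE_ARRAY_LIFETIME
```

Built without OpenMP support, the directives are treated as comments and the loop runs on the host, so the sketch can be compiled and checked with any Fortran compiler. With offloading enabled, omitting the TARGET UPDATE would leave the host array holding stale values, and omitting the TARGET EXIT DATA would leave the device copy allocated after the host array is gone.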