Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigating Performance Regression in ADIOS BP for #400

Open
4 of 7 tasks
ax3l opened this issue Oct 16, 2023 · 17 comments
Open
4 of 7 tasks

Investigating Performance Regression in ADIOS BP for #400

ax3l opened this issue Oct 16, 2023 · 17 comments
Assignees
Labels

Comments

@ax3l
Copy link
Member

ax3l commented Oct 16, 2023

@RemiLehe reported and shared a HDF5/ADIOS2 data set with me that is 30x slower for BP than H5 in t.iterate(ts.get_a0, pol='y'). Also, it gets slower the more often it is called.

  • BP5 or BP4?
  • file-based encoding used for ... iterations
  • profile
  • test bpls -D ...: for decomposed data on disk
    • openPMD-pipe could be used to reorganize the data to be contiguous (in fact, its default)
    • we can also use to to make BP file-based series variable-based and HDF5 file-based ones group-based
  • Use lazy parsing / defer_iteration_parsing https://openpmd-api.readthedocs.io/en/0.15.2/details/backendconfig.html when the user specifies check_all_files=False

X-ref #380

@ax3l
Copy link
Member Author

ax3l commented Oct 24, 2023

BP Data:

HDF5 Data:

  • FBPIC (v0.25.0) RZ Data
  • HDF5

@ax3l
Copy link
Member Author

ax3l commented Oct 24, 2023

Benchmark: reads & processes field data

from openpmd_viewer.addons import LpaDiagnostics

ts = LpaDiagnostics('diag1/')
a0 = ts.iterate(ts.get_a0, pol='y') # ~1 or 50 iterations processed per second (BP or H5)

Warnings on open for BP series:

Warning: File 297 has different openPMD parameters than the rest of the time series.
Warning: File 298 has different openPMD parameters than the rest of the time series.
Warning: File 299 has different openPMD parameters than the rest of the time series.

@ax3l
Copy link
Member Author

ax3l commented Oct 24, 2023

The access pattern is quite simple: 2-4 reads on RZ fields (full read) per step, which take 95-97% of the time of this

@ax3l
Copy link
Member Author

ax3l commented Oct 24, 2023

Definitely on the openPMD-api side of things. CC @guj and @franzpoeschel for us to dig deeper:
benchmark.zip

from openpmd_viewer.addons import LpaDiagnostics
from openpmd_viewer.openpmd_timeseries.data_reader import io_reader
%load_ext line_profiler

BP4

ts_bp = LpaDiagnostics('diag1/')
Warning: File 297 has different openPMD parameters than the rest of the time series.
Warning: File 298 has different openPMD parameters than the rest of the time series.
Warning: File 299 has different openPMD parameters than the rest of the time series.
%lprun -f io_reader.read_field_circ -f io_reader.field_reader.get_data  \
    a0_bp = ts_bp.iterate(ts_bp.get_a0, pol='y')  # ~50 per second
100%|██████████| 299/299 [04:28<00:00,  1.11it/s]



Timer unit: 1e-09 s

Total time: 268.797 s
File: /global/homes/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/field_reader.py
Function: read_field_circ at line 124

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   124                                           def read_field_circ( series, iteration, field_name, component_name,
   125                                                                slice_relative_position, slice_across, m=0, theta=0.,
   126                                                                max_resolution_3d=None ):
   127                                               """
   128                                               Extract a given field from a file in the openPMD format,
   129                                               when the geometry is thetaMode
   130                                           
   131                                               Parameters
   132                                               ----------
   133                                               series: openpmd_api.Series
   134                                                   An open, readable openPMD-api series object
   135                                           
   136                                               iteration: integer
   137                                                   Iteration from which parameters should be extracted
   138                                           
   139                                               field_name : string, optional
   140                                                  Which field to extract
   141                                           
   142                                               component_name : string, optional
   143                                                  Which component of the field to extract
   144                                           
   145                                               m : int or string, optional
   146                                                  The azimuthal mode to be extracted
   147                                           
   148                                               theta : float or None
   149                                                  Angle of the plane of observation with respect to the x axis
   150                                                  If `theta` is not None, then this function returns a 2D array
   151                                                  corresponding to the plane of observation given by `theta` ;
   152                                                  otherwise it returns a full 3D Cartesian array
   153                                           
   154                                               slice_across : list of str or None
   155                                                  Direction(s) across which the data should be sliced
   156                                                  Elements can be 'r' and/or 'z'
   157                                                  Returned array is reduced by 1 dimension per slicing.
   158                                           
   159                                               slice_relative_position : list of float or None
   160                                                  Number(s) between -1 and 1 that indicate where to slice the data,
   161                                                  along the directions in `slice_across`
   162                                                  -1 : lower edge of the simulation box
   163                                                  0 : middle of the simulation box
   164                                                  1 : upper edge of the simulation box
   165                                           
   166                                               max_resolution_3d : list of int or None
   167                                                   Maximum resolution that the 3D reconstruction of the field (when
   168                                                   `theta` is None) can have. The list should contain two values,
   169                                                   e.g. `[200, 100]`, indicating the maximum longitudinal and transverse
   170                                                   resolution, respectively. This is useful for performance reasons,
   171                                                   particularly for 3D visualization.
   172                                           
   173                                               Returns
   174                                               -------
   175                                               A tuple with
   176                                                  F : a 3darray or 2darray containing the required field,
   177                                                      depending on whether `theta` is None or not
   178                                                  info : a FieldMetaInformation object
   179                                                  (contains information about the grid; see the corresponding docstring)
   180                                               """
   181      1200   13210083.0  11008.4      0.0      it = series.iterations[iteration]
   182                                           
   183                                               # Extract the dataset and and corresponding group
   184      1200    9517554.0   7931.3      0.0      field = it.meshes[field_name]
   185      1200    3207477.0   2672.9      0.0      if field.scalar:
   186                                                   component = next(field.items())[1]
   187                                               else:
   188      1200    4451544.0   3709.6      0.0          component = field[component_name]
   189                                           
   190                                               # Extract the metainformation
   191                                               #   FIXME here and in h5py reader, we need to invert the order on 'F' for
   192                                               #         grid spacing/offset/position
   193                                           
   194      1200    8484570.0   7070.5      0.0      coord_labels = {ii: coord for (ii, coord) in enumerate(field.axis_labels)}
   195      1200    3730629.0   3108.9      0.0      coord_label_str = ''.join(coord_labels.values())
   196      1200     734464.0    612.1      0.0      coord_label_str = 'm' + coord_label_str
   197      1200    3710349.0   3092.0      0.0      coord_order = RZorder[coord_label_str]
   198      1200     982238.0    818.5      0.0      if coord_order is RZorder.mrz:
   199                                                   Nm, Nr, Nz = component.shape
   200                                                   N_pair = (Nr, Nz)
   201      1200     653775.0    544.8      0.0      elif coord_order is RZorder.mzr:
   202      1200    2931781.0   2443.2      0.0          Nm, Nz, Nr = component.shape
   203      1200     430916.0    359.1      0.0          N_pair = (Nz, Nr)
   204                                               else:
   205                                                   raise Exception(order_error_msg)
   206                                           
   207                                               # Nm, Nr, Nz = component.shape
   208      2400  143492204.0  59788.4      0.1      info = FieldMetaInformation( coord_labels, N_pair,
   209      1200    7396198.0   6163.5      0.0          field.grid_spacing, field.grid_global_offset,
   210      1200    4925725.0   4104.8      0.0          field.grid_unit_SI, component.position, thetaMode=True )
   211                                           
   212                                               # Convert to a 3D Cartesian array if theta is None
   213      1200     740228.0    616.9      0.0      if theta is None:
   214                                           
   215                                                   # Get cylindrical info
   216                                                   rmax = info.rmax
   217                                                   inv_dr = 1./info.dr
   218                                                   Fcirc = get_data( series, component )  # (Extracts all modes)
   219                                                   if m == 'all':
   220                                                       modes = [ mode for mode in range(0, int(Nm / 2) + 1) ]
   221                                                   else:
   222                                                       modes = [ m ]
   223                                                   modes = np.array( modes, dtype='int' )
   224                                                   nmodes = len(modes)
   225                                           
   226                                                   # If necessary, reduce resolution of 3D reconstruction
   227                                                   if max_resolution_3d is not None:
   228                                                       max_res_lon, max_res_transv = max_resolution_3d
   229                                                       if Nz > max_res_lon:
   230                                                           # Calculate excess of elements along z
   231                                                           excess_z = int(np.round(Nz/max_res_lon))
   232                                                           # Preserve only one every excess_z elements
   233                                                           if coord_order is RZorder.mrz:
   234                                                               Fcirc = Fcirc[:, :, ::excess_z]
   235                                                           elif coord_order is RZorder.mzr:
   236                                                               Fcirc = Fcirc[:, ::excess_z, :]
   237                                                           else:
   238                                                               raise Exception(order_error_msg)
   239                                                           # Update info accordingly
   240                                                           info.z = info.z[::excess_z]
   241                                                           info.dz = info.z[1] - info.z[0]
   242                                                       if Nr > max_res_transv/2:
   243                                                           # Calculate excess of elements along r
   244                                                           excess_r = int(np.round(Nr/(max_res_transv/2)))
   245                                                           # Preserve only one every excess_r elements
   246                                                           if coord_order is RZorder.mrz:
   247                                                               Fcirc = Fcirc[:, ::excess_r, :]
   248                                                           elif coord_order is RZorder.mzr:
   249                                                               Fcirc = Fcirc[:, :, ::excess_r]
   250                                                           else:
   251                                                               raise Exception(order_error_msg)
   252                                                           # Update info and necessary parameters accordingly
   253                                                           info.r = info.r[::excess_r]
   254                                                           info.dr = info.r[1] - info.r[0]
   255                                                           inv_dr = 1./info.dr
   256                                                           # Update Nr after reducing radial resolution.
   257                                                           if coord_order is RZorder.mrz:
   258                                                               Nr = Fcirc.shape[1]
   259                                                           elif coord_order is RZorder.mzr:
   260                                                               Nr = Fcirc.shape[2]
   261                                                           else:
   262                                                               raise Exception(order_error_msg)
   263                                           
   264                                                   # Convert cylindrical data to Cartesian data
   265                                                   info._convert_cylindrical_to_3Dcartesian()
   266                                                   nx, ny, nz = len(info.x), len(info.y), len(info.z)
   267                                                   F_total = np.zeros( (nx, ny, nz) )
   268                                                   construct_3d_from_circ( F_total, Fcirc, info.x, info.y, modes,
   269                                                       nx, ny, nz, Nr, nmodes, inv_dr, rmax, coord_order)
   270                                           
   271                                               else:
   272                                           
   273                                                   # Extract the modes and recombine them properly
   274      1200    1020101.0    850.1      0.0          if coord_order is RZorder.mrz:
   275                                                       F_total = np.zeros( (2 * Nr, Nz ) )
   276      1200     563083.0    469.2      0.0          elif coord_order is RZorder.mzr:
   277      1200  127687513.0 106406.3      0.0              F_total = np.zeros( (Nz, 2 * Nr ) )
   278                                                   else:
   279                                                       raise Exception(order_error_msg)
   280      1200     600205.0    500.2      0.0          if m == 'all':
   281                                                       # Sum of all the modes
   282                                                       # - Prepare the multiplier arrays
   283      1200     480352.0    400.3      0.0              mult_above_axis = [1]
   284      1200     313548.0    261.3      0.0              mult_below_axis = [1]
   285      2400    2995535.0   1248.1      0.0              for mode in range(1, int(Nm / 2) + 1):
   286      1200    6754595.0   5628.8      0.0                  cos = np.cos( mode * theta )
   287      1200    3444587.0   2870.5      0.0                  sin = np.sin( mode * theta )
   288      1200     801329.0    667.8      0.0                  mult_above_axis += [cos, sin]
   289      1200    2684201.0   2236.8      0.0                  mult_below_axis += [ (-1) ** mode * cos, (-1) ** mode * sin ]
   290      1200    2742807.0   2285.7      0.0              mult_above_axis = np.array( mult_above_axis )
   291      1200    1751994.0   1460.0      0.0              mult_below_axis = np.array( mult_below_axis )
   292                                                       # - Sum the modes
   293      1200        3e+11    2e+08     98.7              F = get_data( series, component )  # (Extracts all modes)
   294      1200    3860942.0   3217.5      0.0              if coord_order is RZorder.mrz:
   295                                                           F_total[Nr:, :] = np.tensordot( mult_above_axis,
   296                                                                                           F, axes=(0, 0) )[:, :]
   297                                                           F_total[:Nr, :] = np.tensordot( mult_below_axis,
   298                                                                                           F, axes=(0, 0) )[::-1, :]
   299      1200     609494.0    507.9      0.0              elif coord_order is RZorder.mzr:
   300      3600 1540135643.0 427815.5      0.6                  F_total[:, Nr:] = np.tensordot( mult_above_axis,
   301      2400    1224560.0    510.2      0.0                                                  F, axes=(0, 0) )[:, :]
   302      3600 1408390398.0 391219.6      0.5                  F_total[:, :Nr] = np.tensordot( mult_below_axis,
   303      2400    1110211.0    462.6      0.0                                                  F, axes=(0, 0) )[:, ::-1]
   304                                                       else:
   305                                                           raise Exception(order_error_msg)
   306                                                   elif m == 0:
   307                                                       # Extract mode 0
   308                                                       F = get_data( series, component, 0, 0 )
   309                                                       if coord_order is RZorder.mrz:
   310                                                           F_total[Nr:, :] = F[:, :]
   311                                                           F_total[:Nr, :] = F[::-1, :]
   312                                                       elif coord_order is RZorder.mzr:
   313                                                           F_total[:, Nr:] = F[:, :]
   314                                                           F_total[:, :Nr] = F[:, ::-1]
   315                                                       else:
   316                                                           raise Exception(order_error_msg)
   317                                                   else:
   318                                                       # Extract higher mode
   319                                                       cos = np.cos( m * theta )
   320                                                       sin = np.sin( m * theta )
   321                                                       F_cos = get_data( series, component, 2 * m - 1, 0 )
   322                                                       F_sin = get_data( series, component, 2 * m, 0 )
   323                                                       F = cos * F_cos + sin * F_sin
   324                                                       if coord_order is RZorder.mrz:
   325                                                           F_total[Nr:, :] = F[:, :]
   326                                                           F_total[:Nr, :] = (-1) ** m * F[::-1, :]
   327                                                       elif coord_order is RZorder.mzr:
   328                                                           F_total[:, Nr:] = F[:, :]
   329                                                           F_total[:, :Nr] = (-1) ** m * F[:, ::-1]
   330                                                       else:
   331                                                           raise Exception(order_error_msg)
   332                                           
   333                                               # Perform slicing if needed
   334      1200     580574.0    483.8      0.0      if slice_across is not None:
   335                                                   # Slice field and clear metadata
   336      1200    6733740.0   5611.4      0.0          inverted_axes_dict = {info.axes[key]: key for key in info.axes.keys()}
   337      2400    2808429.0   1170.2      0.0          for count, slice_across_item in enumerate(slice_across):
   338      1200     471282.0    392.7      0.0              slicing_index = inverted_axes_dict[slice_across_item]
   339      1200    1063420.0    886.2      0.0              coord_array = getattr( info, slice_across_item )
   340                                                       # Number of cells along the slicing direction
   341      1200     863504.0    719.6      0.0              n_cells = len(coord_array)
   342                                                       # Index of the slice (prevent stepping out of the array)
   343      1200    2489168.0   2074.3      0.0              i_cell = int( 0.5 * (slice_relative_position[count] + 1.) * n_cells )
   344      1200    1481252.0   1234.4      0.0              i_cell = max( i_cell, 0 )
   345      1200    1024311.0    853.6      0.0              i_cell = min( i_cell, n_cells - 1)
   346      1200   36935450.0  30779.5      0.0              F_total = np.take( F_total, [i_cell], axis=slicing_index )
   347      1200    6405975.0   5338.3      0.0          F_total = np.squeeze(F_total)
   348                                                   # Remove the sliced labels from the FieldMetaInformation
   349      2400     938369.0    391.0      0.0          for slice_across_item in slice_across:
   350      1200   19354570.0  16128.8      0.0              info._remove_axis(slice_across_item)
   351                                           
   352      1200     328709.0    273.9      0.0      return F_total, info

Total time: 265.242 s
File: /global/homes/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/utilities.py
Function: get_data at line 24

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    24                                           def get_data(series, record_component, i_slice=None, pos_slice=None,
    25                                                        output_type=None):
    26                                               """
    27                                               Extract the data from a (possibly constant) dataset
    28                                               Slice the data according to the parameters i_slice and pos_slice
    29                                           
    30                                               Parameters:
    31                                               -----------
    32                                               series: openpmd_api.Series
    33                                                   An open, readable openPMD-api series object
    34                                           
    35                                               record_component: an openPMD.Record_Component
    36                                           
    37                                               pos_slice: int or list of int, optional
    38                                                   Slice direction(s).
    39                                                   When None, no slicing is performed
    40                                           
    41                                               i_slice: int or list of int, optional
    42                                                  Indices of slices to be taken.
    43                                           
    44                                               output_type: a numpy type
    45                                                  The type to which the returned array should be converted
    46                                           
    47                                               Returns:
    48                                               --------
    49                                               An np.ndarray (non-constant dataset) or a single double (constant dataset)
    50                                               """
    51                                               # For back-compatibility: Convert pos_slice and i_slice to
    52                                               # single-element lists if they are not lists (e.g. float
    53                                               # and int respectively).
    54      1200     569220.0    474.4      0.0      if pos_slice is not None and not isinstance(pos_slice, list):
    55                                                   pos_slice = [pos_slice]
    56      1200     395204.0    329.3      0.0      if i_slice is not None and not isinstance(i_slice, list):
    57                                                   i_slice = [i_slice]
    58                                           
    59                                               # ADIOS2: Actual chunks, all other: one chunk
    60      1200 3389494714.0    3e+06      1.3      chunks = record_component.available_chunks()
    61                                           
    62                                               # read whole data set
    63      1200     811767.0    676.5      0.0      if pos_slice is None:
    64                                                   # mask invalid regions with NaN
    65                                                   #   note: full_like triggers a full read, thus we avoid it #340
    66      1200 1979930988.0    2e+06      0.7          data = np.full(record_component.shape, np.nan, record_component.dtype)
    67     30000   16305542.0    543.5      0.0          for chunk in chunks:
    68     28800  313135011.0  10872.7      0.1              chunk_slice = chunk_to_slice(chunk)
    69                                           
    70                                                       # skip empty slices
    71                                                       # https://github.com/ornladios/ADIOS2
    72     28800    8054948.0    279.7      0.0              volume = 1
    73    115200   28072097.0    243.7      0.0              for csl in chunk_slice:
    74     86400   40121819.0    464.4      0.0                  volume *= csl.stop - csl.start
    75     28800    9041033.0    313.9      0.0              if volume == 0:
    76                                                           continue
    77                                           
    78                                                       # read only valid region
    79     28800  276187183.0   9589.8      0.1              x = record_component[chunk_slice]
    80     28800        3e+11    9e+06     97.5              series.flush()
    81     28800  631541838.0  21928.5      0.2              data[chunk_slice] = x
    82                                               # slice: read only part of the data set
    83                                               else:
    84                                                   full_shape = record_component.shape
    85                                           
    86                                                   slice_shape = list(full_shape)      # copy
    87                                                   pos_slice_sorted = pos_slice.copy() # copy for in-place sort
    88                                                   pos_slice_sorted.sort(reverse=True)
    89                                                   for dir_index in pos_slice_sorted:  # remove indices in list
    90                                                       del slice_shape[dir_index]
    91                                           
    92                                                   # mask invalid regions with NaN
    93                                                   data = np.full(slice_shape, np.nan, dtype=record_component.dtype)
    94                                           
    95                                                   # build requested ND slice with respect to full data
    96                                                   s = []
    97                                                   for d in range(len(full_shape)):
    98                                                       if d in pos_slice:
    99                                                           s.append(i_slice[pos_slice.index(d)]) # one index in such directions
   100                                                       else: # all indices in other direction
   101                                                           s.append(slice(None, None, None))
   102                                                   s = tuple(s)
   103                                           
   104                                                   # now we check which chunks contribute to the slice
   105                                                   for chunk in chunks:
   106                                                       skip_this_chunk = False
   107                                                       s_valid = list(s)  # same as s but reduced to valid regions in chunk
   108                                                       s_target = []  # starts and stops in sliced array
   109                                                       chunk_slice = chunk_to_slice(chunk)
   110                                           
   111                                                       # skip empty slices
   112                                                       # https://github.com/ornladios/ADIOS2
   113                                                       volume = 1
   114                                                       for csl in chunk_slice:
   115                                                           volume *= csl.stop - csl.start
   116                                                       if volume == 0:
   117                                                           continue
   118                                           
   119                                                       # read only valid region
   120                                                       for d, slice_d in enumerate(s):
   121                                                           start = chunk_slice[d].start
   122                                                           stop = chunk_slice[d].stop
   123                                                           if isinstance(slice_d, int):
   124                                                               # Nothing to do for s_target (dimension sliced out)
   125                                                               # Nothing to do for s_valid (dimension index is set)
   126                                                               if slice_d < start or slice_d >= stop:
   127                                                                   # chunk not in slice line/plane
   128                                                                   skip_this_chunk = True
   129                                                           else:
   130                                                               if slice_d.start is None or slice_d.start < start:
   131                                                                   s_valid[d] = slice(start, s_valid[d].stop)
   132                                                               if slice_d.stop is None or slice_d.stop > stop:
   133                                                                   s_valid[d] = slice(s_valid[d].start, stop)
   134                                                               s_target.append(slice(start, stop))
   135                                           
   136                                                       s_valid = tuple(s_valid)
   137                                                       s_target = tuple(s_target)
   138                                           
   139                                                       # read
   140                                                       if not skip_this_chunk:
   141                                                           x = record_component[s_valid]
   142                                                           series.flush()
   143                                                           data[s_target] = x
   144                                           
   145                                               # Convert to the right type
   146      1200     641124.0    534.3      0.0      if (output_type is not None) and (data.dtype != output_type):
   147                                                   data = data.astype( output_type )
   148                                               # Scale by the conversion factor
   149      1200    5803104.0   4835.9      0.0      if record_component.unit_SI != 1.0:
   150                                                   if np.issubdtype(data.dtype, np.floating) or \
   151                                                       np.issubdtype(data.dtype, np.complexfloating):
   152                                                       data *= record_component.unit_SI
   153                                                   else:
   154                                                       data = data * record_component.unit_SI
   155                                           
   156      1200     222451.0    185.4      0.0      return data

HDF5

ts_h5 = LpaDiagnostics('lab_diags/hdf5/')
%lprun -f io_reader.read_field_circ -f io_reader.field_reader.get_data  \
    a0_h5 = ts_h5.iterate(ts_h5.get_a0, pol='y')  # ~50 per second
100%|██████████| 300/300 [00:41<00:00,  7.29it/s]



Timer unit: 1e-09 s

Total time: 40.4864 s
File: /global/homes/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/field_reader.py
Function: read_field_circ at line 124

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   124                                           def read_field_circ( series, iteration, field_name, component_name,
   125                                                                slice_relative_position, slice_across, m=0, theta=0.,
   126                                                                max_resolution_3d=None ):
   127                                               """
   128                                               Extract a given field from a file in the openPMD format,
   129                                               when the geometry is thetaMode
   130                                           
   131                                               Parameters
   132                                               ----------
   133                                               series: openpmd_api.Series
   134                                                   An open, readable openPMD-api series object
   135                                           
   136                                               iteration: integer
   137                                                   Iteration from which parameters should be extracted
   138                                           
   139                                               field_name : string, optional
   140                                                  Which field to extract
   141                                           
   142                                               component_name : string, optional
   143                                                  Which component of the field to extract
   144                                           
   145                                               m : int or string, optional
   146                                                  The azimuthal mode to be extracted
   147                                           
   148                                               theta : float or None
   149                                                  Angle of the plane of observation with respect to the x axis
   150                                                  If `theta` is not None, then this function returns a 2D array
   151                                                  corresponding to the plane of observation given by `theta` ;
   152                                                  otherwise it returns a full 3D Cartesian array
   153                                           
   154                                               slice_across : list of str or None
   155                                                  Direction(s) across which the data should be sliced
   156                                                  Elements can be 'r' and/or 'z'
   157                                                  Returned array is reduced by 1 dimension per slicing.
   158                                           
   159                                               slice_relative_position : list of float or None
   160                                                  Number(s) between -1 and 1 that indicate where to slice the data,
   161                                                  along the directions in `slice_across`
   162                                                  -1 : lower edge of the simulation box
   163                                                  0 : middle of the simulation box
   164                                                  1 : upper edge of the simulation box
   165                                           
   166                                               max_resolution_3d : list of int or None
   167                                                   Maximum resolution that the 3D reconstruction of the field (when
   168                                                   `theta` is None) can have. The list should contain two values,
   169                                                   e.g. `[200, 100]`, indicating the maximum longitudinal and transverse
   170                                                   resolution, respectively. This is useful for performance reasons,
   171                                                   particularly for 3D visualization.
   172                                           
   173                                               Returns
   174                                               -------
   175                                               A tuple with
   176                                                  F : a 3darray or 2darray containing the required field,
   177                                                      depending on whether `theta` is None or not
   178                                                  info : a FieldMetaInformation object
   179                                                  (contains information about the grid; see the corresponding docstring)
   180                                               """
   181      1204   11854587.0   9846.0      0.0      it = series.iterations[iteration]
   182                                           
   183                                               # Extract the dataset and and corresponding group
   184      1204    7729223.0   6419.6      0.0      field = it.meshes[field_name]
   185      1204    1966414.0   1633.2      0.0      if field.scalar:
   186                                                   component = next(field.items())[1]
   187                                               else:
   188      1204    4037189.0   3353.1      0.0          component = field[component_name]
   189                                           
   190                                               # Extract the metainformation
   191                                               #   FIXME here and in h5py reader, we need to invert the order on 'F' for
   192                                               #         grid spacing/offset/position
   193                                           
   194      1204    7395178.0   6142.2      0.0      coord_labels = {ii: coord for (ii, coord) in enumerate(field.axis_labels)}
   195      1204    3127896.0   2597.9      0.0      coord_label_str = ''.join(coord_labels.values())
   196      1204     658084.0    546.6      0.0      coord_label_str = 'm' + coord_label_str
   197      1204    2862929.0   2377.8      0.0      coord_order = RZorder[coord_label_str]
   198      1204     796927.0    661.9      0.0      if coord_order is RZorder.mrz:
   199      1204    2759330.0   2291.8      0.0          Nm, Nr, Nz = component.shape
   200      1204     503943.0    418.6      0.0          N_pair = (Nr, Nz)
   201                                               elif coord_order is RZorder.mzr:
   202                                                   Nm, Nz, Nr = component.shape
   203                                                   N_pair = (Nz, Nr)
   204                                               else:
   205                                                   raise Exception(order_error_msg)
   206                                           
   207                                               # Nm, Nr, Nz = component.shape
   208      2408  133323900.0  55367.1      0.3      info = FieldMetaInformation( coord_labels, N_pair,
   209      1204    6074170.0   5045.0      0.0          field.grid_spacing, field.grid_global_offset,
   210      1204    4416856.0   3668.5      0.0          field.grid_unit_SI, component.position, thetaMode=True )
   211                                           
   212                                               # Convert to a 3D Cartesian array if theta is None
   213      1204     716774.0    595.3      0.0      if theta is None:
   214                                           
   215                                                   # Get cylindrical info
   216                                                   rmax = info.rmax
   217                                                   inv_dr = 1./info.dr
   218                                                   Fcirc = get_data( series, component )  # (Extracts all modes)
   219                                                   if m == 'all':
   220                                                       modes = [ mode for mode in range(0, int(Nm / 2) + 1) ]
   221                                                   else:
   222                                                       modes = [ m ]
   223                                                   modes = np.array( modes, dtype='int' )
   224                                                   nmodes = len(modes)
   225                                           
   226                                                   # If necessary, reduce resolution of 3D reconstruction
   227                                                   if max_resolution_3d is not None:
   228                                                       max_res_lon, max_res_transv = max_resolution_3d
   229                                                       if Nz > max_res_lon:
   230                                                           # Calculate excess of elements along z
   231                                                           excess_z = int(np.round(Nz/max_res_lon))
   232                                                           # Preserve only one every excess_z elements
   233                                                           if coord_order is RZorder.mrz:
   234                                                               Fcirc = Fcirc[:, :, ::excess_z]
   235                                                           elif coord_order is RZorder.mzr:
   236                                                               Fcirc = Fcirc[:, ::excess_z, :]
   237                                                           else:
   238                                                               raise Exception(order_error_msg)
   239                                                           # Update info accordingly
   240                                                           info.z = info.z[::excess_z]
   241                                                           info.dz = info.z[1] - info.z[0]
   242                                                       if Nr > max_res_transv/2:
   243                                                           # Calculate excess of elements along r
   244                                                           excess_r = int(np.round(Nr/(max_res_transv/2)))
   245                                                           # Preserve only one every excess_r elements
   246                                                           if coord_order is RZorder.mrz:
   247                                                               Fcirc = Fcirc[:, ::excess_r, :]
   248                                                           elif coord_order is RZorder.mzr:
   249                                                               Fcirc = Fcirc[:, :, ::excess_r]
   250                                                           else:
   251                                                               raise Exception(order_error_msg)
   252                                                           # Update info and necessary parameters accordingly
   253                                                           info.r = info.r[::excess_r]
   254                                                           info.dr = info.r[1] - info.r[0]
   255                                                           inv_dr = 1./info.dr
   256                                                           # Update Nr after reducing radial resolution.
   257                                                           if coord_order is RZorder.mrz:
   258                                                               Nr = Fcirc.shape[1]
   259                                                           elif coord_order is RZorder.mzr:
   260                                                               Nr = Fcirc.shape[2]
   261                                                           else:
   262                                                               raise Exception(order_error_msg)
   263                                           
   264                                                   # Convert cylindrical data to Cartesian data
   265                                                   info._convert_cylindrical_to_3Dcartesian()
   266                                                   nx, ny, nz = len(info.x), len(info.y), len(info.z)
   267                                                   F_total = np.zeros( (nx, ny, nz) )
   268                                                   construct_3d_from_circ( F_total, Fcirc, info.x, info.y, modes,
   269                                                       nx, ny, nz, Nr, nmodes, inv_dr, rmax, coord_order)
   270                                           
   271                                               else:
   272                                           
   273                                                   # Extract the modes and recombine them properly
   274      1204    1033542.0    858.4      0.0          if coord_order is RZorder.mrz:
   275      1204  116253591.0  96556.1      0.3              F_total = np.zeros( (2 * Nr, Nz ) )
   276                                                   elif coord_order is RZorder.mzr:
   277                                                       F_total = np.zeros( (Nz, 2 * Nr ) )
   278                                                   else:
   279                                                       raise Exception(order_error_msg)
   280      1204     625240.0    519.3      0.0          if m == 'all':
   281                                                       # Sum of all the modes
   282                                                       # - Prepare the multiplier arrays
   283      1204     501373.0    416.4      0.0              mult_above_axis = [1]
   284      1204     337724.0    280.5      0.0              mult_below_axis = [1]
   285      2408    2770012.0   1150.3      0.0              for mode in range(1, int(Nm / 2) + 1):
   286      1204    6213459.0   5160.7      0.0                  cos = np.cos( mode * theta )
   287      1204    2939528.0   2441.5      0.0                  sin = np.sin( mode * theta )
   288      1204     797075.0    662.0      0.0                  mult_above_axis += [cos, sin]
   289      1204    2419221.0   2009.3      0.0                  mult_below_axis += [ (-1) ** mode * cos, (-1) ** mode * sin ]
   290      1204    2621466.0   2177.3      0.0              mult_above_axis = np.array( mult_above_axis )
   291      1204    1843528.0   1531.2      0.0              mult_below_axis = np.array( mult_below_axis )
   292                                                       # - Sum the modes
   293      1204        4e+10    3e+07     94.0              F = get_data( series, component )  # (Extracts all modes)
   294      1204    2309270.0   1918.0      0.0              if coord_order is RZorder.mrz:
   295      3612 1238568246.0 342903.7      3.1                  F_total[Nr:, :] = np.tensordot( mult_above_axis,
   296      2408    1154492.0    479.4      0.0                                                  F, axes=(0, 0) )[:, :]
   297      3612  823631746.0 228026.5      2.0                  F_total[:Nr, :] = np.tensordot( mult_below_axis,
   298      2408    1160084.0    481.8      0.0                                                  F, axes=(0, 0) )[::-1, :]
   299                                                       elif coord_order is RZorder.mzr:
   300                                                           F_total[:, Nr:] = np.tensordot( mult_above_axis,
   301                                                                                           F, axes=(0, 0) )[:, :]
   302                                                           F_total[:, :Nr] = np.tensordot( mult_below_axis,
   303                                                                                           F, axes=(0, 0) )[:, ::-1]
   304                                                       else:
   305                                                           raise Exception(order_error_msg)
   306                                                   elif m == 0:
   307                                                       # Extract mode 0
   308                                                       F = get_data( series, component, 0, 0 )
   309                                                       if coord_order is RZorder.mrz:
   310                                                           F_total[Nr:, :] = F[:, :]
   311                                                           F_total[:Nr, :] = F[::-1, :]
   312                                                       elif coord_order is RZorder.mzr:
   313                                                           F_total[:, Nr:] = F[:, :]
   314                                                           F_total[:, :Nr] = F[:, ::-1]
   315                                                       else:
   316                                                           raise Exception(order_error_msg)
   317                                                   else:
   318                                                       # Extract higher mode
   319                                                       cos = np.cos( m * theta )
   320                                                       sin = np.sin( m * theta )
   321                                                       F_cos = get_data( series, component, 2 * m - 1, 0 )
   322                                                       F_sin = get_data( series, component, 2 * m, 0 )
   323                                                       F = cos * F_cos + sin * F_sin
   324                                                       if coord_order is RZorder.mrz:
   325                                                           F_total[Nr:, :] = F[:, :]
   326                                                           F_total[:Nr, :] = (-1) ** m * F[::-1, :]
   327                                                       elif coord_order is RZorder.mzr:
   328                                                           F_total[:, Nr:] = F[:, :]
   329                                                           F_total[:, :Nr] = (-1) ** m * F[:, ::-1]
   330                                                       else:
   331                                                           raise Exception(order_error_msg)
   332                                           
   333                                               # Perform slicing if needed
   334      1204     525321.0    436.3      0.0      if slice_across is not None:
   335                                                   # Slice field and clear metadata
   336      1204    5131929.0   4262.4      0.0          inverted_axes_dict = {info.axes[key]: key for key in info.axes.keys()}
   337      2408    1923462.0    798.8      0.0          for count, slice_across_item in enumerate(slice_across):
   338      1204     495464.0    411.5      0.0              slicing_index = inverted_axes_dict[slice_across_item]
   339      1204     782667.0    650.1      0.0              coord_array = getattr( info, slice_across_item )
   340                                                       # Number of cells along the slicing direction
   341      1204     630311.0    523.5      0.0              n_cells = len(coord_array)
   342                                                       # Index of the slice (prevent stepping out of the array)
   343      1204    1827510.0   1517.9      0.0              i_cell = int( 0.5 * (slice_relative_position[count] + 1.) * n_cells )
   344      1204    1229624.0   1021.3      0.0              i_cell = max( i_cell, 0 )
   345      1204     960371.0    797.7      0.0              i_cell = min( i_cell, n_cells - 1)
   346      1204   14635558.0  12155.8      0.0              F_total = np.take( F_total, [i_cell], axis=slicing_index )
   347      1204    4445069.0   3691.9      0.0          F_total = np.squeeze(F_total)
   348                                                   # Remove the sliced labels from the FieldMetaInformation
   349      2408     904917.0    375.8      0.0          for slice_across_item in slice_across:
   350      1204   15903423.0  13208.8      0.0              info._remove_axis(slice_across_item)
   351                                           
   352      1204     349136.0    290.0      0.0      return F_total, info

Total time: 38.0203 s
File: /global/homes/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/utilities.py
Function: get_data at line 24

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    24                                           def get_data(series, record_component, i_slice=None, pos_slice=None,
    25                                                        output_type=None):
    26                                               """
    27                                               Extract the data from a (possibly constant) dataset
    28                                               Slice the data according to the parameters i_slice and pos_slice
    29                                           
    30                                               Parameters:
    31                                               -----------
    32                                               series: openpmd_api.Series
    33                                                   An open, readable openPMD-api series object
    34                                           
    35                                               record_component: an openPMD.Record_Component
    36                                           
    37                                               pos_slice: int or list of int, optional
    38                                                   Slice direction(s).
    39                                                   When None, no slicing is performed
    40                                           
    41                                               i_slice: int or list of int, optional
    42                                                  Indices of slices to be taken.
    43                                           
    44                                               output_type: a numpy type
    45                                                  The type to which the returned array should be converted
    46                                           
    47                                               Returns:
    48                                               --------
    49                                               An np.ndarray (non-constant dataset) or a single double (constant dataset)
    50                                               """
    51                                               # For back-compatibility: Convert pos_slice and i_slice to
    52                                               # single-element lists if they are not lists (e.g. float
    53                                               # and int respectively).
    54      1204     569632.0    473.1      0.0      if pos_slice is not None and not isinstance(pos_slice, list):
    55                                                   pos_slice = [pos_slice]
    56      1204     421564.0    350.1      0.0      if i_slice is not None and not isinstance(i_slice, list):
    57                                                   i_slice = [i_slice]
    58                                           
    59                                               # ADIOS2: Actual chunks, all other: one chunk
    60      1204        2e+10    2e+07     58.7      chunks = record_component.available_chunks()
    61                                           
    62                                               # read whole data set
    63      1204     929623.0    772.1      0.0      if pos_slice is None:
    64                                                   # mask invalid regions with NaN
    65                                                   #   note: full_like triggers a full read, thus we avoid it #340
    66      1204 1832455319.0    2e+06      4.8          data = np.full(record_component.shape, np.nan, record_component.dtype)
    67      2408    2300463.0    955.3      0.0          for chunk in chunks:
    68      1204   17648245.0  14658.0      0.0              chunk_slice = chunk_to_slice(chunk)
    69                                           
    70                                                       # skip empty slices
    71                                                       # https://github.com/ornladios/ADIOS2
    72      1204     397967.0    330.5      0.0              volume = 1
    73      4816    1636138.0    339.7      0.0              for csl in chunk_slice:
    74      3612    2673375.0    740.1      0.0                  volume *= csl.stop - csl.start
    75      1204     561651.0    466.5      0.0              if volume == 0:
    76                                                           continue
    77                                           
    78                                                       # read only valid region
    79      1204   44672184.0  37103.1      0.1              x = record_component[chunk_slice]
    80      1204        1e+10    1e+07     35.3              series.flush()
    81      1204  377283633.0 313358.5      1.0              data[chunk_slice] = x
    82                                               # slice: read only part of the data set
    83                                               else:
    84                                                   full_shape = record_component.shape
    85                                           
    86                                                   slice_shape = list(full_shape)      # copy
    87                                                   pos_slice_sorted = pos_slice.copy() # copy for in-place sort
    88                                                   pos_slice_sorted.sort(reverse=True)
    89                                                   for dir_index in pos_slice_sorted:  # remove indices in list
    90                                                       del slice_shape[dir_index]
    91                                           
    92                                                   # mask invalid regions with NaN
    93                                                   data = np.full(slice_shape, np.nan, dtype=record_component.dtype)
    94                                           
    95                                                   # build requested ND slice with respect to full data
    96                                                   s = []
    97                                                   for d in range(len(full_shape)):
    98                                                       if d in pos_slice:
    99                                                           s.append(i_slice[pos_slice.index(d)]) # one index in such directions
   100                                                       else: # all indices in other direction
   101                                                           s.append(slice(None, None, None))
   102                                                   s = tuple(s)
   103                                           
   104                                                   # now we check which chunks contribute to the slice
   105                                                   for chunk in chunks:
   106                                                       skip_this_chunk = False
   107                                                       s_valid = list(s)  # same as s but reduced to valid regions in chunk
   108                                                       s_target = []  # starts and stops in sliced array
   109                                                       chunk_slice = chunk_to_slice(chunk)
   110                                           
   111                                                       # skip empty slices
   112                                                       # https://github.com/ornladios/ADIOS2
   113                                                       volume = 1
   114                                                       for csl in chunk_slice:
   115                                                           volume *= csl.stop - csl.start
   116                                                       if volume == 0:
   117                                                           continue
   118                                           
   119                                                       # read only valid region
   120                                                       for d, slice_d in enumerate(s):
   121                                                           start = chunk_slice[d].start
   122                                                           stop = chunk_slice[d].stop
   123                                                           if isinstance(slice_d, int):
   124                                                               # Nothing to do for s_target (dimension sliced out)
   125                                                               # Nothing to do for s_valid (dimension index is set)
   126                                                               if slice_d < start or slice_d >= stop:
   127                                                                   # chunk not in slice line/plane
   128                                                                   skip_this_chunk = True
   129                                                           else:
   130                                                               if slice_d.start is None or slice_d.start < start:
   131                                                                   s_valid[d] = slice(start, s_valid[d].stop)
   132                                                               if slice_d.stop is None or slice_d.stop > stop:
   133                                                                   s_valid[d] = slice(s_valid[d].start, stop)
   134                                                               s_target.append(slice(start, stop))
   135                                           
   136                                                       s_valid = tuple(s_valid)
   137                                                       s_target = tuple(s_target)
   138                                           
   139                                                       # read
   140                                                       if not skip_this_chunk:
   141                                                           x = record_component[s_valid]
   142                                                           series.flush()
   143                                                           data[s_target] = x
   144                                           
   145                                               # Convert to the right type
   146      1204     639774.0    531.4      0.0      if (output_type is not None) and (data.dtype != output_type):
   147                                                   data = data.astype( output_type )
   148                                               # Scale by the conversion factor
   149      1204    5938598.0   4932.4      0.0      if record_component.unit_SI != 1.0:
   150                                                   if np.issubdtype(data.dtype, np.floating) or \
   151                                                       np.issubdtype(data.dtype, np.complexfloating):
   152                                                       data *= record_component.unit_SI
   153                                                   else:
   154                                                       data = data * record_component.unit_SI
   155                                           
   156      1204     331688.0    275.5      0.0      return data

@ax3l
Copy link
Member Author

ax3l commented Oct 24, 2023

BP4 Directly w/ openPMD-api

s = io.Series("diag1/openpmd_%T.bp", io.Access.read_only)
#io.list_series(s, True)
%%time
for k_i, i in s.iterations.items():
    print("Iteration: {0}".format(k_i))

    E = i.meshes["E"]
    E_r = E["r"]
    E_t = E["t"]
    E_z = E["z"]
    
    # emulate multiple reads
    for i in range(4):
        E_r_data = E_r[()]
        s.flush()
        E_t_data = E_t[()]
        s.flush()
        E_z_data = E_z[()]
        s.flush()
Iteration: 0
Iteration: 1
Iteration: 2
Iteration: 3
Iteration: 4
Iteration: 5
Iteration: 6
Iteration: 7
Iteration: 8
Iteration: 9
Iteration: 10
Iteration: 11
Iteration: 12
Iteration: 13
Iteration: 14
Iteration: 15
Iteration: 16
Iteration: 17
Iteration: 18
Iteration: 19
Iteration: 20
Iteration: 21
Iteration: 22
Iteration: 23
Iteration: 24
Iteration: 25
Iteration: 26
Iteration: 27
Iteration: 28
Iteration: 29
Iteration: 30
Iteration: 31
Iteration: 32
Iteration: 33
Iteration: 34
Iteration: 35
Iteration: 36
Iteration: 37
Iteration: 38
Iteration: 39
Iteration: 40
Iteration: 41
Iteration: 42
Iteration: 43
Iteration: 44
Iteration: 45
Iteration: 46
Iteration: 47
Iteration: 48
Iteration: 49
Iteration: 50
Iteration: 51
Iteration: 52
Iteration: 53
Iteration: 54
Iteration: 55
Iteration: 56
Iteration: 57
Iteration: 58
Iteration: 59
Iteration: 60
Iteration: 61
Iteration: 62
Iteration: 63
Iteration: 64
Iteration: 65
Iteration: 66
Iteration: 67
Iteration: 68
Iteration: 69
Iteration: 70
Iteration: 71
Iteration: 72
Iteration: 73
Iteration: 74
Iteration: 75
Iteration: 76
Iteration: 77
Iteration: 78
Iteration: 79
Iteration: 80
Iteration: 81
Iteration: 82
Iteration: 83
Iteration: 84
Iteration: 85
Iteration: 86
Iteration: 87
Iteration: 88
Iteration: 89
Iteration: 90
Iteration: 91
Iteration: 92
Iteration: 93
Iteration: 94
Iteration: 95
Iteration: 96
Iteration: 97
Iteration: 98
Iteration: 99
Iteration: 100
Iteration: 101
Iteration: 102
Iteration: 103
Iteration: 104
Iteration: 105
Iteration: 106
Iteration: 107
Iteration: 108
Iteration: 109
Iteration: 110
Iteration: 111
Iteration: 112
Iteration: 113
Iteration: 114
Iteration: 115
Iteration: 116
Iteration: 117
Iteration: 118
Iteration: 119
Iteration: 120
Iteration: 121
Iteration: 122
Iteration: 123
Iteration: 124
Iteration: 125
Iteration: 126
Iteration: 127
Iteration: 128
Iteration: 129
Iteration: 130
Iteration: 131
Iteration: 132
Iteration: 133
Iteration: 134
Iteration: 135
Iteration: 136
Iteration: 137
Iteration: 138
Iteration: 139
Iteration: 140
Iteration: 141
Iteration: 142
Iteration: 143
Iteration: 144
Iteration: 145
Iteration: 146
Iteration: 147
Iteration: 148
Iteration: 149
Iteration: 150
Iteration: 151
Iteration: 152
Iteration: 153
Iteration: 154
Iteration: 155
Iteration: 156
Iteration: 157
Iteration: 158
Iteration: 159
Iteration: 160
Iteration: 161
Iteration: 162
Iteration: 163
Iteration: 164
Iteration: 165
Iteration: 166
Iteration: 167
Iteration: 168
Iteration: 169
Iteration: 170
Iteration: 171
Iteration: 172
Iteration: 173
Iteration: 174
Iteration: 175
Iteration: 176
Iteration: 177
Iteration: 178
Iteration: 179
Iteration: 180
Iteration: 181
Iteration: 182
Iteration: 183
Iteration: 184
Iteration: 185
Iteration: 186
Iteration: 187
Iteration: 188
Iteration: 189
Iteration: 190
Iteration: 191
Iteration: 192
Iteration: 193
Iteration: 194
Iteration: 195
Iteration: 196
Iteration: 197
Iteration: 198
Iteration: 199
Iteration: 200
Iteration: 201
Iteration: 202
Iteration: 203
Iteration: 204
Iteration: 205
Iteration: 206
Iteration: 207
Iteration: 208
Iteration: 209
Iteration: 210
Iteration: 211
Iteration: 212
Iteration: 213
Iteration: 214
Iteration: 215
Iteration: 216
Iteration: 217
Iteration: 218
Iteration: 219
Iteration: 220
Iteration: 221
Iteration: 222
Iteration: 223
Iteration: 224
Iteration: 225
Iteration: 226
Iteration: 227
Iteration: 228
Iteration: 229
Iteration: 230
Iteration: 231
Iteration: 232
Iteration: 233
Iteration: 234
Iteration: 235
Iteration: 236
Iteration: 237
Iteration: 238
Iteration: 239
Iteration: 240
Iteration: 241
Iteration: 242
Iteration: 243
Iteration: 244
Iteration: 245
Iteration: 246
Iteration: 247
Iteration: 248
Iteration: 249
Iteration: 250
Iteration: 251
Iteration: 252
Iteration: 253
Iteration: 254
Iteration: 255
Iteration: 256
Iteration: 257
Iteration: 258
Iteration: 259
Iteration: 260
Iteration: 261
Iteration: 262
Iteration: 263
Iteration: 264
Iteration: 265
Iteration: 266
Iteration: 267
Iteration: 268
Iteration: 269
Iteration: 270
Iteration: 271
Iteration: 272
Iteration: 273
Iteration: 274
Iteration: 275
Iteration: 276
Iteration: 277
Iteration: 278
Iteration: 279
Iteration: 280
Iteration: 281
Iteration: 282
Iteration: 283
Iteration: 284
Iteration: 285
Iteration: 286
Iteration: 287
Iteration: 288
Iteration: 289
Iteration: 290
Iteration: 291
Iteration: 292
Iteration: 293
Iteration: 294
Iteration: 295
Iteration: 296
Iteration: 297
Iteration: 298
Iteration: 299
CPU times: user 13.3 s, sys: 3.37 s, total: 16.6 s
Wall time: 1min 12s

@ax3l
Copy link
Member Author

ax3l commented Oct 24, 2023

HDF5 Directly w/ openPMD-api

s_h5 = io.Series("lab_diags/hdf5/data%T.h5", io.Access.read_only)
%%time
for k_i, i in s_h5.iterations.items():
    print("Iteration: {0}".format(k_i))

    E = i.meshes["E"]
    E_r = E["r"]
    E_t = E["t"]
    E_z = E["z"]
    
    # emulate multiple reads
    for i in range(4):
        E_r_data = E_r[()]
        s_h5.flush()
        E_t_data = E_t[()]
        s_h5.flush()
        E_z_data = E_z[()]
        s_h5.flush()
Iteration: 0
Iteration: 1
Iteration: 2
Iteration: 3
Iteration: 4
Iteration: 5
Iteration: 6
Iteration: 7
Iteration: 8
Iteration: 9
Iteration: 10
Iteration: 11
Iteration: 12
Iteration: 13
Iteration: 14
Iteration: 15
Iteration: 16
Iteration: 17
Iteration: 18
Iteration: 19
Iteration: 20
Iteration: 21
Iteration: 22
Iteration: 23
Iteration: 24
Iteration: 25
Iteration: 26
Iteration: 27
Iteration: 28
Iteration: 29
Iteration: 30
Iteration: 31
Iteration: 32
Iteration: 33
Iteration: 34
Iteration: 35
Iteration: 36
Iteration: 37
Iteration: 38
Iteration: 39
Iteration: 40
Iteration: 41
Iteration: 42
Iteration: 43
Iteration: 44
Iteration: 45
Iteration: 46
Iteration: 47
Iteration: 48
Iteration: 49
Iteration: 50
Iteration: 51
Iteration: 52
Iteration: 53
Iteration: 54
Iteration: 55
Iteration: 56
Iteration: 57
Iteration: 58
Iteration: 59
Iteration: 60
Iteration: 61
Iteration: 62
Iteration: 63
Iteration: 64
Iteration: 65
Iteration: 66
Iteration: 67
Iteration: 68
Iteration: 69
Iteration: 70
Iteration: 71
Iteration: 72
Iteration: 73
Iteration: 74
Iteration: 75
Iteration: 76
Iteration: 77
Iteration: 78
Iteration: 79
Iteration: 80
Iteration: 81
Iteration: 82
Iteration: 83
Iteration: 84
Iteration: 85
Iteration: 86
Iteration: 87
Iteration: 88
Iteration: 89
Iteration: 90
Iteration: 91
Iteration: 92
Iteration: 93
Iteration: 94
Iteration: 95
Iteration: 96
Iteration: 97
Iteration: 98
Iteration: 99
Iteration: 100
Iteration: 101
Iteration: 102
Iteration: 103
Iteration: 104
Iteration: 105
Iteration: 106
Iteration: 107
Iteration: 108
Iteration: 109
Iteration: 110
Iteration: 111
Iteration: 112
Iteration: 113
Iteration: 114
Iteration: 115
Iteration: 116
Iteration: 117
Iteration: 118
Iteration: 119
Iteration: 120
Iteration: 121
Iteration: 122
Iteration: 123
Iteration: 124
Iteration: 125
Iteration: 126
Iteration: 127
Iteration: 128
Iteration: 129
Iteration: 130
Iteration: 131
Iteration: 132
Iteration: 133
Iteration: 134
Iteration: 135
Iteration: 136
Iteration: 137
Iteration: 138
Iteration: 139
Iteration: 140
Iteration: 141
Iteration: 142
Iteration: 143
Iteration: 144
Iteration: 145
Iteration: 146
Iteration: 147
Iteration: 148
Iteration: 149
Iteration: 150
Iteration: 151
Iteration: 152
Iteration: 153
Iteration: 154
Iteration: 155
Iteration: 156
Iteration: 157
Iteration: 158
Iteration: 159
Iteration: 160
Iteration: 161
Iteration: 162
Iteration: 163
Iteration: 164
Iteration: 165
Iteration: 166
Iteration: 167
Iteration: 168
Iteration: 169
Iteration: 170
Iteration: 171
Iteration: 172
Iteration: 173
Iteration: 174
Iteration: 175
Iteration: 176
Iteration: 177
Iteration: 178
Iteration: 179
Iteration: 180
Iteration: 181
Iteration: 182
Iteration: 183
Iteration: 184
Iteration: 185
Iteration: 186
Iteration: 187
Iteration: 188
Iteration: 189
Iteration: 190
Iteration: 191
Iteration: 192
Iteration: 193
Iteration: 194
Iteration: 195
Iteration: 196
Iteration: 197
Iteration: 198
Iteration: 199
Iteration: 200
Iteration: 201
Iteration: 202
Iteration: 203
Iteration: 204
Iteration: 205
Iteration: 206
Iteration: 207
Iteration: 208
Iteration: 209
Iteration: 210
Iteration: 211
Iteration: 212
Iteration: 213
Iteration: 214
Iteration: 215
Iteration: 216
Iteration: 217
Iteration: 218
Iteration: 219
Iteration: 220
Iteration: 221
Iteration: 222
Iteration: 223
Iteration: 224
Iteration: 225
Iteration: 226
Iteration: 227
Iteration: 228
Iteration: 229
Iteration: 230
Iteration: 231
Iteration: 232
Iteration: 233
Iteration: 234
Iteration: 235
Iteration: 236
Iteration: 237
Iteration: 238
Iteration: 239
Iteration: 240
Iteration: 241
Iteration: 242
Iteration: 243
Iteration: 244
Iteration: 245
Iteration: 246
Iteration: 247
Iteration: 248
Iteration: 249
Iteration: 250
Iteration: 251
Iteration: 252
Iteration: 253
Iteration: 254
Iteration: 255
Iteration: 256
Iteration: 257
Iteration: 258
Iteration: 259
Iteration: 260
Iteration: 261
Iteration: 262
Iteration: 263
Iteration: 264
Iteration: 265
Iteration: 266
Iteration: 267
Iteration: 268
Iteration: 269
Iteration: 270
Iteration: 271
Iteration: 272
Iteration: 273
Iteration: 274
Iteration: 275
Iteration: 276
Iteration: 277
Iteration: 278
Iteration: 279
Iteration: 280
Iteration: 281
Iteration: 282
Iteration: 283
Iteration: 284
Iteration: 285
Iteration: 286
Iteration: 287
Iteration: 288
Iteration: 289
Iteration: 290
Iteration: 291
Iteration: 292
Iteration: 293
Iteration: 294
Iteration: 295
Iteration: 296
Iteration: 297
Iteration: 298
Iteration: 299
Iteration: 300
CPU times: user 2.52 s, sys: 1.54 s, total: 4.06 s
Wall time: 21.9 s

@ax3l
Copy link
Member Author

ax3l commented Oct 24, 2023

Next: reorganize both HDF5 and ADIOS BP4 with openPMD-pipe to be contiguous (group-based) and read again.

@guj
Copy link

guj commented Oct 25, 2023

Thanks Axel. Is the file available somewhere?

@ax3l
Copy link
Member Author

ax3l commented Oct 25, 2023

@guj I can share the files on NERSC with you 👍

@franzpoeschel
Copy link
Contributor

Please consider also what I wrote here

My current suspicion is that the data was written from an application that uses load balancing. The increasing load times in ADIOS2 would then just "represent" the increased complexity in the data (would be confirmed by showing the output of bpls -D as described in that comment)

@ax3l
Copy link
Member Author

ax3l commented Oct 25, 2023

Thanks @franzpoeschel ! I posted the output of the decomposition a bit further up in a comment:
#400 (comment)

@franzpoeschel
Copy link
Contributor

Thanks @franzpoeschel ! I posted the output of the decomposition a bit further up in a comment: #400 (comment)

Ah, I didn't see that. Still need to go over your results in detail.
Does the number of blocks get larger from the first to the last iteration?

@franzpoeschel
Copy link
Contributor

    60      1204        2e+10    2e+07     58.7      chunks = record_component.available_chunks()

Is this really the HDF5 benchmark where you are measuring this? HDF5 uses a simple fallback implementation for this, while this same function in ADIO2 can have a serious performance impact.

@ax3l
Copy link
Member Author

ax3l commented Oct 26, 2023

Yes, above I measure the HDF5 files and the ADIOS2 files, sorted under the respective captions. Indeed, very large time spend in here for HDF5... Longer than flush even o.0

@ax3l
Copy link
Member Author

ax3l commented Oct 26, 2023

Does the number of blocks get larger from the first to the last iteration?

Good point, added a full overview here now.

Looking at the data that we benchmark here (RZ fields), the variable /data/*/fields/E/r applies. The number of blocks are 24 in this variable and stay 24 for all steps that I see in the series.

@franzpoeschel
Copy link
Contributor

I did some first test runs on Perlmutter. (Using for now openpmd-pipe shipped with openPMD-api, not yet your script). Two things that I noticed:

  • The performance at least on /global/cfs is very unpredictable. Often, calling the same script a second time will lead to a much better performance. The first invocation often got noticeably slower mid-run, once to the point of freezing. Can this whole issue be a caching effect?
  • The openPMD-api does not yet really optimize very well for data Series with a large number of iterations. I noticed a little performance bug: Around half the runtime is spent in checking if the openPMD hierarchy has been modified, even for Iterations that have been closed already.

I'll prepare a PR that fixes this performance bug, but this will only be useful for the openPMD-viewer if it actually uses Iteration::close(). However, this bug is unlikely to be the reason for your performance issues, as it would affect HDF5 and ADIOS2 indiscriminately.

How to do a benchmark of compiled (non-Python) software on Perlmutter

I used google-perftools for finding this performance bug. It might be helpful to see your results.

Installation

Fortunately relatively simple since Perlmutter has all dependencies loaded without needing any additional modules loaded. Gperftools consists of two components.

  1. Download and unpack a release from https://github.com/gperftools/gperftools/releases/. Use CMake to build and optionally install, e.g. mkdir build; cd build; cmake .. -DCMAKE_INSTALL_PREFIX="$HOME/.local"; make -j; make install;.
  2. For analyzing the raw profiles, you will also need pprof which is a Go program. Installation is a single command line call: go install github.com/google/pprof@latest. pprof is then installed under ~/go/bin/pprof --help.

Profiling an application

The Python script whose C++ portion I wanted to profile was:

openpmd-pipe --infile diag1/openpmd_%T.bp --outfile out.bp --outconfig 'adios2.engine.type = "nullcore"'

In order to profile this:

LD_PRELOAD=~/.local/lib64/libprofiler.so CPUPROFILE=raw.prof python `which openpmd-pipe` --infile diag1/openpmd_%T.bp --outfile out.bp --outconfig 'adios2.engine.type = "nullcore"'

This runs the application with a slight slowdown. Afterwards, a file named raw.prof is on the disk. This needs to be analyzed with pprof now. The most helpful visualization is via graphviz which I did not bother to install on Perlmutter. Instead, I used pprof to output a .dot file which I then viewed on my own computer with Xdot:

~/go/bin/pprof -tree `which python` raw.prof > prof.dot

Find below the example graph which shows some Python internals, but more importantly the openPMD-api and ADIOS2-internal calls.

profile

@ax3l
Copy link
Member Author

ax3l commented Mar 25, 2024

Performance analysis by @guj:

Action items:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants