use HDF5 files for large arrays in spa and irradiance #236

Closed
@mikofski

Description

The very large arrays in the SPA and irradiance modules take up a lot of space, which IMO makes those modules hard to navigate. See #235. Keeping them in code can also cause other problems, for example when those coefficients need to be changed or expanded, e.g. if new sets of Perez coefficients are released.

Some proposals:

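  1. Move the data to the bottom of the module and make it a module-level constant. Names used inside function and method bodies are looked up when the function is called, not when it is defined, so a method can refer to MYDATA even though MYDATA is assigned further down the module. Note that a class body is executed as soon as the class statement runs, so a class attribute such as mydata = MYDATA written above the constant would raise a NameError; reference the constant from inside a method instead.
import numpy as np


class ClsUsingData(object):

    def __init__(self, *args):
        # MYDATA is looked up at call time, after the whole module has loaded
        self.mydata = MYDATA
        # do stuff with data

# other stuff

# all constants with very large arrays at the bottom of module
MYDATA = np.array([
    # lots of data
])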
  2. Use HDF5 files via h5py. HDF5 is designed for fast access to large arrays, and h5py datasets support NumPy-style slicing. It's okay to keep the file open; it will be closed when Python exits. HDF5 reads data from disk only when a dataset is sliced, so only the requested part is loaded into memory (parallel access is also possible if h5py is built against an MPI-enabled HDF5). Alternatively, copying all of the data out of the file into a NumPy array lets you close the file, but everything is then read and held in memory up front.
import os

import h5py
import numpy as np  # only needed for the copy alternative below

DIRNAME = os.path.dirname(__file__)
MYDATA = os.path.join(DIRNAME, 'mydata.h5')


class ClsUsingData(object):
    mydata = h5py.File(MYDATA, 'r')  # open read-only and leave it open

    # alternately copy the data to a numpy array and close the file:
    # h5_data_path = '/group/dataset'
    # with h5py.File(MYDATA, 'r') as f:
    #     mydata = np.array(f[h5_data_path])

    def __init__(self, *args):
        # do stuff with data
        pass
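
As a rough illustration of the two ways of using an HDF5 file in the second proposal, the sketch below writes a small file to a temporary directory, reads one slice from the open file, and then copies the whole dataset into a NumPy array so the file can be closed. The file name, the group and dataset names, and the array contents are placeholders, not the actual SPA or Perez coefficients.

```python
import os
import tempfile

import h5py
import numpy as np

# Placeholder data standing in for a large coefficient table (made-up values).
coefficients = np.arange(12, dtype=float).reshape(3, 4)

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'mydata.h5')

# Write the array once under /group/dataset.
with h5py.File(path, 'w') as f:
    group = f.create_group('group')
    group.create_dataset('dataset', data=coefficients)

# Option A: keep the file open and slice the dataset on demand.
f = h5py.File(path, 'r')
dataset = f['/group/dataset']   # a handle to the dataset, no array data read yet
first_row = dataset[0, :]       # reads just this row and returns a NumPy array
f.close()

# Option B: copy everything into memory, then let the file close.
with h5py.File(path, 'r') as f:
    in_memory = f['/group/dataset'][...]
```

Option B may well be adequate when the tables are small enough to hold in memory; Option A only pays off when callers need a small part of a large dataset.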

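The first proposal depends on when Python resolves a name. The sketch below uses made-up names (`LateUser`, `EarlyUser`, `TABLE`) to show the difference: a method body looks the name up when it is called, while a class body looks it up as soon as the class statement runs.

```python
class LateUser(object):
    def __init__(self):
        # Resolved at call time, so TABLE only has to exist by then.
        self.table = TABLE


try:
    class EarlyUser(object):
        # Resolved immediately, while TABLE is still undefined.
        table = TABLE
    early_failed = False
except NameError:
    early_failed = True

# Constant defined at the bottom of the module (placeholder values).
TABLE = [1.0, 2.0, 3.0]

late = LateUser()
```

Here `early_failed` ends up true, while `late.table` holds the list, which is why the constant can sit at the bottom of the module only if it is referenced from inside functions or methods.
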