The very large arrays in the SPA and irradiance modules take up a lot of space, which IMO makes those modules hard to navigate. See #235. Having them in code can also present other issues, for example if those coefficients ever need to be changed or expanded, e.g. if a new set of Perez coefficients is released.
Some proposals:
- Move the data to the bottom of the module and make it a constant. Names referenced inside function bodies are only resolved at call time, after the whole module has been executed, so a method can use the module-level constant `MYDATA` even though it is defined below the class, and it won't raise a `NameError` for an unresolved reference. (A class attribute like `mydata = MYDATA` *would* be evaluated at import time and fail, so the reference has to live inside a method; see the toy sketch after the example below.)
```python
import numpy as np


class ClsUsingData(object):
    def __init__(self, *args):
        # MYDATA is resolved here at call time, after the whole module
        # has executed, so it can live at the bottom of the module
        self.mydata = MYDATA
        # do stuff with data

# other stuff

# all constants with very large arrays at the bottom of the module
MYDATA = np.array([
    # lots of data
])
```
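In case the name resolution isn't obvious, here's a minimal standalone sketch (toy names, nothing to do with pvlib) of the call-time lookup this proposal relies on:

```python
def lazy():
    # LATER is looked up when lazy() is called, not when it's defined
    return LATER

# calling lazy() here would raise NameError: name 'LATER' is not defined

LATER = 42
print(lazy())  # prints 42: by call time, LATER exists
```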
- Use HDF5 files via `h5py`. These files are highly optimized for speed and the datasets act much like NumPy arrays. It's okay to keep the files open; they'll be closed when Python exits. HDF5 loads data only when it is sliced (in parallel, if h5py is built against mpicc or OpenMP), so memory usage stays low and access is fast. Alternately, copying all of the data out of the file into a NumPy array lets you close the file immediately, but it is slower and less memory efficient.
```python
import os

import h5py
import numpy as np

DIRNAME = os.path.dirname(__file__)
MYDATA = os.path.join(DIRNAME, 'mydata.h5')


class ClsUsingData(object):
    mydata = h5py.File(MYDATA, 'r')  # leave it open
    # alternately, copy the data to a numpy array and close the file:
    # h5_data_path = '/group/dataset'
    # with h5py.File(MYDATA, 'r') as f:
    #     mydata = np.array(f[h5_data_path])

    def __init__(self, *args):
        # do stuff with data
        pass
```
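A hedged sketch of what the one-time conversion and the lazy slicing could look like (the `/group/dataset` path and the toy `coeffs` array are placeholders, not actual pvlib names):

```python
import h5py
import numpy as np

# stand-in for one of the large coefficient tables currently in code
coeffs = np.arange(48.0).reshape(8, 6)

# one-time conversion: write the array into an HDF5 file
with h5py.File('mydata.h5', 'w') as f:
    f.create_dataset('/group/dataset', data=coeffs)

# reading back: slicing loads only the requested rows from disk
with h5py.File('mydata.h5', 'r') as f:
    dset = f['/group/dataset']
    one_row = dset[3, :]     # reads a single row
    everything = dset[...]   # copies the whole dataset into memory
```

Either way, the coefficients would live in a small binary file next to the module instead of hundreds of lines of source.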