The very large arrays in the SPA and irradiance modules take up a lot of space, which IMO makes those modules hard to navigate. See #235. Having them in code can also present other issues if those coefficients ever need to be changed or expanded, e.g. if new sets of Perez coefficients are released.
Some proposals:
Move the data to the bottom of the module and make it a constant. Names referenced inside function and method bodies are only looked up when the function is called, not at import time, so a method can use a module-level constant MYDATA defined below the class without raising a NameError for an unresolved reference. (Note that a class attribute like mydata = MYDATA would not work, because the class body executes at import time, before the bottom of the module has been reached.)
```python
import numpy as np

class ClsUsingData(object):
    def __init__(self, *args):
        self.mydata = MYDATA  # looked up at call time, after the module is fully loaded
        # do stuff with data
        # other stuff

# all constants with very large arrays at the bottom of the module
MYDATA = np.array([
    # lots of data
])
```
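As a minimal, self-contained sketch of the deferred name lookup (the names here are placeholders, not pvlib's): the constant is defined after the class, yet the method resolves it without error because the lookup happens at call time.

```python
# Sketch: names inside a function body are resolved when the function
# is *called*, so the large constant can live at the bottom of the module.
class UsesData(object):
    def total(self):
        return sum(MYDATA)  # resolved at call time, not at import

MYDATA = [1, 2, 3]  # defined after the class, but before any call

print(UsesData().total())  # -> 6
```

The same ordering inside a class body (mydata = MYDATA above the definition of MYDATA) would raise a NameError, which is why the reference has to sit inside a method.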
Use HDF5 files via h5py. These files are highly optimized for speed and act much like NumPy arrays. It's okay to keep them open; they'll be closed when Python exits. HDF5 loads only the data that is actually sliced (using mp threads if h5py is built with mpicc or OpenMP), so memory usage is lower and access is faster. Alternatively, copying all of the data out of the file into a NumPy array lets you close the file, but it is slower and uses more memory.
```python
import os

import h5py

DIRNAME = os.path.dirname(__file__)
MYDATA = os.path.join(DIRNAME, 'mydata.h5')

class ClsUsingData(object):
    mydata = h5py.File(MYDATA, 'r')  # leave it open
    # alternatively, copy the data to a numpy array and close the file:
    # h5_data_path = '/group/dataset'
    # with h5py.File(MYDATA, 'r') as f:
    #     mydata = np.array(f[h5_data_path])

    def __init__(self, *args):
        # do stuff with data
        ...
```
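For completeness, a hedged sketch of how such a data file could be written once and read back with h5py. The array contents and the '/group/dataset' path are placeholders (taken from the commented example above), not pvlib's actual coefficients.

```python
import os
import tempfile

import h5py
import numpy as np

# Placeholder coefficients standing in for the real Perez/SPA tables.
coeffs = np.arange(12.0).reshape(3, 4)

# One-time packing step: write the array into an HDF5 dataset.
path = os.path.join(tempfile.mkdtemp(), 'mydata.h5')
with h5py.File(path, 'w') as f:
    f.create_dataset('/group/dataset', data=coeffs)

# Reading back: slicing pulls only the requested rows from disk.
with h5py.File(path, 'r') as f:
    first_row = f['/group/dataset'][0]

print(first_row.tolist())  # -> [0.0, 1.0, 2.0, 3.0]
```

Slicing the open dataset reads only that chunk of the file, which is what keeps memory usage low compared with copying the whole array out.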
My vote is for moving the spa data to the end of the file. The h5 file sounds like overkill and may cause problems for the numba solarposition code or anything else that people do to multithread/process things. The coefficients are closely related to the code, so I don't see a problem having them in the module so long as they're not too long. You and I may differ on our definition of too long, though.
I used to store data in h5 files, and any time something went wrong while writing changes, the file became unreadable. And the more data you store in the file, the more space you need for a backup copy before opening it.