Memory optimization for large batch runs on PCs #126
Here are some notes from a recent batch of simulations. I was simulating ~350 homes with a few different parameters, for a total of 1700 simulations: 2-minute timesteps, full annual simulations, saving 2-min data to Parquet. I used the os-hpxml1.7 branch: https://github.com/NREL/OCHRE/tree/e3f13d5164e923a65f6728dae107610c20f7df4d

Initially I tried to initialize all dwelling objects and then run them using a multiprocessing.Pool() object. However, memory use became an immediate issue, because results are retained after a simulation. I then switched to initializing and running batches of 30-60 homes at a time. This kept memory usage at an acceptable level, but still higher than ideal.

The workstation I am using has a Core i9-14900K and 192 GB of RAM. Anyone looking to fully utilize a multi-core processor will need as much RAM as possible. By comparison, E+ uses much less RAM per simulation core.

Memory: 140 GB when keeping 62 simulations in memory and running 31 at a time.
Processor: efficiency cores seem to carry a big performance hit. 5 sims in parallel take 2.3 min, while 30 runs take 5.3 min (the CPU has 8 P-cores and 16 E-cores, with 32 threads total).
Runtime: running all 1700 simulations took approx. 5 hours.
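The figures above support a rough sizing check. A minimal sketch, assuming memory scales linearly with the number of dwellings held in memory and that simulations run in back-to-back "waves" of equal length; the helper names and the 16 GB OS reserve are hypothetical. From the reported numbers, 140 GB / 62 dwellings is about 2.3 GB each, and 55 waves of 31 runs at about 5.3 min per wave (the timing reported for 30 parallel runs) comes to roughly 4.9 hours, consistent with the ~5 hours observed.

```python
import math


def gb_per_dwelling(observed_gb: float, dwellings_in_memory: int) -> float:
    # Average memory footprint per dwelling kept in memory
    return observed_gb / dwellings_in_memory


def max_dwellings_in_memory(total_ram_gb: float, per_dwelling_gb: float,
                            reserve_gb: float = 16.0) -> int:
    # Dwellings that fit in RAM after leaving headroom (reserve_gb is an assumption)
    return max(1, int((total_ram_gb - reserve_gb) // per_dwelling_gb))


def estimated_hours(n_sims: int, parallel: int, minutes_per_wave: float) -> float:
    # Runs happen in waves of `parallel` simulations, each taking ~minutes_per_wave
    waves = math.ceil(n_sims / parallel)
    return waves * minutes_per_wave / 60


per_dwelling = gb_per_dwelling(140, 62)
print(round(per_dwelling, 2))                        # → 2.26
print(max_dwellings_in_memory(192, per_dwelling))    # → 77
print(round(estimated_hours(1700, 31, 5.3), 2))      # → 4.86
```

Per-dwelling memory will change with time resolution, duration, and output settings, so the 2.26 GB figure only applies to runs configured like the ones described here.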
Here are the functions I was using to handle the multiprocessing. They could be useful to anyone else running large batches of simulations.

```python
import datetime as dt
import multiprocessing as mp
import os
import sys

from ochre import Dwelling


def mute():
    # Mute output from child processes to clean up the console
    sys.stdout = open(os.devnull, 'w')


def start_batch(dwelling_pre_list):
    # Generate and run a batch of dwellings from a list of dwelling arguments.
    # Should be pre-chunked to limit memory usage (30-60 dwellings).
    dwellings = gen_dwellings(dwelling_pre_list)
    dwellings = [d for d in dwellings if d is not None]  # filter out failed initializations
    batch_metrics = run_dwellings(dwellings)
    return batch_metrics


def run_dwellings(dwellings, processors=None, return_df=False):
    # Create a process pool and run a batch of dwellings.
    # dwellings is a list of (dwelling_args, dwelling) tuples; the arguments are
    # passed along so the metadata is retained when the job is done.
    if processors is None:
        processors = max(1, mp.cpu_count() - 1)  # default to one less than max CPUs
    jobs = [(args, dwelling, return_df) for args, dwelling in dwellings]
    with mp.Pool(processors, initializer=mute) as pool:
        result = pool.map(multirunner, jobs)
    return result


def multirunner(job):
    # Unpack a (dwelling_args, dwelling, return_df) tuple for pool.map
    return run(*job)


def run(dwelling_args, dwelling, return_df=False):
    # Run the dwelling and return results. Exceptions are caught so that
    # a single failed simulation does not stop a large batch.
    key = dict(dwelling_args['parameters'], name=dwelling.name)
    try:
        df, metrics, hourly = dwelling.simulate()
        if return_df:
            return df
        return {'key': key, 'metrics': metrics}
    except Exception:
        return {'key': key, 'metrics': None}


def gen_dwellings(dwelling_pre_list, processors=None):
    # Create a process pool and build dwelling objects from a list of dwelling arguments
    if processors is None:
        processors = max(1, mp.cpu_count() - 1)  # default to one less than max CPUs
    with mp.Pool(processors, initializer=mute) as pool:
        result = pool.map(multigen, dwelling_pre_list)
    return result


def multigen(dwelling_pre):
    # Build one dwelling; failed initializations return None.
    # NOTE: this also modifies some parameters specific to the simulation objective.
    try:
        kwargs = dwelling_pre[0]
        modifiers = dwelling_pre[1]
        dwelling_args = gen_dwellingargs(**kwargs)
        dwelling_args['parameters'] = {'hplo': modifiers['hplo'], 'ero': modifiers['ero']}
        dwelling_args['name'] = '{}_HPLO-{}_ERO-{}'.format(
            kwargs['modelname'], modifiers['hplo'], modifiers['ero'])
        dwelling = Dwelling(**dwelling_args)
        # Modify heat pump parameters (modifiers are given in degF)
        heater = dwelling.equipment['ASHP Heater']
        heater.outdoor_temp_limit = (modifiers['hplo'] - 32) * 5 / 9  # temperature, degF -> degC
        heater.er_setpoint_offset = modifiers['ero'] * 5 / 9  # temperature difference, degF -> degC
        heater.upper_deadband_override = False  # Need to fix
        # Next: modify heat pump capacity
        # heater.er_capacity_rated = 10000  # ER capacity: 'er_capacity_rated'
        return (dwelling_args, dwelling)
    except Exception:
        return None


def gen_dwellingargs(modelname, input_dir, run_dir, xml_file, schedule_file,
                     output_dir, epoch, weather_file, days=30):
    dwelling_args = {
        'name': modelname,  # simulation name

        # Timing parameters
        'start_time': dt.datetime(2018, 1, 1, 0, 0),  # year, month, day, hour, minute
        'time_res': dt.timedelta(minutes=2),  # time resolution of the simulation
        'duration': dt.timedelta(days=days),  # duration of the simulation
        'initialization_time': dt.timedelta(days=5),  # used to create realistic starting temperature
        'time_zone': None,  # option to specify daylight savings, in development

        # Input parameters - sample building (uses HPXML file and time series schedule file)
        'hpxml_file': os.path.join(run_dir, xml_file),
        'schedule_input_file': os.path.join(run_dir, schedule_file),

        # Input parameters - weather (note weather_path can be used when Weather Station is specified in HPXML file)
        # 'weather_path': weather_path,
        'weather_file': os.path.join(input_dir, 'WEATHER', weather_file),

        # Output parameters
        'verbosity': 3,  # verbosity of time series files (0-9)
        # 'metrics_verbosity': 9,  # verbosity of metrics file (0-9), default=6
        # 'save_results': False,  # saves results to files. Defaults to True if verbosity > 0
        'output_path': os.path.join(output_dir, epoch),  # defaults to hpxml_file path
        # 'save_args_to_json': True,  # includes data from this dictionary in the json file
        'output_to_parquet': True,  # saves time series files as parquet files (False saves as csv files)
        # 'save_schedule_columns': [],  # list of time series inputs to save to schedule file
        # 'export_res': dt.timedelta(days=61),  # time resolution for saving files, to reduce memory requirements

        # Equipment parameters
        'Equipment': {},
        # 'modify_hpxml_dict': {},  # Directly modifies values from HPXML input file
        # 'schedule': {},  # Directly modifies columns from OCHRE schedule file (dict or pandas.DataFrame)
    }
    return dwelling_args
```
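The functions above expect `start_batch` to receive a pre-chunked list (30-60 dwellings). A minimal driver sketch, assuming the full job list is a list of `(kwargs, modifiers)` tuples; `chunked` and `run_in_chunks` are hypothetical helpers, and the batch runner is passed in as an argument so the sketch stays self-contained. Each chunk runs to completion before the next one is built, and only the per-dwelling results are kept, so the previous batch's dwelling objects are no longer referenced.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar('T')
R = TypeVar('R')


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    # Yield consecutive lists of at most `size` items
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def run_in_chunks(items: Iterable[T], run_batch: Callable[[List[T]], List[R]],
                  chunk_size: int = 30) -> List[R]:
    # Run one chunk at a time and collect only the (small) results
    results: List[R] = []
    for batch in chunked(items, chunk_size):
        results.extend(run_batch(batch))
    return results
```

With the functions above, this might be called as `run_in_chunks(dwelling_pre_list, start_batch, chunk_size=30)` inside an `if __name__ == '__main__':` guard, which `multiprocessing` requires on Windows, where child processes are started with the spawn method.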
Awesome, thanks @apoerschke! I'm going to look into adding a new section to our documentation on setting up batch runs for external users; AFAIK you're the first one to really try this out. We also haven't tried to optimize for this situation, since we can just throw this on a supercomputer. But it's something I'd definitely like to address as funding allows. For now, we'll provide the best guidance we can and keep this issue open until we get that opportunity.
Agreed with Jeff: we know memory use is a big issue, but it hasn't been a priority. I believe we know the solution too: most of the memory is in the equipment schedules, which can be partitioned and saved to files for high-resolution/long-duration runs. We can bump that up our priority list. One thing you can do now to improve this is to stop returning the output data. That'll save a lot of memory and allow larger batches to run at once. Then once all the runs have finished you can call
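Along the lines of not returning output data: in the functions posted earlier, each Dwelling is built in a worker by `gen_dwellings`, pickled back to the parent process, then pickled again to a worker by `run_dwellings`. A minimal sketch of an alternative, assuming each job is a `(dwelling_args, key)` pair where `dwelling_args` is a complete `Dwelling` kwargs dict (e.g. from a function like `gen_dwellingargs`) and `key` is small metadata. Each dwelling is built and simulated inside one worker, and only the metrics come back. `build_and_run` and `run_all` are hypothetical names; `maxtasksperchild=1` is a standard `multiprocessing.Pool` option that replaces each worker process after one task, so all memory held by that simulation is released when the process exits.

```python
import multiprocessing as mp
import os
import sys


def mute():
    # Silence stdout in worker processes
    sys.stdout = open(os.devnull, 'w')


def build_and_run(job):
    # job: (dwelling_args, key); key is small metadata returned with the metrics
    dwelling_args, key = job
    try:
        from ochre import Dwelling  # imported inside the worker process
        dwelling = Dwelling(**dwelling_args)
        # Post-construction changes (e.g. the 'ASHP Heater' tweaks in multigen) would go here
        _, metrics, _ = dwelling.simulate()
        # Only metrics are returned; the time series DataFrame is not pickled back
        # to the parent (assumes OCHRE has already saved it to file)
        return {'key': key, 'metrics': metrics}
    except Exception:
        return {'key': key, 'metrics': None}


def run_all(jobs, worker=build_and_run, processes=None):
    # processes=1 runs serially in this process (useful for debugging)
    if processes == 1:
        return [worker(job) for job in jobs]
    # maxtasksperchild=1 replaces each worker after a single job
    with mp.Pool(processes, initializer=mute, maxtasksperchild=1) as pool:
        # Results arrive in completion order; 'key' identifies each one
        return list(pool.imap_unordered(worker, jobs, chunksize=1))
```

Starting a fresh process per job adds some overhead, but with annual 2-min simulations taking minutes each, that cost should be small relative to the simulation itself.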
Not everyone has a supercomputer like we do that they can use to run big batches of simulations. Andrew Poerschke at IBACOS has been trying to do some rather large runs and had this comment:
"Do you guys have any memory management strategies when running on the supercomputer? My workstation has 192 GB of memory, but I keep filling it when running across all 32 cores. I am chunking and discarding dwelling objects with each batch. Is it just a factor when running 2-min timesteps?"
@apoerschke: If you have any tips this would be a good spot to put them, and then we'll add them into the documentation going forward.
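On the question of whether 2-min timesteps are the main factor: the number of rows in every time series grows as duration / time_res, so an annual run at 2-min resolution has 262,800 steps versus 8,760 hourly, a 30× increase. A back-of-envelope sketch, assuming 8-byte float64 values per column; the helper names and the 100-column count are hypothetical, not OCHRE figures, and actual usage also includes pandas overhead, the initialization period, and other objects.

```python
import datetime as dt


def n_timesteps(duration: dt.timedelta, time_res: dt.timedelta) -> int:
    # Number of simulation time steps (timedelta // timedelta gives an int)
    return duration // time_res


def series_mb(duration: dt.timedelta, time_res: dt.timedelta, n_columns: int,
              bytes_per_value: int = 8) -> float:
    # Approximate size of a float64 time series table, in MB
    return n_timesteps(duration, time_res) * n_columns * bytes_per_value / 1e6


year = dt.timedelta(days=365)
print(n_timesteps(year, dt.timedelta(minutes=2)))       # → 262800
print(n_timesteps(year, dt.timedelta(hours=1)))         # → 8760
print(series_mb(year, dt.timedelta(minutes=2), 1))      # → 2.1024 (one column)
print(series_mb(year, dt.timedelta(minutes=2), 100))    # → 210.24 (hypothetical 100 columns)
```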