-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
speed up object trimming and skip ingest of drawn objects for restarts #174
Conversation
… debug statements and suppress ErfaWarnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @cwwalter,
Could you have a look at this please? We'd like to make these changes available for the restarts since they should speed things up as well as reduce the memory usage.
There are two major changes which I will summarize here and comment on below.
- A change in
sources_from_list
in the code that finds the objects per sensor. The new version only performs that calculation for the sensor being simulated instead of looping over all 189, thereby save a lot of time. - Reading in the drawn objects from the checkpoint files so that most of the overhead associated with ingesting them from the instance catalogs can be skipped.
out_obj_dict = {} | ||
for det in lsst_camera(): | ||
chip_name = det.getName() | ||
if target_chip is not None and chip_name != target_chip: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a specific sensor being considered (target_chip
), then this if statement will skip the calculations below for the other 188 sensors.
python/desc/imsim/trim.py
Outdated
ra0, dec0 = self.compute_chip_center(chip_name) | ||
seps = degrees_separation(ra0, dec0, self._ra, self._dec) | ||
index = np.where(seps < radius) | ||
if chip_name in self.trimmer.drawn_objects: | ||
not_drawn = [not x in self.trimmer.drawn_objects[chip_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this run starts from a checkpoint, this code omits objects that have already been drawn from the objects that are passed to the downstream object processing code. In addition to speeding up that processing, the per sensor memory usage can be substantially reduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old comment I originally typed:
I'm having a hard time understanding the 'chip_name in self.trimmer.drawn_objects' since in the gs_intepreter the list is made out of uniqueIds here: self.drawn_objects.add(gsObject.uniqueId). How does the chip_name get in?
I think it is related to the fact that the sensor name is in the checkpoint file?
OK, I think I understand this now. The trimmer drawn_objects is different:
self.drawn_objects[detname] = ckpt['drawn_objects']
I think maybe the names of the two variables should be different for clarity. Perhaps one could be drawn_objects_on_sensor or something?
I'm going to merge this tomorrow at 9am PT unless I hear otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, finally finished. Sorry for delay; this looks good. A few comments and suggestions below.
my_flat.write('flat_{}_{}_{}.fits'.format(visit, ccd_id, obs_md.bandpass)) | ||
prefix = config['persistence']['eimage_prefix'] | ||
my_flat.write('{}_{}_{}_{}.fits'.format(prefix, visit, ccd_id, | ||
obs_md.bandpass)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this makes it so Flats are not differently labeled than normal e-image files. Is that the intention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. The script that converts the eimage files to raw
files expects the lsst_e
prefix, and since flats are ingested just like any other file, it made sense just to use the same naming convention. The 'IMGTYPE' keyword in the phdu should identify them as flats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, got it. I see. I guess at least the visit number will be different. Maybe we could have some convention for flat visit # sequences in order to be able to quickly guess from the file name? AIso, I suppose they typically won't show up in the same directories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I proposed a convention in the ImageProcessingPipeline wiki in the Generating Calibration Products entry: 3xxxxxx
, 4xxxxxx
, 5xxxxxx
for biases, darks, and flats, respectively.
@@ -152,12 +152,16 @@ def metadata_from_file(file_name): | |||
return commands | |||
|
|||
|
|||
def sources_from_list(object_lines, obs_md, phot_params, file_name): | |||
def sources_from_list(object_lines, obs_md, phot_params, file_name, | |||
target_chip=None, log_level='INFO'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should have a docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
python/desc/imsim/trim.py
Outdated
ra0, dec0 = self.compute_chip_center(chip_name) | ||
seps = degrees_separation(ra0, dec0, self._ra, self._dec) | ||
index = np.where(seps < radius) | ||
if chip_name in self.trimmer.drawn_objects: | ||
not_drawn = [not x in self.trimmer.drawn_objects[chip_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old comment I originally typed:
I'm having a hard time understanding the 'chip_name in self.trimmer.drawn_objects' since in the gs_intepreter the list is made out of uniqueIds here: self.drawn_objects.add(gsObject.uniqueId). How does the chip_name get in?
I think it is related to the fact that the sensor name is in the checkpoint file?
OK, I think I understand this now. The trimmer drawn_objects is different:
self.drawn_objects[detname] = ckpt['drawn_objects']
I think maybe the names of the two variables should be different for clarity. Perhaps one could be drawn_objects_on_sensor or something?
|
||
# Collect the selected objects. | ||
selected = [self.object_lines[i] for i in index[0]] | ||
if sort_magnorm: | ||
# Sort by magnorm. | ||
self.trimmer.logger.debug('sorting by magnorm') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to happen for all runs? Only ones when multiprocessing is used?
It looks like InstCatTrimmer is always being called from parsePhoSimInstanceFile.
For simple debugging for individuals (rather than big production) having the processing order different than what is in the file will be confusing and will make debugging difficult. If it is always going to happen maybe we could make the default to be non-sorted and then allow a flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a real use-case you've encountered? The sorting has been being done for quite a while now. If we want to make it configurable, it would be better to do this via the config file instead of a command line flag. And if it doesn't come up often, I think it is better to sort by default. In any case, the effect of this will probably change in the near future if we break up bright objects in to chunks and randomize the ordering of the object drawing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but maybe the last time I did it and really cared about the order was before the sorting started? If you have a catalog and you want to go through and track what it is doing line by line it would be confusing if the order wasn't what you input.
Also, when doing testing I often will do (say) the 1st 2000 objects for speed. What happens in that case? Do you get the 2000 brightest objects? This might actually explain some confusing things I have seen recently when I predicted times based on running a fraction of a file.
True about things changing if we break things up. Maybe we should set up three modes if we do that: "as read", "magnum sorted", and "interleaved"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the config parameter. We can revisit this when we look into breaking up the bright objects.
…naming and integration into lsst_distrib
I'll merge this at 6pm PT today. |
OK these all look good. Thanks! |
No description provided.