imSim Redesign Wiki Page
Note: you can also edit this wiki externally by cloning the repo: https://github.com/LSSTDESC/imSim.wiki.git
We used slides 9/10/11 of the Computing Planning slides on imSim as a framework during the discussion.
- Since the last meeting in February/March we successfully began integrating batoid with GalSim/imSim (through the NERSC grant), tested and counted photon collection to understand memory/resource and design needs, and opened an initial PR that rains photons from a giant pool onto the sensor model via OpenMP.
In discussion of what we learned and conclusions from these studies we found:
- We need to arrange things so we can return a photon list without drawing images.
- Since the photon count is dominated by bright stars, which we are now FFTing, we should continue doing that and add them, together with the sky background, to the canvas that will receive the photons. This will be somewhat unphysical in the wings, but near the core those stars are unusable anyway.
We then discussed next steps and what we need to implement. New features include:
- New sky and instance catalog. See links for more info on the sky catalog; this will require planning and discussion with the CSim group. We also need to be able to produce and read a new high-performance instance catalog for smaller-scale work, including text converters (this was requested again by the DMatter people in the meeting).
- See below for the overall components needed. We discussed the desired framework and next steps, which would refactor code and incorporate the pieces now in sims_GalSimInterface etc. We discussed whether it made sense to move from an external code base calling GalSim as a library to a model where we leverage the GalSim YAML config system and imSim becomes a set of externally versioned modules that GalSim would call at various entry points. This would have the advantage that some of these modules (for example, a SkyCatalog reader) could also be used by things like WFIRST simulations.
After the meeting, Chris talked more with Mike about the possibility of adding more entry points in GalSim for this purpose, to let us call various pieces of the simulation such as checkpointing and the electronics simulation; Mike will look into this. Another option may be a python script that still drives the GalSim config system. This may be necessary to allow command-line configuration, for example.
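As a concrete illustration of the modules idea, here is a hedged sketch of what an imSim configuration might look like under the GalSim YAML config system. The top-level `modules` field is GalSim's standard mechanism for importing custom config types; everything else here (module names, type names, file names) is hypothetical.

```yaml
# Hypothetical sketch only -- module and type names are invented.
modules:
    - imsim_skycat       # would register a Sky Catalog input/object type
    - imsim_electronics  # would register electronics-simulation output hooks

input:
    sky_catalog:
        file_name: skycat_chunk_0042.parquet   # hypothetical chunked SC file

gal:
    type: SkyCatalogObject   # hypothetical type registered by imsim_skycat

output:
    type: LSST_CCD           # hypothetical output type wrapping electronics sim
```

A module like `imsim_skycat` would register its types via the standard GalSim registration hooks (e.g. `galsim.config.RegisterInputType`, `galsim.config.RegisterObjectType`), which is what would make the same module reusable by other surveys' simulations.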
Homework items for the next period are:
- Josh and Mike will work on some refactoring of the GalSim code to make chromatic shooting work properly and to smoothly integrate passing the atmospheric PSF photon list to batoid and then on to the sensor model.
- Mike will work on improvements to return only a photon list while drawing an object.
- Jim and Chris will work on the SkyCatalog/instance catalog design with others in the computing group.
- Jim will try implementing a simplified sky or instance catalog piece inside the GalSim config framework to see how that would work. He will look at previous DES examples to understand how we might work in that framework, either through a driving script or entirely through the YAML interface. Notes on the YAML interface can be found here: Config Docs
Continued work on imSim realism improvements will happen independently of this work (as it can always be folded in later) and will be presented in the SSim meetings, where we can also interface with the AGs.
Note: issues organized by category/status can be found on the imSim issue Tracking Board.
This wiki is for hosting design documents, planning documentation and links to relevant issues for the imSim redesign.
Our redesign is following a basic pattern developed at the meeting in Berkeley.
The initial slides from the Berkeley 2019 meeting are here: UCB 2019. (I think there were also some notes, but they were probably in a Google document, so I can't find them.)
Decisions made (see below for more details) were to:
- Have a new high-performance instance catalog format. Since that time we have also decided to let imSim additionally run from a new product called a Sky Catalog (SC), which would be produced by the CS team from CosmoDCx + stars etc.
- Collect the photons from each object and put them into a pool to rain onto the silicon model.
- Speed up depositing the photons into the silicon with an OpenMP approach.
- Incorporate the driving code now in sims_GalSimInterface into the main imSim codebase.
- Where possible, try to reduce or optionally remove dependence on external LSST software.
Relevant Issues:
Instance Catalogs (ICs) should be in a binary format with bi-directional text-file converters. When reading an instance catalog, all components of an object should be presented to imSim at the same time, and the amount of memory used should be limitable to a fixed value.
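As an illustration of the binary-IC-with-converters idea, here is a hedged sketch using a fixed-width record. The field names and layout are invented for the example and are not a proposed format.

```python
# Hypothetical sketch of a fixed-width binary instance-catalog record with
# bi-directional text converters. Fields and layout are illustrative only.
import struct

# object id (q), ra/dec in degrees (d, d), magnitude (d), SED index (i)
RECORD = struct.Struct("<q d d d i")

def to_binary(text_line):
    """Convert one text-catalog line to a packed binary record."""
    objid, ra, dec, mag, sed = text_line.split()
    return RECORD.pack(int(objid), float(ra), float(dec), float(mag), int(sed))

def to_text(buf):
    """Convert one packed binary record back to a text line."""
    objid, ra, dec, mag, sed = RECORD.unpack(buf)
    return f"{objid} {ra:.6f} {dec:.6f} {mag:.3f} {sed}"

line = "42 53.125000 -27.800000 21.350 7"
assert to_text(to_binary(line)) == line  # round trip is lossless
```

Fixed-width records make it easy to memory-map or stream the catalog in bounded-size batches, which is what allows capping memory use at a fixed value.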
A new data product called a Sky Catalog (SC) will also be usable. These data files will contain descriptions of all of the objects we would want to simulate on the sky. The catalog will be split into sky chunks covering only what we want to simulate, and will have all of the information for the objects we want to simulate, critically including the proper motion and transient variability parameters. The SCs can be distributed; using a pared-down OpSim database file, we can drive imSim runs from that file.
imSim will have the option of either producing instance catalogs from the sky catalog (which can then be run) or being run directly on the sky catalog itself. Produced instance catalogs can be used to check the inputs used and for running in lighter-weight environments with resource limitations. Users will also be able to continue to create and run with instance catalogs for non-DC-scale simulation work where they have made the instance catalogs by hand or with some other program.
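To make the SC contents concrete, here is a hypothetical sketch of the per-object fields a Sky Catalog chunk might carry, including the proper-motion and variability parameters called out above. All names are illustrative; the actual schema is to be designed with the CS group.

```python
# Hypothetical Sky Catalog object record -- field names are illustrative.
from dataclasses import dataclass

@dataclass
class SkyCatalogObject:
    object_id: int
    ra: float            # ICRS right ascension [deg] at catalog epoch
    dec: float           # ICRS declination [deg] at catalog epoch
    pm_ra: float         # proper motion in RA [mas/yr]
    pm_dec: float        # proper motion in Dec [mas/yr]
    sed_file: str        # reference to spectral energy distribution
    variability: dict    # transient/variability model parameters

def at_epoch(obj, years_from_catalog_epoch):
    """Apply proper motion to get the position at the observation epoch."""
    # 1 mas = 1/3.6e6 deg (cos(dec) factor omitted for brevity in this toy)
    ra = obj.ra + obj.pm_ra * years_from_catalog_epoch / 3.6e6
    dec = obj.dec + obj.pm_dec * years_from_catalog_epoch / 3.6e6
    return ra, dec

# Toy check: 3.6e6 mas/yr is 1 deg/yr, so after one year RA shifts by 1 deg.
obj = SkyCatalogObject(1, ra=10.0, dec=-5.0, pm_ra=3.6e6, pm_dec=0.0,
                       sed_file="sed.txt", variability={})
```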
Relevant Issues:
Currently imSim draws each sensor in parallel, with objects added as postage stamps to the larger image. This means we have to hold all the objects in memory, and it has the drawback that photons from different objects are not properly physically mixed.
In the new design, each object will be passed via multiprocessing to the GalSim photon-shooting algorithm, and all of the photons will be collected into a single pool for raining down on the sensors using OpenMP.
30 June 2021
Complications with generating the entire list of object photons for an exposure at once and raining them down onto a sensor in a single pass:
- For some total exposure length the total amount of photon data could exceed available memory.
- As electrons accumulate in sensor pixels, the effective geometric boundaries of the pixels on the focal plane change, so pixel boundary updates are required during the exposure to accurately model charge accumulation based on photon positions.
Propose a co-solution by approaching the total exposure time through a series of shorter sub-exposures. Given the exposure time for a sub-exposure, photons would be generated for all objects for that sub-exposure time and then rained down onto the sensor. Pixel boundaries can be updated and object photon lists can be refreshed between sub-exposures. The process is repeated until the cumulative exposure time from the sub-exposures reaches the total exposure time.
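The sub-exposure loop described above might look roughly like the following sketch. All class and method names (`shoot_photons`, `accumulate`, `update_pixel_boundaries`) are hypothetical placeholders, and the toy classes exist only to demonstrate the control flow.

```python
# Minimal sketch of the proposed sub-exposure loop (names are illustrative).
def simulate_exposure(objects, sensor, t_total, n_sub):
    t_sub = t_total / n_sub
    for _ in range(n_sub):
        photons = []
        for obj in objects:
            # generate this object's photons for the sub-exposure time only
            photons.extend(obj.shoot_photons(t_sub))
        sensor.accumulate(photons)        # rain photons onto the sensor
        sensor.update_pixel_boundaries()  # brighter-fatter boundary refresh

# Toy stand-ins to demonstrate the control flow:
class ToyObject:
    def __init__(self, rate):
        self.rate = rate  # photons per second
    def shoot_photons(self, t):
        return [None] * int(self.rate * t)

class ToySensor:
    def __init__(self):
        self.photons = 0
        self.updates = 0
    def accumulate(self, photons):
        self.photons += len(photons)
    def update_pixel_boundaries(self):
        self.updates += 1

s = ToySensor()
simulate_exposure([ToyObject(10), ToyObject(20)], s, t_total=30.0, n_sub=10)
# 10 sub-exposures of 3 s each: (10 + 20) photons/s * 30 s = 900 photons,
# and one boundary update per sub-exposure.
```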
Benefits of this approach:
- Pixel boundary update accuracy is insensitive to photon order within each sub-exposure, since pixel boundaries will not be updated during a sub-exposure, so the developers are free to try to optimize photon order for performance.
- Pixel boundary updates are separated from charge accumulation, so no local locking mechanisms are required on pixel regions during pixel boundary updates.
Estimate the number of sub-exposures required to accurately update pixel boundaries:
- Choose the maximum change (delta) in the number of electrons in a single pixel that should trigger a pixel boundary update.
- Given the full well depth (D) of a sensor pixel, the number of sub-exposures required to update pixel boundaries sufficiently often for an object pixel that will reach exactly the full well depth at the end of the exposure is D/delta. Lower surface brightness pixels will be updated more often than necessary, but sufficiently often. Higher surface brightness pixels will saturate, so the details of the pixel boundaries are not as critical to model in detail, though we should ensure that the results are sufficiently bad.
- For LSST we expect D = ~100k, so for delta = ~1k we would expect ~100 sub-exposures, so this could be feasible.
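Plugging in the numbers from the bullets above:

```python
# Worked example of the boundary-update cadence estimate from the text.
full_well = 100_000   # D: electrons at full well (~100k for LSST sensors)
delta = 1_000         # max electron change per pixel before a boundary update
n_sub = full_well // delta
# -> 100 sub-exposures, matching the estimate in the text
```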
Estimate the number of sub-exposures required to satisfy available memory constraints:
- Choose the amount of memory (M) to be allocated to hold photon data.
- Given the memory (m) required for each photon, the number of photons in memory at once is p = M/m.
- Given a list of object fluxes and a total exposure time we can estimate the total number of object photons (P) in the total exposure.
- The number of sub-exposures required to keep within memory constraints is then P/p.
- Photon number statistics and details of bright object implementation would be required to estimate this number of sub-exposures.
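A worked example of the memory-driven estimate, with assumed numbers (the memory budget, per-photon size, and total photon count are illustrative, not measured):

```python
# Illustrative memory-constraint estimate; all inputs are assumptions.
import math

M = 8 * 2**30          # memory budget for photon data: 8 GiB (assumed)
m = 48                 # bytes/photon: e.g. six doubles for position, direction,
                       # wavelength, flux (assumed layout)
p = M // m             # photons that fit in memory at once
P = 5_000_000_000      # total photons P in the exposure (assumed, from fluxes)
n_sub = math.ceil(P / p)
# ~179M photons fit at once, so ~28 sub-exposures for this photon budget
```

Whichever of the two estimates (boundary-update cadence or memory budget) demands more sub-exposures sets the actual cadence.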
Considerations:
- A Python level trigger of the pixel boundary update would allow high level coordination of sub-exposure operations, as we may not need to do a pixel boundary update after every sub-exposure if the photon memory constraint is more severe. Though if the pixel boundary update operation is relatively brief it can also be done more often for increased accuracy.
- Knowing the balance of performance between generating photons from the object list and raining them down onto the sensor will inform the feasibility of this approach. On GPU-accelerated systems this will depend on which operations are on CPU or GPU, and on how much parallelism can be exposed for each operation on CPU or GPU. It may also be possible to overlap CPU and GPU operations, eg. generate photon lists on CPU threads and rain photons onto the sensor on the GPU, which could be overlapped by double-buffering the photon list.
- Faint objects: We could do all photons from faint objects first, which would reduce the number of objects in the main loop, reducing the total overhead from looping over the object list multiple times.
- Bright objects: If these are handled differently we may need to factor that into planning.
- Some sort of atomic/lock/replication-reduction mechanism will be required to handle race conditions if we are accumulating charge onto the sensor from multiple CPU/threads or GPU/work-items. Mechanism choice likely depends on whether CPU (locks or replication-reduction) or GPU (atomics?). There are opportunities to optimize via photon ordering and parallelism strategies.
- Accelerators: Want to take advantage of GPUs, but not be limited by choices related to GPUs. If there are reference implementations of each operation for CPUs then we can always run on a CPU-only system. Regular testing could help keep CPU and GPU implementations of same operation in step. Overlap of operations between CPUs and GPUs should probably be optional. Using programming models with implementations on multiple GPU architectures (eg. OpenMP target/offload) would increase possibilities for portability, though we may want to track what kind of performance losses that might incur. Also not sure if OpenMP target/offload can be used to effectively target CPU threads.
Earlier, Adrian had considered using rough estimates of object peak surface brightness (~per-pixel?) to determine the sub-exposure cadence for pixel boundary updates, but the more general full-well-depth/delta trigger seems better. Here is a link to GalSim's max surface brightness just in case: http://galsim-developers.github.io/GalSim/_build/html/gsobject.html#galsim.GSObject.max_sb
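To make the CPU replication-reduction option above concrete, here is a toy sketch: each worker deposits its chunk of photons into a private copy of the image, and the private copies are summed afterwards, so no locks or atomics are needed on shared pixels. The sensor size, photon layout, and chunking scheme are all illustrative.

```python
# Toy replication-reduction charge accumulation (illustrative only).
from concurrent.futures import ThreadPoolExecutor

NX = NY = 4  # toy sensor size

def accumulate(photon_chunk):
    """Deposit one chunk of (x, y, flux) photons into a private image."""
    image = [[0.0] * NX for _ in range(NY)]
    for x, y, flux in photon_chunk:
        image[y][x] += flux   # no shared state, so no locking needed
    return image

def reduce_images(images):
    """Sum the per-worker private images into the final charge image."""
    total = [[0.0] * NX for _ in range(NY)]
    for img in images:
        for j in range(NY):
            for i in range(NX):
                total[j][i] += img[j][i]
    return total

# 64 unit-flux photons spread uniformly: each pixel is hit exactly 4 times.
photons = [(i % NX, (i // NX) % NY, 1.0) for i in range(64)]
chunks = [photons[k::4] for k in range(4)]  # split across 4 workers
with ThreadPoolExecutor(max_workers=4) as pool:
    total = reduce_images(pool.map(accumulate, chunks))
```

The trade-off versus atomics is the extra memory for one image copy per worker plus the final reduction pass, which is usually a good deal on CPUs where atomics are slow under contention.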
14 October 2021 (updated 21 October 2021)
Follow-up from discussion with James Perry:
- Add external hook accessible from python layer to trigger pixel boundary update.
- Run for entire detector assuming no charge deposition happening, eg. do not need localized triggers and locking mechanisms. Note: localized triggers and locks were discussed but not implemented, current code has internal trigger to update entire sensor at once.
- Could maintain backward compatibility for current use patterns in code, eg. enable options to disable current triggers and local/locking mechanisms, but also allow old flow control (maybe by default). Also want to check bright object pixel boundaries against backward compatible version - make sure that pixel badness has diffused out far enough.
- Update: Mike said the current mechanism is not heavily used by many codes, so we can probably remove the internal boundary update trigger and require codes to call the hook from the python control layer.
- Could we run the flux/detector arrays as integers for imSim? Some of the code looks templated. GPUs often have very good atomics for 4 and 8 byte integers, some have good single precision atomics, fewer have good double precision atomics. Note: CPU atomics are generally not great, but how much that matters depends on collision rate, etc.
- Update: Mike said some cases involving interpolated images require floating point (and negative) photon values, but we should be able to test instantiating and using integer detector arrays.
- Consider C++ test driver around charge accumulation code to make performance assessment easier. Currently only tested on CPUs using OpenMP atomics. May want to put effort into profiling/optimizing GPU implementation, especially if using OpenMP/target/offload where it can be a bit difficult to guess what code will be generated (eg. compared to lower level CUDA code).
- Update: Chris pointed out that this effort is independent of other code tasks and can start whenever.
- Check Adrian’s cartoon model of imSim:
- Iterate through object list and generate photon list - Mike?
- Raytrace photons through optics to focal plane position - Josh/batoid
- Interact photon with detector to deposit charge - James/galsim
- Redraw pixel boundaries based on charge deposition - James/galsim
- Updates:
- Charge deposition cadence is likely limited by pixel boundary update constraints
- Batoid cadence could be limited by available memory for photons, eg. fill photon pool from batoid and then cycle through charge deposition
- Want to have a fast method for sky background photons, either separately or fractionally mixed in with object photons (expected to be ~100 - ~10,000 photons/pixel depending on band and observing conditions)
- Adrian also wants to keep calibration products in mind, eg. fast and consistent method for flats
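One possible fast sky-background method, sketched under the assumption that per-pixel counts are large (~100-10,000 photons/pixel as noted above): draw each pixel's count directly rather than raining individual sky photons, using the Gaussian approximation to Poisson statistics, which is adequate at these means. The function name and numbers are illustrative.

```python
# Toy per-pixel sky background, avoiding per-photon work (illustrative).
import random

def add_sky_background(image, mean_sky):
    """Add an approximately Poisson sky level (mean_sky photons/pixel) in place."""
    for row in image:
        for i in range(len(row)):
            # N(mean, sqrt(mean)) approximates Poisson(mean) for mean >> 1
            row[i] += max(0.0, random.gauss(mean_sky, mean_sky ** 0.5))

random.seed(12345)
image = [[0.0] * 8 for _ in range(8)]
add_sky_background(image, 1000.0)
```

If sky photons instead need to be physically mixed with object photons in the pool (e.g. to share sensor effects), a fraction of them could still be shot individually while the bulk is added per-pixel this way.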