* One possible bug remains that I have been unable to track down but have concluded is unrelated to the modification in this PR.

Steps to reproduce:

    srun -n 4 -c 2 --gpu-bind=map_gpu:3,2,1,0 rrdesi_mpi --gpu --max-gpuprocs 4 -n_nearest 4 --archetypes new-archetypes/ -i $CFS/desi/spectro/redux/fuji/tiles/cumulative/100/20210505/coadd-0-100-thru20210505.fits -o $SCRATCH/abhijeet.fits

    srun -n 64 -c 2 rrdesi_mpi -n_nearest 4 --archetypes new-archetypes/ -i $CFS/desi/spectro/redux/fuji/tiles/cumulative/100/20210505/coadd-0-100-thru20210505.fits -o $SCRATCH/abhijeet2.fits

These two results will be np.allclose() but not equal, which is as expected. The CPU version will be equal to Abhijeet's original code, also as expected.

Now rerun with -n_nearest 5. The CPU version is still equal to Abhijeet's original code, but a small number (~10) of zzchi2 and zzcoeff values differ between GPU and CPU by more than np.allclose allows (by as much as 3.0). Reverting to Abhijeet's original nearest_neighbor_model code, running that method on the CPU together with the GPU get_best_archetype, still shows the same small differences, even though the output is np.allclose when not using -n_nearest (or even when using -n_nearest 4). The differences appear to be mostly, possibly entirely, in QSO templates. The trans array shows no difference: as far as I have been able to tell, we are sending the same input tdata to calc_zchi2_one on the CPU and getting a slightly different chi2, but only for -n_nearest 5.

However, since this is independent of the code changes in the current PR (I get exactly the same differences in this version as when rolling back to the previous one), it makes sense to proceed with this PR.

----------------------

- Added legendre function to Target class so that the legendre array of a given degree can be calculated (and optionally copied to the GPU) once, without additional overhead.
- Added default value of 15 for fitz().
- In fitz(), use Target.legendre to calculate legendre. Pass the target object instead of spectra to get_best_archetype, which allows for simplification because spectra, gpuweights, gpuflux, etc. are all members of the Target class. Store the trans dict and pass it to get_best_archetype to eliminate the need to recalculate transmission for the same wavelength regime.
- In archetypes, added properties gpuwave and gpuflux to copy and cache data on the GPU. These are then used in rebin_template_batch.
- In get_best_archetype, pass target instead of spectra, which allows for elimination of dedges and legendre as args. Get spectra, gpuweights, gpuflux, gpuwflux, dedges, and legendre directly from the target object. Reusing trans and using Target.legendre reduces runtime by about 1s on 4 GPUs.
- Vectorized and GPUized nearest_neighbor_model. Instead of passing only trans to it, get_best_archetype now passes the binned dict, which already holds the rebinned flux multiplied by trans. Using Target.legendre also saves time. Since the tdata arrays are small (nbasis of a few), it is faster to keep the calc_zchi2_batch operations on the CPU, similar to fitz().

Timing notes: adding nearest_neighbor_model is a bigger hit on the GPU than on the CPU, but since get_best_archetype is so much faster on the GPU, the combined time is still a good speed-up.
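
The CPU/GPU comparison described above (np.allclose agreement for most entries, with a handful of larger outliers) can be illustrated with a small sketch. The helper name report_mismatches and the toy arrays are hypothetical and are not part of this commit; they only show one way to locate entries that fall outside the default np.allclose tolerances.

```python
import numpy as np

def report_mismatches(cpu, gpu, rtol=1e-5, atol=1e-8):
    """Return indices where gpu differs from cpu beyond np.allclose tolerances.

    Hypothetical helper for illustration; rtol/atol are the NumPy defaults.
    """
    cpu = np.asarray(cpu, dtype=float)
    gpu = np.asarray(gpu, dtype=float)
    # np.isclose applies the same element-wise test that np.allclose reduces over
    close = np.isclose(gpu, cpu, rtol=rtol, atol=atol, equal_nan=True)
    return np.argwhere(~close)

# Toy stand-ins for zzchi2 arrays read from the two output files
cpu_chi2 = np.array([[10.0, 20.0], [30.0, 40.0]])
gpu_chi2 = cpu_chi2.copy()
gpu_chi2[0, 1] += 1e-9   # round-off-level difference: not equal, but close
gpu_chi2[1, 0] += 3.0    # a genuine outlier of the size reported above

bad = report_mismatches(cpu_chi2, gpu_chi2)
print("exactly equal:", np.array_equal(cpu_chi2, gpu_chi2))
print("allclose:", np.allclose(gpu_chi2, cpu_chi2))
print("mismatched indices:", bad.tolist())
```

Listing the mismatched indices, rather than only checking np.allclose, makes it easier to see whether the outliers cluster in particular templates.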
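
The idea of computing the legendre array once per degree and reusing it can be sketched as below. This is a minimal sketch under stated assumptions, not the Target class from this commit: the class name TargetSketch, the single wave array, and the use of numpy.polynomial.legendre.legvander are all illustrative choices, and the optional GPU copy is left out.

```python
import numpy as np

class TargetSketch:
    """Hypothetical stand-in for a target that caches Legendre terms per degree."""

    def __init__(self, wave):
        self.wave = np.asarray(wave, dtype=float)
        self._legendre = {}  # cache keyed by number of Legendre terms

    def legendre(self, nleg):
        """Return an (nleg, nwave) array of Legendre terms, computed once per nleg."""
        if nleg not in self._legendre:
            wmin, wmax = self.wave.min(), self.wave.max()
            # Map wavelengths onto [-1, 1], the natural Legendre domain
            x = 2.0 * (self.wave - wmin) / (wmax - wmin) - 1.0
            # legvander returns shape (nwave, nleg); transpose to (nleg, nwave)
            self._legendre[nleg] = np.polynomial.legendre.legvander(x, nleg - 1).T
        return self._legendre[nleg]

target = TargetSketch(np.linspace(3600.0, 9800.0, 50))
first = target.legendre(3)
second = target.legendre(3)   # served from the cache, no recomputation
print(first.shape, second is first)
```

A dict keyed by the number of terms keeps repeated calls with the same degree cheap while still allowing different degrees for the same target.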
1 parent 0397d18, commit 6ece174
Showing 3 changed files with 132 additions and 66 deletions.