
Legendre nnls #287

Open · wants to merge 7 commits into main
Conversation

craigwarner-ufastro
Contributor

Non-negative least squares (NNLS) is significantly faster than the general-case bounded-value least squares (BVLS). For archetypes, BVLS had been used: the archetype term was bounded to [0, inf) while all of the legendre-term coefficients were allowed to float. By adding extra terms to our basis vector that are the negatives of each legendre term, we can use NNLS instead. For example, with negative copies v3 = -v1 and v4 = -v2, the solution can have c1 > 0 and c3 = 0 for a positive net coefficient, and the opposite for a negative one.
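A minimal standalone sketch of the trick (not redrock's actual code, which operates on batched per-camera arrays): for each unbounded legendre column we append its negation, solve with NNLS, and recover each free coefficient as the difference of its positive and negative parts. The result matches BVLS with a non-negative archetype coefficient and free legendre coefficients.

```python
import numpy as np
from scipy.optimize import nnls, lsq_linear

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))   # columns: archetype, two legendre terms
b = rng.normal(size=50)

# BVLS reference: archetype coefficient >= 0, legendre coefficients free
lo = np.array([0.0, -np.inf, -np.inf])
res = lsq_linear(A, b, bounds=(lo, np.inf))

# NNLS on the augmented basis: append -v for each legendre column so the
# solver can express a negative coefficient as (c_pos - c_neg)
A_aug = np.hstack([A, -A[:, 1:]])
c, _ = nnls(A_aug, b)
coeff = np.array([c[0], c[1] - c[3], c[2] - c[4]])

assert np.allclose(coeff, res.x, atol=1e-6)
```

Because the objective depends only on c0 and the differences (c1 - c3, c2 - c4), and the problem is convex, the NNLS minimizer recovers the same solution as BVLS whenever the original basis has full column rank.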

This also required a modification to the prior array that is added to M. The NNLS results now agree with the original BVLS solution to within np.allclose, with a significant speed gain: the "Finding best redshift" section takes 45s on 4 GPU + 4 CPU versus 99s using BVLS. However, this gain only applies when using GPUs.

When using 64 CPUs, the additional negative legendre terms + NNLS take 113s for this section versus 74s for BVLS. The computation time for array operations such as dot products on the larger augmented arrays overwhelms the speed gain of NNLS over BVLS, so in CPU mode we do NOT use the NNLS method and keep the original BVLS.

Commits:
- "… vector (in tdata) so that NNLS can be used instead of BVLS."
- "Fixed bug when ValueError excepted in rebinning"
- "… negative legendre terms. Refactored a little to clean up."
- "For archetypes with legendre terms, NNLS is now actually used with additional terms of -1*each legendre term so that we can use the faster NNLS to get the same result as BVLS."
- "… because on CPU, the additional computation time for the larger array sizes dominates any time savings of NNLS vs BVLS."
@coveralls

coveralls commented Mar 12, 2024

Coverage Status

coverage: 37.979% (-0.6%) from 38.533% when pulling fd76617 on legendre_nnls into 028848f on main.

@craigwarner-ufastro
Contributor Author

Note: This also includes a bug fix related to #278

@sbailey
Collaborator

sbailey commented Apr 12, 2024

@craigwarner-ufastro thanks for this update and apologies for the late review.

First of all, a bug:

srun -n 4 --gpu-bind=map_gpu:3,2,1,0 --cpu-bind=cores rrdesi_mpi --gpu \
  -i $CFS/desi/spectro/redux/iron/tiles/cumulative/80605/20210205/coadd-6-80605-thru20210205.fits \
  --archetypes /global/cfs/cdirs/desi/users/abhijeet/new-archetypes/rrarchetype-galaxy.fits \
  -o $SCRATCH/redrock/redrock-legnnls.fits
...
--- Process 0 raised an exception ---
Proc 0: Traceback (most recent call last):
Proc 0:   File "/global/common/software/desi/users/sjbailey/redrock/py/redrock/external/desi.py", line 916, in rrdesi
    scandata, zfit = zfind(targets, dtemplates, mpprocs,
Proc 0:   File "/global/common/software/desi/users/sjbailey/redrock/py/redrock/zfind.py", line 602, in zfind
    if archetype._solver_method == 'bvls':
Proc 0: AttributeError: 'NoneType' object has no attribute '_solver_method'

MPICH Notice [Rank 0] [job id 24313752.5] [Fri Apr 12 09:37:16 2024] [nid002260] - Abort(0) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0

Second, this update required a lot of delicate bookkeeping about the number of legendre coefficients, setting up the prior, etc., impacting multiple functions in multiple files. I'm concerned about that from a maintenance perspective. It feels like this could/should be an internal detail of the solver, not something that requires the "user" to set up the negative/positive legendre trick themselves and then unpack the results. I.e., could you move all of this logic into per_camera_coeff_with_least_square_batch itself? If it receives method='bvls' and use_gpu=True, it would automatically switch to the +/- legendre NNLS method, including padding the arrays and expanding the prior as needed. Can that be implemented with similar speed gains, while isolating all of the messy bookkeeping to a single location?
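The encapsulation being asked for can be sketched in miniature. This is a hypothetical single-spectrum wrapper (solve_coeffs is an illustrative name, not redrock's API): the caller always asks for BVLS semantics, and the +/- legendre augmentation and coefficient folding happen entirely inside the function when the GPU path is taken.

```python
import numpy as np
from scipy.optimize import nnls, lsq_linear

def solve_coeffs(basis, flux, nleg, method='bvls', use_gpu=False):
    """Hypothetical sketch: basis columns are [archetype, legendre terms].
    The BVLS->NNLS trick is an internal detail, invisible to the caller."""
    if method == 'bvls' and use_gpu:
        # Append negated legendre columns so NNLS can emulate BVLS
        aug = np.hstack([basis, -basis[:, -nleg:]])
        c, _ = nnls(aug, flux)
        # Fold the paired +/- coefficients back together before returning
        coeff = c[:-nleg].copy()
        coeff[-nleg:] -= c[-nleg:]
        return coeff
    # CPU path: plain BVLS with archetype coefficient >= 0
    lo = np.full(basis.shape[1], -np.inf)
    lo[0] = 0.0
    return lsq_linear(basis, flux, bounds=(lo, np.inf)).x

rng = np.random.default_rng(1)
A = rng.normal(size=(40, 3))
y = rng.normal(size=40)
gpu = solve_coeffs(A, y, nleg=2, use_gpu=True)
cpu = solve_coeffs(A, y, nleg=2, use_gpu=False)
assert np.allclose(gpu, cpu, atol=1e-6)
```

Both paths return a coefficient array of the same length, so no caller needs to know which solver ran or reserve extra output slots.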

Commits:
- "… negative legendre terms on the GPU in lieu of BVLS into per_camera_coeff_with_least_square_batch. Moved prior_on_coeffs into zscan.py, called from within per_camera_coeff_with_least_square_batch. Pass prior_sigma, a scalar, instead of the array, from fitz to archetypes."
- "Bug fix on case where only some object types, not all, have an archetype."
@sbailey
Collaborator

sbailey commented Apr 29, 2024

Thanks for the updates, @craigwarner-ufastro. Please resolve the merge conflicts that have arisen here due to other merges. If they aren't trivial to resolve in place, please don't merge main into this branch, which would make the changes in this PR harder to evaluate mixed in with all the other changes. If needed, it's ok to make a new branch from this branch, rebase that off of main, and start a new PR. Ping me on Slack if that operation description is confusing.

Back to the updates: I'd like to take this one step further to avoid legendre-specific manipulations to tdata prior to calling the solver, e.g. in archetypes.nearest_neighbor_model:

    tdata[hs] = np.append(tdata[hs], legendre[hs].transpose()[None,:,:], axis=2)
    if per_camera and self._solver_method == 'bvls' and use_gpu:
        # Using BVLS and GPU - append negative copies of legendre terms
        # as well, to use NNLS with additional bases to approximate BVLS
        tdata[hs] = np.append(tdata[hs], -1*legendre[hs].transpose()[None,:,:], axis=2)

and in zfind.zfind:

        tot_deg_legendre = deg_legendre
        for key in archetypes:
            if archetypes[key]._solver_method == 'bvls':
                tot_deg_legendre = deg_legendre*2
...
        maxcoeff = max(np.max([t.template.nbasis for t in templates]), ncamera*(tot_deg_legendre)+n_nearest)

In the second case, I'm not even sure that is correct: we are reserving deg_legendre*2 coefficients, but those extra coefficients exist only for the BVLS -> NNLS trick and shouldn't be needed for the output arrays.

Suggestion: rather than manipulating tdata (template data) to add legendre terms and then calling per_camera_coeff_with_least_square_batch(... tdata, nleg=...), could you pass tdata and the normal legendre array, and let per_camera_coeff_with_least_square_batch derive nleg from the shape of legendre and whether it needs to do the BLVS -> NNLS trick and tdata manipulation internally based upon method and use_gpu?

i.e. from a user-perspective, turn this into "if you want to include legendre terms, pass in the legendre template arrays to use" and all other details of how those are combined with tdata etc. become internal to per_camera_coeff_with_least_square_batch. Thoughts?
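The proposed interface could look something like the following sketch (combine_template_data is an illustrative name, and the array shapes are assumptions for the example): the caller passes the binned template data and the legendre array separately, and the function derives nleg from the legendre shape and applies the augmentation internally.

```python
import numpy as np

def combine_template_data(binned, legendre, method='bvls', use_gpu=False):
    """Hypothetical sketch of the suggested interface. binned has shape
    (nz, nwave, nbasis); legendre has shape (nleg, nwave). nleg is
    inferred here rather than passed in by the caller."""
    nleg = legendre.shape[0]
    tdata = np.append(binned, legendre.transpose()[None, :, :], axis=2)
    if method == 'bvls' and use_gpu:
        # The negative copies for the BVLS->NNLS trick now exist only
        # inside this function, never in the caller's arrays
        tdata = np.append(tdata, -legendre.transpose()[None, :, :], axis=2)
    return tdata, nleg

binned = np.zeros((1, 100, 2))   # (nz, nwave, nbasis)
legendre = np.ones((3, 100))     # (nleg, nwave)
tdata, nleg = combine_template_data(binned, legendre, use_gpu=True)
assert nleg == 3 and tdata.shape == (1, 100, 2 + 2*3)
```

With this shape, zfind.zfind would no longer need tot_deg_legendre at all: output arrays are sized from the un-augmented legendre array, and the 2x padding never leaks out of the solver.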

Commits:
- "… terms with 0 coefficients and make the coeffs negative for negative terms."
- "Refactored to remove the last bit of the BVLS-NNLS trick from archetypes.py and zfind.py by passing binned instead of tdata to per_camera_coeff_with_least_square_batch."
- "… involve using a list of bands instead of ncam to be more general."