Legendre nnls #287
base: main
Conversation
vector (in tdata) so that NNLS can be used instead of BVLS. Fixed a bug where a ValueError was raised when rebinning negative legendre terms. Refactored a little to clean up. For archetypes with legendre terms, NNLS is now actually used with additional terms of -1*each legendre term so that we can use the faster NNLS to get the same result as BVLS.
because on CPU, the additional computation time for the larger array sizes dominates any time savings of NNLS vs BVLS
Note: This also includes a bug fix related to #278
@craigwarner-ufastro thanks for this update and apologies for the late review. First of all, a bug:
Second, this update required a lot of delicate bookkeeping about the number of legendre coefficients, setting up the prior, etc., impacting multiple functions in multiple files. I'm concerned about that from a maintenance perspective. It feels like this could/should be an internal detail of the solver, not something that requires the "user" to set up that negative/positive legendre trick themselves and then unpack the results. i.e. could you move all of this logic into
negative legendre terms on the GPU in lieu of BVLS into per_camera_coeff_with_least_square_batch. Moved prior_on_coeffs into zscan.py, called from within per_camera_coeff_with_least_square_batch. Pass prior_sigma, a scalar, instead of the array from fitz to archetypes. Bug fix on case where only some object types, not all, have an archetype.
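The scalar `prior_sigma` change above can be illustrated with a hedged sketch. This is not the project's actual code; the signature and coefficient layout of `prior_on_coeffs` below are assumptions for illustration. The idea is that a single sigma is expanded into a diagonal Gaussian prior on the legendre coefficients inside the solver, rather than the caller building and passing the full array.

```python
import numpy as np

def prior_on_coeffs(n_nearest, deg_legendre, prior_sigma, ncamera):
    """Hypothetical reconstruction: build a diagonal prior with zero
    weight on the n_nearest archetype coefficients and 1/sigma^2 on
    each of the per-camera legendre coefficients."""
    n = n_nearest + deg_legendre * ncamera
    prior = np.zeros((n, n))
    idx = np.arange(n_nearest, n)          # legendre coefficient slots
    prior[idx, idx] = 1.0 / prior_sigma**2
    return prior

# Inside the solver, the prior would be added to the normal equations,
# schematically: M = T.T @ W @ T + prior, before solving M @ c = y.
```

With this split, `fitz` only needs to hand a scalar to the archetype code, and the prior's shape bookkeeping stays local to `zscan.py`.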
Thanks for the updates, @craigwarner-ufastro . Please resolve the merge conflicts that have arisen here due to other merges. If they aren't trivial to resolve in-place, please don't merge main into this branch, which would make the changes in this PR harder to evaluate mixed in with all the other changes. If needed, it is ok to make a new branch from this branch, rebase that off of main, and start a new PR. Ping me on Slack if that operation description is confusing.

Back to the updates: I'd like to take this one step further to avoid legendre-specific manipulations to tdata prior to calling the solver, e.g. in

```python
tdata[hs] = np.append(tdata[hs], legendre[hs].transpose()[None,:,:], axis=2)
if per_camera and self._solver_method == 'bvls' and use_gpu:
    #Using BVLS and GPU - append negative copies of legendre terms as well to use NNLS with additional bases to approximate BVLS
    tdata[hs] = np.append(tdata[hs], -1*legendre[hs].transpose()[None,:,:], axis=2)
```

and in

```python
tot_deg_legendre = deg_legendre
for key in archetypes:
    if archetypes[key]._solver_method == 'bvls':
        tot_deg_legendre = deg_legendre*2
...
maxcoeff = max(np.max([t.template.nbasis for t in templates]), ncamera*(tot_deg_legendre)+n_nearest)
```

In the second case, I'm not even sure that is correct because we are reserving … Suggestion: rather than manipulating …, i.e. from a user-perspective, turn this into "if you want to include legendre terms, pass in the legendre template arrays to use" and all other details of how those are combined with …
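One way to read the suggestion above is a solver that accepts the legendre arrays directly and hides the sign trick. The following is a minimal single-spectrum sketch, not the project's actual API: the function name, its signature, and the use of SciPy solvers in place of the GPU/batched code are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import nnls, lsq_linear

def per_camera_coeff_sketch(tdata, flux, legendre=None, use_gpu=False):
    """Hypothetical interface: caller passes legendre template arrays;
    the positive/negative NNLS trick stays an internal detail."""
    if legendre is None:
        coeff, _ = nnls(tdata, flux)          # pure NNLS, no free terms
        return coeff
    nt, nl = tdata.shape[1], legendre.shape[1]
    if use_gpu:
        # GPU path: append +/- legendre columns so the faster NNLS can
        # emulate BVLS (legendre coefficients are free in sign)
        A = np.hstack([tdata, legendre, -legendre])
        c, _ = nnls(A, flux)
        # fold split positive/negative parts back into signed coefficients
        return np.concatenate([c[:nt], c[nt:nt + nl] - c[nt + nl:]])
    # CPU path: plain BVLS is faster at these array sizes
    A = np.hstack([tdata, legendre])
    lb = np.concatenate([np.zeros(nt), np.full(nl, -np.inf)])
    res = lsq_linear(A, flux, bounds=(lb, np.inf), method='bvls')
    return res.x
```

Both branches return coefficients in the same layout, so callers never see the doubled basis, and bookkeeping like `maxcoeff` no longer needs a solver-dependent `tot_deg_legendre`.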
terms with 0 coefficients and make the coeffs negative for negative terms. Refactored to remove the last bit of the BVLS-NNLS trick from archetypes.py and zfind.py by passing binned instead of tdata to per_camera_coeff_with_least_square_batch.
involve using a list of bands instead of ncam to be more general.
Non-negative least squares (NNLS) is significantly faster than the more general bounded-variable least squares (BVLS). For archetypes, BVLS had been used: the archetype term had a bounded solution from 0 to inf while all of the legendre term solutions were allowed to float. By adding additional terms to our basis that are the negatives of each legendre term, we are able to use NNLS instead. E.g., for legendre basis vectors v1 and v2 we add negative copies v3 = -v1 and v4 = -v2, so the solution can have c1 > 0, c3 = 0 for a net positive coefficient, and the opposite for a net negative one.
This also required a modification to the prior array which was added to the `M` vector. The results using NNLS now agree to within `np.allclose` with the original BVLS solution, with a significant speed gain: the `Finding best redshift` section takes 45s on 4 GPU + 4 CPU versus 99s using BVLS. However, this only applies when using GPUs. When using 64 CPUs, the additional negative legendre terms + NNLS take 113s for this section versus 74s for BVLS. This is because the computation time for various array operations, such as dot products on the larger arrays with the added terms, overwhelms the speed gain of NNLS over BVLS; thus in CPU mode we do NOT use the NNLS method and maintain the original BVLS.