
Better integration #608

Open
wants to merge 12 commits into base: master
Conversation

@davidnwobi commented Aug 29, 2024

Updated the models that require 1D integration to use the new integration system. The new models have "_high_res" appended to their names.

The updated models are:

  • Core-Shell Bicelle
  • Core-Shell Cylinder
  • Core-Shell Ellipsoid
  • Cylinder
  • Ellipsoid
  • Flexible Cylinder with Elliptical Cross-section
  • Hollow Cylinder

Details of the method can be found in this repo.

@pkienzle (Contributor) commented Sep 6, 2024

(1) Using Δq = 2π/d_max as the period spacing and selecting, say, k = 5 sample points per period, you should get a ballpark estimate for the number of integration points by looking at the arc length at radius |q|. That is, Δq = |q| k Δθ, so the step size in theta would be Δθ = Δq/(k |q|), and the number of steps would be n = (π/2)/Δθ = k |q| d_max / 4.

How does this value compare to the n estimated by Gauss-Kronrod rule? If it is good enough, then we can apply it more easily across all the shape models without a bespoke spline fit for each model type.
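To make that comparison concrete, the estimate can be computed directly. A minimal sketch (the function name and the rounding up to a whole number of steps are my own; k and d_max follow the notation above):

import math

def estimated_theta_points(q, d_max, k=5):
    # Ballpark number of theta steps over [0, pi/2] at magnitude |q| > 0,
    # using delta_q = 2*pi/d_max as the period spacing and k sample points
    # per period.  Algebraically this is n = k*|q|*d_max/4.
    delta_q = 2.0 * math.pi / d_max
    delta_theta = delta_q / (k * abs(q))   # from delta_q = |q|*k*delta_theta
    return math.ceil((math.pi / 2.0) / delta_theta)

For example, with d_max = 2000 Å and |q| = 1 Å⁻¹ this gives n = 2500.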

(2) Rather than generic Gauss-Legendre, we could look at the structure of the function and select more points where it is larger (importance sampling). The problematic shapes have high eccentricity, which will show up near θ=0° or θ=90°.
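One deterministic way to put more points where those features live (a sketch of the idea only, not a proposal for the kernels) is a change of variables whose Jacobian vanishes at the endpoints, which clusters Gauss-Legendre nodes near θ = 0 and θ = π/2:

import numpy as np

def clustered_theta_quadrature(f, n):
    # Integrate f(theta) over [0, pi/2], with nodes clustered near the
    # endpoints via theta = (pi/4)*(1 - cos(u)), u in [0, pi].
    # f should accept a numpy array of theta values.
    u, w = np.polynomial.legendre.leggauss(n)
    u = 0.5 * np.pi * (u + 1.0)            # map [-1, 1] -> [0, pi]
    w = 0.5 * np.pi * w
    theta = 0.25 * np.pi * (1.0 - np.cos(u))
    jacobian = 0.25 * np.pi * np.sin(u)    # dtheta/du, small near both ends,
                                           # so theta nodes pack densely there
    return np.sum(w * jacobian * f(theta))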

(3) There are analytic approximations for large disks and long rods (#109). Can these be used for large q? Can they be extended to other models?

(4) The current code cannot run in OpenCL since the arrays are too large to put in constant memory. Not sure how to allocate them in main memory and make them accessible to the kernels.

(5) We could unroll the loops, computing the different (q, θ) values in parallel, then summing afterward. This would speed up the kernels a lot on high end graphics cards.

(6) Ideally we would update all models to use dynamic integration rather than having normal and high-resolution versions. It would be nice to have a mechanism to set the precision.

(7) What is the target precision on your models? Do you really need 32000 angles for them? You may need to be more careful about summing the terms to avoid loss in precision when adding many small numbers to a large total.
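On that last point, one standard guard against the loss of precision (a sketch only; nothing in the current kernels does this) is compensated (Kahan) summation of the quadrature terms:

def kahan_sum(terms):
    # Compensated (Kahan) summation: carry a running correction so that
    # small terms are not swallowed when added to a large running total.
    total = 0.0
    compensation = 0.0
    for term in terms:
        y = term - compensation
        t = total + y                      # low-order bits of y may be lost here
        compensation = (t - total) - y     # recover the lost bits
        total = t
    return total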

@pkienzle (Contributor) commented Sep 6, 2024

Note that you can adjust the number of points in the loop for an existing model. You can see this with

python -m sasmodels.compare -ngauss=512 ellipsoid

This is roughly equivalent to the following [untested]:

from sasmodels.core import load_model_info
from sasmodels.generate import set_integration_size

# Load the model definition and raise the number of points used in its
# integration loop before the kernel is built.
model_info = load_model_info('ellipsoid')
set_integration_size(model_info, 4096)

SasView could do this now, giving the number of theta points as a control parameter and sending that to sasmodels before evaluating. Not as good as having the model set the number of theta points as done in this PR, but it immediately applies to all models.

@davidnwobi (Author) commented Sep 6, 2024

Thank you for your feedback; there are several aspects here that I should have considered earlier.

  1. Comparison of integration points: I'll take a closer look and compare your method to the Gauss-Kronrod rule. While I'm no longer working with STFC, I do have some free time, so I'll run the comparisons and see how they stack up.

  2. Importance sampling: I completely agree. For the cylinder model, peaks do occur near the ends of the interval, so targeting those areas would definitely improve efficiency. While it might involve more work, the payoff could be significant.

  3. Analytical approximations: They can be used for large values of q, but it also depends on the values of the other parameters. For example, for the cylinder model there are good approximations when either $q R$ or $\frac{q L}{2}$ is small. For other models, we might be able to find regions where certain parameters are small and see if the remaining part of the function can be integrated analytically.

  4. OpenCL compatibility: I've had similar issues with OpenCL. I haven't been able to get either the updated or default models running with OpenCL, whether in sasmodels or SasView, which might be a bug. Interestingly, CUDA works fine, and AMD's parallel processing works but runs out of memory for the updated models (integrated graphics, though). OpenCL definitely needs further investigation.

  5. Loop unrolling: I could check whether it's at least being done with theta at the moment.

  6. Having precision control would be ideal, especially if the tolerance can vary based on the model or region. This looks doable, but it would require some more work.

  7. What is the target precision on your models?

Target tolerance is a relative tolerance of $1 \times 10^{-3}$.
As for needing 32,000 angles: no, not for the entire parameter space. Gauss quadrature struggles with highly oscillatory functions; its convergence becomes very slow as the integrand becomes more oscillatory. Quick example for a decaying oscillatory function (a sketch reproducing this kind of convergence test follows the tables below):
$e^{-\sqrt{k} x} \sin(k x)$

With $k = 4000$:

  • $n = 512$; Rel error = $3.547640 \times 10^{-1}$
  • $n = 600$; Rel error = $4.000811 \times 10^{-2}$
  • $n = 700$; Rel error = $4.470038 \times 10^{-4}$
  • $n = 800$; Rel error = $1.042314 \times 10^{-4}$
  • $n = 900$; Rel error = $5.589838 \times 10^{-7}$
  • $n = 1024$; Rel error = $8.715261 \times 10^{-12}$

With $k = 400000$:

  • $n = 16384$; Rel error = $2.155721 \times 10^{0}$
  • $n = 20768$; Rel error = $1.492304 \times 10^{-1}$
  • $n = 22768$; Rel error = $5.929450 \times 10^{-2}$
  • $n = 24768$; Rel error = $7.977614 \times 10^{-3}$
  • $n = 26768$; Rel error = $2.309563 \times 10^{-4}$
  • $n = 28768$; Rel error = $4.160178 \times 10^{-4}$
  • $n = 30768$; Rel error = $3.771680 \times 10^{-5}$
  • $n = 32768$; Rel error = $5.919487 \times 10^{-6}$
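For reference, this kind of convergence test can be reproduced with a few lines of numpy. The integration interval and normalization used for the tables above are not stated, so the numbers will not match exactly; the interval [0, 1], the exact value from the antiderivative of $e^{-a x} \sin(k x)$, and the function name are my assumptions:

import numpy as np

def gauss_legendre_rel_error(k, n, upper=1.0):
    # Relative error of n-point Gauss-Legendre for the integral of
    # exp(-sqrt(k)*x)*sin(k*x) over [0, upper] (interval is an assumption).
    a = np.sqrt(k)
    # Exact value from the antiderivative -exp(-a*x)*(a*sin(k*x)+k*cos(k*x))/(a**2+k**2).
    exact = (k - np.exp(-a * upper) * (a * np.sin(k * upper) + k * np.cos(k * upper))) / (a**2 + k**2)
    x, w = np.polynomial.legendre.leggauss(n)
    x = 0.5 * upper * (x + 1.0)            # map nodes from [-1, 1] to [0, upper]
    w = 0.5 * upper * w
    approx = np.sum(w * np.exp(-a * x) * np.sin(k * x))
    return abs(approx - exact) / abs(exact)

for n in (512, 700, 1024):
    print(n, gauss_legendre_rel_error(4000.0, n))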

Due to memory/space limitations, the number of points is selected from a discrete set (powers of 2, with exponents from 1 to 15), so that only those rules need to be stored and loaded into memory. As a result, if it selects 32,768 points, that only means 16,384 wouldn't have been enough; it doesn't check any of the possibilities in between. It could often get away with fewer points, but it can't determine that. A bit crude, but it helps. Do you have any suggestions on a better way to handle this?
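One possible refinement, if the stored rules stay as powers of two, would be to estimate the required n up front (for example from the k |q| d_max / 4 argument in point 1) and round it up to the next stored size, instead of discovering after the fact that the previous size was insufficient. A hypothetical sketch (the function name and estimate source are my own):

import math

def select_stored_rule_size(n_estimate, min_exp=1, max_exp=15):
    # Round an estimated point count up to the next stored rule size
    # (powers of two, exponents min_exp through max_exp).
    exponent = math.ceil(math.log2(max(n_estimate, 2)))
    return 2 ** min(max(exponent, min_exp), max_exp)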

I'll look into some of these issues:

  • Comparing with the Gauss-Kronrod rule
  • Finding a dynamic way of setting the tolerance
