Custom fit models #2565
Conversation
Jeff to check two other PRs, then close if OK. Otherwise we need to look at whether it "does what it says on the tin" and at the overall code.
I've tried it on several PDB models to see how it works on models with a large number of atoms. While it generally works fine, I think we will need to put some limits on very large structures (e.g. virus capsids). For the following example, https://github.com/Andre-lab/hbv_trSAXS/blob/master/HBVCP_empty_assembly/bayesian_models/capsid_T4.pdb, it took about 11 min to load the model, and I interrupted the curve computation after 30 min.
I agree. We could set a limit above which the program gives a warning that loading and calculation may be slow. Also, I am wondering if there is a way to estimate the amount of time needed to load and compute the model. Without knowing the details of how the PDB reader code works, I might be wrong.
I agree with @yunliu01 that some benchmarking would be useful. I can probably do some exploration during the code camp. In order not to block this PR, maybe we can put up a warning if we have more than X atoms. I can probably come up with a rough estimate of X.
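A hedged sketch of what such a guard might look like; the threshold value and function name here are hypothetical placeholders, not existing SasView code, and the real value of X would come from the benchmarking discussed above:

```python
import warnings

# Hypothetical cutoff; the real value of X should come from benchmarking.
MAX_ATOMS_FAST = 50_000

def check_model_size(n_atoms, limit=MAX_ATOMS_FAST):
    """Warn if a loaded PDB model is likely to be slow to compute.

    Returns True if the model is within the fast range, False if a
    slow-calculation warning was issued.
    """
    if n_atoms > limit:
        warnings.warn(
            f"Model has {n_atoms} atoms (> {limit}); loading and "
            "P(Q) calculation may be very slow.",
            RuntimeWarning,
        )
        return False
    return True
```

The check could be run right after the PDB file is parsed, before any curve computation starts, so the user can cancel early.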
Just to be clear, the slow calculation of PQ is completely unrelated to this pull request and should be ticketed separately. This should in no way affect merging this PR. Actually, I think there may already be some related tickets. Basically, the PQ calculation, as far as I know, is still the same code originally written >12 years ago and is, as I understand it, technically perfectly correct but the absolute slowest approach. We have discussed over the years a variety of faster algorithms, most of which fail at high Q but are usually plenty good enough. This also may be a case where vectorization and GPU could significantly speed up the calculation? To date, interest in this feature from the existing SasView community has been fairly low, so there has been no pressure to fix it -- this could be a great opportunity to start doing so? If this part of the code were faster and more robust, I believe it would make SasView more useful to the protein biophysics community -- especially those working on pharmaceutical problems?
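For illustration, a minimal NumPy sketch of a vectorized Debye sum, I(q) = Σᵢⱼ bᵢbⱼ sin(q·rᵢⱼ)/(q·rᵢⱼ). The function name and interface are hypothetical and this is not the SasView implementation; note its O(N²) distance matrix would itself be impractical in memory for capsid-sized models, so it only illustrates the vectorization idea:

```python
import numpy as np

def debye_iq(q, coords, b):
    """Vectorized Debye sum over all atom pairs.

    q      : (M,) scattering vector magnitudes
    coords : (N, 3) atomic positions
    b      : (N,) scattering lengths
    Returns (M,) intensities; I(0) = (sum b)^2.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    rij = np.sqrt((diff ** 2).sum(axis=-1))       # (N, N) pair distances
    bb = b[:, None] * b[None, :]                  # (N, N) b_i * b_j
    # np.sinc(x) = sin(pi x)/(pi x), so divide the argument by pi
    # to get sin(q r)/(q r), which is also well defined at r = 0.
    qr = q[:, None, None] * rij[None, :, :]
    return (bb[None, :, :] * np.sinc(qr / np.pi)).sum(axis=(1, 2))
```

For large N, the usual tricks are pair-distance histogramming (binning rᵢⱼ once and reusing the histogram for every q) or moving the double loop to the GPU; both have been discussed as options.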
The Mac installer doesn't start. It seems to work fine from a local dev env.
Change the names of the variables to avoid confusion.
…odels # Conflicts: # src/sas/sascalc/calculator/geni.py
Beta_Q should be one at Q=0, so the calculated Beta_Q needs to be normalized to its value at Q=0. This requires that both F_Q and P_Q are normalized properly at Q=0, i.e., F_Q(Q=0)=1 and P_Q(Q=0)=1. However, P_Q is currently not a normalized form factor. Therefore, the current code obtains the normalized Beta_Q by dividing it by Beta_Q[Q_min], the value of Beta_Q at the lowest Q point.

Note that when using the calculated form factor to fit data, the code also needs to use the normalized form factor, which is likewise obtained by dividing by the value at the lowest Q point. This approach could be problematic if a user forgets to choose a very small value for the first Q point. The documentation needs to be updated to let users know this whenever they use this function.

In principle, it is easy to calculate P_Q(Q=0) and F_Q(Q=0). In the future, it would be better to update the code to normalize with the exact values at Q=0 instead of a value at a finite Q. From a quick look at the code, F_Q appears to be normalized correctly already (this has not been fully tested), but the calculation of P_Q needs to be updated.
So to be clear @yunliu01 -- you are saying that Beta_Q is normalized to Beta_Q(Q_min) but that F(Q) is normalized to F(Q=0)? Also, is P(Q)=F(Q)^2 not then also normalized to P(Q=0) by definition? Finally, if it is that simple, should we not just normalize to Q=0 properly? Or is this considered too hard at this point?
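A minimal sketch of the workaround described above, assuming `f_q` holds the orientationally averaged amplitude ⟨F(Q)⟩ and `p_q` holds the (unnormalized) form factor ⟨|F(Q)|²⟩ on the same Q grid; the function name and interface are hypothetical, not the code in this PR:

```python
import numpy as np

def beta_q(f_q, p_q):
    """beta(Q) = <F(Q)>^2 / <|F(Q)|^2>, rescaled so beta = 1 at the
    first Q point.

    Mirrors the workaround described above: dividing by beta at the
    lowest Q point stands in for the exact Q=0 value, so the first Q
    point must be at very small Q for the result to be accurate.
    """
    beta = np.asarray(f_q) ** 2 / np.asarray(p_q)
    return beta / beta[0]  # beta[0] approximates beta(Q=0)
```

The proper fix discussed above would instead normalize F_Q and P_Q to their analytically computed Q=0 values, removing the dependence on where the user's Q grid starts.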
Testing the install on Windows (after merging the fast Debye calculator) works very nicely (and fast!). It looks to me like it is ready to merge.
Two points that should be addressed as soon as possible (and the first maybe as part of this release even?):
- Since plugin docs will now render in the installer version, it would be nice if some of the documentation currently buried in the help docs were moved (or even copied) into the model docs, as suggested in the new issue Add documentation to Custom Fit model generated by GCC #2872 -- in particular, an explanation of the new swelling and volume parameters and how they should be used.
- Because this model is not based on an analytical equation but on a calculated series of points, it is quite possible for the data to extend beyond the Q range the model can return. Even if not, when the data has resolution smearing, the fit may still need values beyond what the model can return. It would be good to display a helpful message when the data plus resolution will exceed the bounds of the calculated model, telling the user that this is the case and that they should recompute the plugin using a larger Q range. That may actually be a separate sasmodels issue.
Finally, I note that it is also possible that a user, particularly on a Mac, does not provide enough points in the Debye-calculated plugin, such that interpolation is bad in some areas (where there may be significant oscillations). Is there a way to flag that to the user? Is there a way to assess it algorithmically, maybe based on the smoothness of the curve?
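One possible algorithmic check, purely illustrative: drop every other point, interpolate the curve back from the coarsened grid, and flag it if the reconstruction error is large. A smooth curve survives this, an under-sampled oscillatory one does not. The function name and threshold are guesses, not existing SasView behaviour:

```python
import numpy as np

def undersampling_warning(q, iq, rel_tol=0.05):
    """Heuristic: flag curves that are too coarsely sampled for
    smooth interpolation.

    Rebuilds the interior points by linear interpolation from every
    other point; if the worst relative deviation exceeds rel_tol, the
    sampling is probably too coarse for the oscillations present.
    """
    interp = np.interp(q[1:-1], q[::2], iq[::2])  # coarse resample
    denom = np.maximum(np.abs(iq[1:-1]), 1e-300)  # avoid divide-by-zero
    rel_err = np.abs(interp - iq[1:-1]) / denom
    return bool(rel_err.max() > rel_tol)
```

If a check like this fired after the plugin's table was computed, the GUI could suggest recomputing with more Q points before the model is handed to the fit panel.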
It seems to work on Mac as well (with the default GSC calculation in the backend).
Description
Adds a feature to automatically generate plugin models with structure factors from the Generic Scattering Calculator. Models are written as a single combined C + Python file, with two options for the effective radius: equivalent sphere volume and radius of gyration. It also lets you choose your own file name. There are four parameters: SLD, solvent SLD, swelling, and protein volume.
This PR also combines 2 other PRs due to .ui merge conflicts: #2548 and #2538.
How Has This Been Tested?
Upload a file into the nuclear data field of the Generic Scattering Calculator, preferably a .pdb file. Choose one of the 1D scattering options, check the Plugin Models checkbox, and change the filename if desired. Hit Calculate and let it run.
After it is done calculating, the model file should automatically be created and placed in the plugin_models folder of .sasview. You should be able to load it in the Fit Panel by going to Category - Plugin Models, then choosing your model name and a structure factor as desired.