Fit with theory covmat with n3fit #1528

Merged
merged 67 commits into from
Jul 6, 2022

andreab1997
Contributor

@andreab1997 andreab1997 commented Feb 22, 2022

This will make it possible to run an NNPDF4.0 fit that includes the theory covariance matrix.

Edit: it also modifies the replica generation (make_replica) to use the covmat (with theory errors and multiplicative uncertainties if the proper flags are set).
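
As a rough illustration of what replica generation with a total covmat means, here is a minimal sketch, not the actual make_replica implementation. The function name `make_replica_sketch`, the toy numbers and the purely additive Gaussian noise are assumptions made for the example; multiplicative uncertainties are not modelled.

```python
import numpy as np


def make_replica_sketch(central, exp_covmat, th_covmat=None, seed=0):
    """Draw one pseudodata replica: central values plus correlated Gaussian noise."""
    # Hypothetical helper: the theory covmat is simply added to the experimental one.
    total = exp_covmat if th_covmat is None else exp_covmat + th_covmat
    # Cholesky factor L with L @ L.T == total turns unit normals into correlated shifts.
    chol = np.linalg.cholesky(total)
    rng = np.random.default_rng(seed)
    shift = chol @ rng.standard_normal(len(central))
    return central + shift


central = np.array([1.0, 2.0, 3.0])
exp_covmat = np.diag([0.01, 0.04, 0.09])
th_covmat = 0.02 * np.ones((3, 3))  # toy, fully correlated theory error
replica = make_replica_sketch(central, exp_covmat, th_covmat, seed=1)
print(replica)
```

With the same seed, switching the theory covmat on or off changes the shift applied to every point, which is the effect the flags mentioned above would control.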

@andreab1997 andreab1997 marked this pull request as draft February 22, 2022 16:53
@andreab1997 andreab1997 self-assigned this Mar 9, 2022
@andreab1997 andreab1997 marked this pull request as ready for review March 9, 2022 14:22
Member

@scarlehoff scarlehoff left a comment

Could you also add one example runcard (and maybe a regression test for the invcovmat?)

Also, for @Zaharid: would you be against using .npy instead of .csv? I don't have a strong opinion on this, but it would make things a bit faster and leaner (for a 4000x4000 matrix this is 382M vs 123M).

There is some conflict in the docs (I guess you modified the same document in this PR and also in the other one). Please fix that so that the merge is not blocked.

n3fit/src/n3fit/scripts/n3fit_exec.py (outdated, resolved)
validphys2/src/validphys/config.py (outdated, resolved)
validphys2/src/validphys/covmats.py (outdated, resolved)
validphys2/src/validphys/covmats.py (outdated, resolved)
@Zaharid
Contributor

Zaharid commented Mar 10, 2022

Also, for @Zaharid: would you be against using .npy instead of .csv? I don't have a strong opinion on this, but it would make things a bit faster and leaner (for a 4000x4000 matrix this is 382M vs 123M).

Yes please, do that.

@Zaharid
Contributor

Zaharid commented Mar 10, 2022

Or, compute the covmat dynamically, not sure if that is better.

@Zaharid
Contributor

Zaharid commented Mar 10, 2022

We also had #1091, which turned out to be a bit more involved and was forgotten about. Also, @alecandido has probably convinced me that npy is reasonable enough for this kind of purpose. I guess parquet still has the advantage of giving you a richer index.

@scarlehoff
Member

As a note, I think having a csv in the regression tests is good. Having the changes in human readable form is nice (you can always parse them of course). But anyway, this would only be for the storing of the thcovmat.

Although maybe it makes sense to compute it on the fly indeed. @andreab1997 how long does setupfit take?

@andreab1997
Contributor Author

As a note, I think having a csv in the regression tests is good. Having the changes in human readable form is nice (you can always parse them of course). But anyway, this would only be for the storing of the thcovmat.

Although maybe it makes sense to compute it on the fly indeed. @andreab1997 how long does setupfit take?

For the runcards I tried, it takes about 40 minutes, compared with about 20 for a fit without the theory covmat.

@scarlehoff
Member

scarlehoff commented Mar 10, 2022

For the runcards I tried, it takes about 40 minutes, compared with about 20 for a fit without the theory covmat.

Then I guess it's better to store it and load it.

@andreab1997
Contributor Author

I believe the only remaining change is the one related to .csv vs .npy, right? Anyway, I have not understood the final decision: should we dump the thcovmat in csv or npy? (Taking it as given that we do not want to compute it on the fly.)

PS: @scarlehoff your requested changes are still blocking the merge, but I have already taken care of them. How can I solve this problem?

@scarlehoff
Member

Anyway, I have not understood the final decision, should we dump the thcovmat in csv or npy?

Yes, please, using .npy will be faster and will use less memory!

How can I solve this problem?

You can't! I have to look through the changes!
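
The .csv versus .npy trade-off discussed above can be checked on a small matrix. This is only a sketch: the 200x200 size, the file names and the use of `numpy.savetxt` as the text writer are assumptions for the example (the PR itself wrote the CSV through a different route), so the exact size ratio will differ from the 382M vs 123M quoted earlier.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
n = 200  # toy size; the matrix discussed in the thread is about 4000x4000
a = rng.standard_normal((n, n))
covmat = a @ a.T  # symmetric toy "covariance" matrix

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "covmat.csv")
    npy_path = os.path.join(tmp, "covmat.npy")
    np.savetxt(csv_path, covmat, delimiter=",")  # text: one formatted number per entry
    np.save(npy_path, covmat)  # binary: 8 bytes per float64 entry plus a small header
    csv_size = os.path.getsize(csv_path)
    npy_size = os.path.getsize(npy_path)
    reloaded = np.load(npy_path)

print(f"csv: {csv_size} bytes, npy: {npy_size} bytes")
```

The binary file also round-trips the values exactly, whereas a text dump depends on the chosen number format.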

@alecandido
Member

alecandido commented Mar 11, 2022

PS: @scarlehoff your requested changes are still blocking the merge, but I have already taken care of them. How can I solve this problem?

I guess the proper solution would be to re-request a review: if you solved them, you should allow your reviewer to check that the solution is satisfactory (and not breaking anything else...).

@andreab1997
Contributor Author

PS: @scarlehoff your requested changes are still blocking the merge, but I have already taken care of them. How can I solve this problem?

I guess the proper solution would be to re-request a review: if you solved them, you should allow your reviewer to check that the solution is satisfactory (and not breaking anything else...).

ok done, thank you :)

@andreab1997
Contributor Author

Anyway, I have not understood the final decision, should we dump the thcovmat in csv or npy?

Yes, please, using .npy will be faster and will use less memory!

How can I solve this problem?

You can't! I have to look through the changes!

I am sorry, I am having trouble finding the place where the writing happens. Can you please point me to the right place to look? (I thought it would be in produce_nnfit_theory_covmat, but I cannot find the writing there.) @scarlehoff @Zaharid

@scarlehoff
Member

I'm going to guess the action that vp-setupfit loads has a @table decorator on top? Creating a new action that does the same but without the decorator (and with a numpy.save at the end) should do the trick.
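
The suggestion above could look roughly like the following sketch. Everything in it is hypothetical: the `table` decorator here is a toy stand-in that only tags the function (it is not the real decorator used by validphys), and the function and file names are invented for illustration.

```python
import os
import tempfile

import numpy as np


def table(func):
    """Hypothetical stand-in: only tags the function for CSV export."""
    func.save_as_csv = True
    return func


def compute_theory_covmat():
    """Toy 2x2 matrix standing in for the real theory covmat."""
    return np.array([[0.04, 0.01], [0.01, 0.09]])


@table
def theory_covmat_table():
    # Decorated action: a framework would write the returned table as CSV.
    return compute_theory_covmat()


def theory_covmat_npy(output_path):
    # Undecorated action: same matrix, written explicitly in binary form.
    covmat = compute_theory_covmat()
    np.save(output_path, covmat)
    return output_path


with tempfile.TemporaryDirectory() as tmp:
    saved = theory_covmat_npy(os.path.join(tmp, "theory_covmat.npy"))
    loaded = np.load(saved)
```

The point of the design is that the computation is shared, and only the output step differs between the two actions.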

@Zaharid
Contributor

Zaharid commented Apr 7, 2022

@scarlehoff yes please.

@andreab1997
Contributor Author

@scarlehoff yes please.

Can this be merged? @Zaharid

@andreab1997
Contributor Author

@Zaharid I have solved the conflict. Can this be merged now?

@scarrazza
Member

@RoyStegeman could you please have a quick review of this PR so we can merge?

@RoyStegeman
Member

Sure

n3fit/src/n3fit/scripts/vp_setupfit.py (outdated, resolved)
n3fit/src/n3fit/scripts/n3fit_exec.py (resolved)
norm_threshold=None,
dataset_inputs_t0_predictions,
loaded_theory_covmat,
):
Member

What Juan says

use_weights_in_covmat=True,
norm_threshold=None,
dataset_inputs_t0_predictions,
):
Member

what Juan says (here and functions below)

validphys2/src/validphys/covmats.py (outdated, resolved)
validphys2/src/validphys/pseudodata.py (outdated, resolved)
validphys2/src/validphys/pseudodata.py (outdated, resolved)
@NNPDF NNPDF deleted a comment from andreab1997 Jul 1, 2022
@scarrazza
Member

@andreab1997 could we merge this?

@andreab1997
Contributor Author

@andreab1997 could we merge this?

For me yes. I don't know if @RoyStegeman agrees.

@RoyStegeman
Member

Sure please go ahead

@andreab1997
Contributor Author

@andreab1997 could we merge this?

Actually give me till this evening, I have to fix a minor thing and then I will merge.

Comment on lines +812 to +815
if f == path:
    raise ValueError(
        "More than one theory_covmat file in folder tables"
    )
Member

@andreab1997 this doesn't seem right?

Contributor Author

Why not?

Member

@RoyStegeman RoyStegeman Apr 2, 2023

The error is not as general as I originally thought. It probably does do what you intended, but I tried it with a custom covmat and got this error because I did not set use_scalevar_uncertainties to false, in which case this message does not make a lot of sense since there was only one covmat.

So I guess we just need to update the example here: https://github.com/NNPDF/nnpdf/blob/278310410398b36b95e45246992c8ed07db6d6a6/doc/sphinx/source/tutorials/general_th_covmat.rst? If that's correct, I can update the docs.

P.S. the check can be written in one line for example as

if set(paths)&set(files):
    raise ValueError
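
Expanding on the one-line check above, a sketch with a more descriptive error might look like this. The function name `check_no_covmat_clash` and the wording of the message are hypothetical; only the set intersection and the flag name use_scalevar_uncertainties come from the discussion.

```python
def check_no_covmat_clash(paths, files):
    """Raise if any expected covmat path is also among the files already present."""
    clash = set(paths) & set(files)
    if clash:
        # Hypothetical message: names the offending files and hints at the flag.
        raise ValueError(
            "Conflicting theory covmat files: "
            f"{sorted(clash)}. If a custom covmat is supplied, check whether "
            "use_scalevar_uncertainties should be set to false."
        )


# No overlap between the two lists: the check passes silently.
check_no_covmat_clash(["user_covmat.csv"], ["generated_covmat.csv"])
```

Naming the clashing files in the message avoids the confusion described above, where the error mentioned several covmats although only one had been provided.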

Contributor Author

Ok yes I agree. I would say the check is checking what I wanted (even if it can be written better) but the error message is misleading.

Member

@RoyStegeman RoyStegeman Apr 2, 2023

I guess there is no situation in which both use_scalevar_uncertainties and use_user_uncertainties are true? If so, that should probably be checked separately by vp. Are there valid situations in which both are false?

Contributor Author

For the moment I would say no but in principle yes (for example one can have a different source of theory uncertainties which are not user_uncertainties)

Member

@RoyStegeman RoyStegeman Apr 3, 2023

Sure, but here use_scalevar_uncertainties is just a synonym for using a covmat that has been constructed from different theory IDs (regardless of the source of the theory uncertainty) with the instructions in the runcard, while user_uncertainties is an external covmat (not necessarily non-scalevar).

So for now I think use_scalevar_uncertainties can probably be deprecated as it is just the inverse of user_uncertainties?

Member

If you agree, I can take care of the changes. It's just that you know this better than I do, so it would be good if you could confirm :)

Contributor Author

@andreab1997 andreab1997 Apr 3, 2023

Yes, I agree. In my mind we could have a third option, like use_whatever_uncertainties, that uses a different prescription from the scalevar one. However, it is true that we do not have it now so, for the time being, use_scalevar_uncertainties is just not user_uncertainties. So yes, I agree :)

Member

Let's leave hypothetical future features for hypothetical future people to work on ;)

Ok, I'll open a PR at some point this week and ask for your review.
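
The flag logic discussed in this thread could be validated along the following lines. This is a sketch under the thread's tentative conclusion that, for now, exactly one of the two flags should be set; the function name `theory_covmat_source` and the returned labels are hypothetical, and the real check may well be written differently.

```python
def theory_covmat_source(use_scalevar_uncertainties, use_user_uncertainties):
    """Return which kind of theory covmat the two flags select (assumed exclusive)."""
    if use_scalevar_uncertainties and use_user_uncertainties:
        raise ValueError(
            "use_scalevar_uncertainties and use_user_uncertainties are both true"
        )
    if use_user_uncertainties:
        return "user"  # external, user-provided covmat
    if use_scalevar_uncertainties:
        return "scalevar"  # covmat built from the theory IDs in the runcard
    raise ValueError("no theory covmat source selected")


print(theory_covmat_source(True, False))
print(theory_covmat_source(False, True))
```

If use_scalevar_uncertainties were deprecated as proposed, the function would reduce to a single boolean on the user flag.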

7 participants