Tevatron inclusive DY #2185

jacoterh · 2024-10-21T09:50:52Z

This PR implements filter functions for the Tevatron inclusive DY datasets:

CDF_Z0_1P96TEV_ZRAP: agreement with legacy implementation
D0_WPWM_1P96TEV: agreement with legacy, except for possible rounding errors in the old implementation of the kinematics, see specifically the last two bins here
D0_Z0_1P96TEV: agreement with legacy. One difference: the systematic uncertainty in the 3rd bin is already symmetric on HEPData, while the legacy implementation seems to have symmetrised the plus and minus uncertainties in the past here. This also affects the central value.

General comment: small overall differences are due to formatting, for example 1.2999999999999998 instead of 1.3. Is there a function that was used in the past to make the float formatting uniform?

Radonirinaunimi · 2024-10-21T11:03:34Z

Thanks for this @jacoterh!

Small overall differences are due to formatting, for example 1.2999999999999998 instead of 1.3. Is there a function that was used in the past to make the float formatting uniform?

Now, you can use this function:

nnpdf/nnpdf_data/nnpdf_data/filter_utils/utils.py

Lines 402 to 423 in bf8c890

    
    def prettify_float(dumper, value):
        """
        Override the default yaml representer:
        https://github.com/yaml/pyyaml/blob/48838a3c768e3d1bcab44197d800145cfd0719d6/lib/yaml/representer.py#L189

        This function is used to prettify the float representation in the yaml file.
        If the float has more than 8 digits, it will be represented in scientific notation with 8 digits.

        Note:
        -----
        When importing yaml in a module,

        yaml.add_representer(float, prettify_float)

        must be called to use this function.
        """
        ret = dumper.represent_float(value)
        if len(ret.value) > 8:
            ret_str = f"{value:.8e}"
            ret = dumper.represent_scalar('tag:yaml.org,2002:float', ret_str)
        return ret

and call it as follows in the filter:

nnpdf/nnpdf_data/nnpdf_data/commondata/ATLAS_1JET_8TEV_R06/filter.py

Line 7 in bf8c890

yaml.add_representer(float, prettify_float)
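
To see what the representer does to a value such as 1.2999999999999998, here is a minimal standalone sketch of the same length rule. It does not import yaml: the helper name format_float is hypothetical, and it assumes that the string PyYAML produces for an ordinary finite float matches Python's repr (special cases such as inf and nan are ignored). Note that the rule tests the length of the string, not the number of significant digits.

```python
def format_float(value: float, max_len: int = 8) -> str:
    """Mimic the length rule of prettify_float without touching yaml.

    Hypothetical helper for illustration only: keep the short repr when it
    fits in max_len characters, otherwise use scientific notation with
    8 decimals.
    """
    text = repr(value)
    if len(text) > max_len:
        return f"{value:.8e}"
    return text


# A value with floating-point noise, a clean value, and another noisy one.
for number in (1.2999999999999998, 1.3, 0.1 + 0.2):
    print(number, "->", format_float(number))
```

The noisy value is written as 1.30000000e+00, which reads back as exactly 1.3, while a float that is already short, such as 1.3, is left untouched.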

Also, same as in the other PRs, could you check that the datasets are loaded properly and that the (t0) covmats are the same?

from validphys.api import API
import numpy as np

inp1 = {"dataset_input": {"dataset": f"{new_implementation}"}, "theoryid": 40_000_000, "use_cuts": "internal", "t0pdfset": "NNPDF40_nnlo_as_01180", "use_t0": True}
inp2 = {"dataset_input": {"dataset": f"{old_implementation}", "variant": "legacy"}, "theoryid": 40_000_000, "use_cuts": "internal", "t0pdfset": "NNPDF40_nnlo_as_01180", "use_t0": True}

covmat1 = API.covmat_from_systematics(**inp1)
covmat2 = API.covmat_from_systematics(**inp2)

t0_covmat1 = API.t0_covmat_from_systematics(**inp1)
t0_covmat2 = API.t0_covmat_from_systematics(**inp2)

np.all(np.isclose(covmat1, covmat2))
np.all(np.isclose(t0_covmat1, t0_covmat2))
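
When the check above returns False, it helps to know which entries differ. The following is a rough sketch in plain Python, not part of validphys: the helper differing_entries and the 3x3 matrices are made up, and the tolerance test copies the criterion used by np.isclose with its default rtol and atol.

```python
def differing_entries(new, old, rtol=1e-5, atol=1e-8):
    """List (row, column, new, old) for entries that are not close.

    Uses the same criterion as np.isclose: |new - old| <= atol + rtol * |old|.
    """
    mismatches = []
    for i, (row_new, row_old) in enumerate(zip(new, old)):
        for j, (a, b) in enumerate(zip(row_new, row_old)):
            if abs(a - b) > atol + rtol * abs(b):
                mismatches.append((i, j, a, b))
    return mismatches


# Made-up covariance matrices: only the diagonal entry of the third bin differs.
cov_new = [[4.0, 0.5, 0.1], [0.5, 9.0, 0.2], [0.1, 0.2, 1.00]]
cov_old = [[4.0, 0.5, 0.1], [0.5, 9.0, 0.2], [0.1, 0.2, 1.21]]

print(differing_entries(cov_new, cov_old))
```

With the arrays returned by the API calls above, passing covmat1.tolist() and covmat2.tolist() should work the same way, since the helper only needs nested sequences.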

scarlehoff · 2024-10-22T12:47:36Z

nnpdf_data/nnpdf_data/commondata/CDF_Z0_1P96TEV/metadata.yaml

-  url: ''
-  version: -1
+  url: https://www.hepdata.net/record/ins856131?version=1&table=Table%202
+  version: 1
 implemented_observables:
 - observable_name: ZRAP
  observable:
    description: Drell-Yan Rapidity Distribution
    label: CDF $Z$ rapidity (new)
    units: ''
  process_type: EWK_RAP


Hi @jacoterh

Remember to set process_type to one of the processes that you will find in this file: https://github.com/NNPDF/nnpdf/blob/master/validphys2/src/validphys/process_options.py (at the bottom, either DY_Z_Y or DY_W_ETA)

and also put kinematic_override: identity.

This is to avoid going through some old code that we are trying to remove.

When you do that, you will see that the file defines which kinematic variables are allowed. Sometimes it is necessary to change from Q -> Q^2 or pT -> pT^2 (the kinematic information is the same in the end, but we are trying to standardize it a bit more)
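
As an illustration of the pT -> pT^2 change, here is a small sketch that squares one variable in a list of kinematic bins. Everything in it is an assumption made for the example: the bin layout (min/mid/max per variable), the helper square_variable, and the names pT and pT2. The variable names that are actually allowed are the ones defined in process_options.py.

```python
def square_variable(bins, old_name, new_name):
    """Return new bins where old_name is replaced by its square under new_name.

    Hypothetical helper: each bin maps a variable name to a dict with
    min/mid/max entries, and None entries are passed through unchanged.
    """
    converted = []
    for kin_bin in bins:
        new_bin = {key: val for key, val in kin_bin.items() if key != old_name}
        new_bin[new_name] = {
            edge: None if value is None else value**2
            for edge, value in kin_bin[old_name].items()
        }
        converted.append(new_bin)
    return converted


# One made-up bin with a rapidity and a transverse-momentum range.
bins = [
    {
        "y": {"min": 0.0, "mid": 0.05, "max": 0.1},
        "pT": {"min": 10.0, "mid": 15.0, "max": 20.0},
    }
]
print(square_variable(bins, "pT", "pT2"))
```

Squaring the mid value is a choice made here for simplicity. It is not the midpoint of the squared range, so which convention to use is something to decide per dataset.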

jacoterh · 2024-10-22T12:52:01Z

Thanks @Radonirinaunimi! The numbers have been formatted correctly now. I have also performed the check you proposed using the snippet you provided, see for instance here. If you run this script, you'll find that the 3rd bin in D0_Z0_1P96TEV differs from the legacy implementation for the reason I mentioned at the beginning of this PR.

Some questions:

Why do the tests suddenly fail? Running pytest tests/test_datafiles.py locally works fine.
I explicitly checked whether the comparison against the legacy implementation failed when I randomly changed some numbers. It did for all datasets, except for the 1st bin in D0_WPWM_1P96TEV. The test does not seem to care about whatever I write here. This doesn't happen for the D0_Z0_1P96TEV and CDF_Z0_1P96TEV datasets.
In the metadata, which hepdata table do I link for the url in case multiple entered the implementation?

scarlehoff · 2024-10-22T13:05:36Z

The current failure is because it is plotting::kinematics_override (the probability that I get a variable name or word in general correct without looking in the first try is close to 0)

The previous one was probably related to some change in digits, judging from the logs.

jacoterh · 2024-11-08T15:01:46Z

validphys reports:

from validphys.api import API
import numpy as np

observables = ["CDF_Z0_1P96TEV_ZRAP", "D0_WPWM_1P96TEV_ASY", "D0_Z0_1P96TEV_ZRAP"]

for obs in observables:

    new_implementation = obs
    old_implementation = obs

    inp1 = {
        "dataset_input": {"dataset": f"{new_implementation}"},
        "theoryid": 200,
        "use_cuts": "internal",
        "t0pdfset": "NNPDF40_nnlo_as_01180",
        "use_t0": True,
    }
    inp2 = {
        "dataset_input": {"dataset": f"{old_implementation}", "variant": "legacy"},
        "theoryid": 200,
        "use_cuts": "internal",
        "t0pdfset": "NNPDF40_nnlo_as_01180",
        "use_t0": True,
    }

    covmat1 = API.covmat_from_systematics(**inp1)
    covmat2 = API.covmat_from_systematics(**inp2)

    t0_covmat1 = API.t0_covmat_from_systematics(**inp1)
    t0_covmat2 = API.t0_covmat_from_systematics(**inp2)

    print(f"Comparison for {new_implementation}")
    print(np.all(np.isclose(covmat1, covmat2)))
    print(np.all(np.isclose(t0_covmat1, t0_covmat2)))

>>> Comparison for CDF_Z0_1P96TEV_ZRAP
True
True
Comparison for D0_WPWM_1P96TEV_ASY
True
True
Comparison for D0_Z0_1P96TEV_ZRAP
False
False

The last disagreement in D0_Z0_1P96TEV_ZRAP can be explained as follows. The systematic uncertainty in the 3rd bin appears symmetric already on hepdata, while the legacy implementation seems to have symmetrised plus and minus uncertainties in the past here. This also affects the central value.
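
To make the last point concrete, here is a sketch of one common prescription for symmetrising an asymmetric uncertainty, in which half of the asymmetry becomes a shift of the central value. Whether the legacy implementation used exactly this prescription is an assumption, and the function name and numbers are made up for the example.

```python
import math


def symmetrise(delta_plus, delta_minus):
    """Return (shift, sigma) for an uncertainty +delta_plus / delta_minus.

    delta_minus is expected to be negative or zero. The shift is added to
    the central value and sigma is the resulting symmetric uncertainty.
    """
    shift = (delta_plus + delta_minus) / 2.0
    half_width = (delta_plus - delta_minus) / 2.0
    sigma = math.sqrt(half_width**2 + 2.0 * shift**2)
    return shift, sigma


# Already symmetric input: the central value is not shifted.
print(symmetrise(0.2, -0.2))
# Asymmetric input: the central value moves and the uncertainty changes.
print(symmetrise(0.3, -0.1))
```

A symmetric input gives a zero shift, so treating the same bin as asymmetric in one implementation and as symmetric in the other changes both the central value and the covariance matrix entry.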

scarlehoff · 2024-11-09T18:59:55Z

Thanks Jaco. One last thing: could you also check that the experimental covmat for the 3 datasets together doesn't change in the off-diagonal entries?

(you can also e.g. just check that the experimental chi2 for all 3 considered together is unchanged)

(this will be important for the singletop data as well, to check that possible experimental correlations across datasets are also captured)

jacoterh · 2024-11-11T09:48:32Z

Sure, I made a plot of the ratio against the legacy implementation. There are no cross correlations between datasets here though (which I find odd as we have two datasets from D0), so the covmat is block diagonal, meaning that the datasets can be cross-checked separately. But I will also do this check for single top, where we do have cross correlations, thanks!

For reference, the ratio was obtained with

dsinps = [
        {'dataset': 'CDF_Z0_1P96TEV_ZRAP'},
        {'dataset': 'D0_WPWM_1P96TEV_ASY'},
        {'dataset': 'D0_Z0_1P96TEV_ZRAP'}
]

dsinps_legacy = [
        {'dataset': 'CDF_Z0_1P96TEV_ZRAP', 'variant': 'legacy'},
        {'dataset': 'D0_WPWM_1P96TEV_ASY', 'variant': 'legacy'},
        {'dataset': 'D0_Z0_1P96TEV_ZRAP', 'variant': 'legacy'}
]

inp = dict(dataset_inputs=dsinps, theoryid=200, use_cuts="internal")
inp_legacy = dict(dataset_inputs=dsinps_legacy, theoryid=200, use_cuts="internal")
cov = API.dataset_inputs_covmat_from_systematics(**inp)
cov_legacy = API.dataset_inputs_covmat_from_systematics(**inp_legacy)
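
The snippet above stops at the two matrices. A rough sketch of the remaining steps (the element-wise ratio and a block-diagonality check) could look as follows. It works on plain nested lists, the helper names and the 3x3 matrix are made up, and the block sizes would have to be the number of points per dataset after cuts.

```python
def max_offblock(cov, block_sizes):
    """Largest |entry| outside the diagonal blocks defined by block_sizes."""
    # owner[i] is the index of the dataset that row/column i belongs to.
    owner = [k for k, size in enumerate(block_sizes) for _ in range(size)]
    n = len(cov)
    return max(
        (abs(cov[i][j]) for i in range(n) for j in range(n) if owner[i] != owner[j]),
        default=0.0,
    )


def elementwise_ratio(new, old):
    """new/old entry by entry, with None where the legacy entry is zero."""
    return [
        [a / b if b != 0 else None for a, b in zip(row_new, row_old)]
        for row_new, row_old in zip(new, old)
    ]


# Made-up example: two datasets with 2 and 1 points and no cross correlations.
cov_example = [[4.0, 1.0, 0.0], [1.0, 9.0, 0.0], [0.0, 0.0, 1.0]]

print(max_offblock(cov_example, [2, 1]))
print(elementwise_ratio(cov_example, cov_example))
```

A value of zero from max_offblock confirms that the matrix is block diagonal for the given sizes, and a ratio of 1.0 in every non-zero entry means that the two implementations agree.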

scarlehoff

Hi @jacoterh

Thanks for this. All seems fine. The only thing left: could you bump the version and change the version_comment in the metadata.yaml files?

RE the failing test, you might need to regenerate pseudodata_test_fit_n3fit_240916. Since it is only the pseudodata that is relevant, you can just do:

vp-fitrename pseudodata_test_fit_n3fit_240916 pseudodata_test_fit_n3fit_241112
cp pseudodata_test_fit_n3fit_241112/filter.yaml  pseudodata_test_fit_n3fit_241112.yaml

And then just regenerate the data for the 10 replicas, no need to wait until the fit is done.

jacoterh · 2024-11-15T14:50:04Z

@scarlehoff I have tried to follow your steps but I am a bit lost. This is what I did.

vp-get fit pseudodata_test_fit_n3fit_240916 
vp-fitrename pseudodata_test_fit_n3fit_240916 pseudodata_test_fit_n3fit_241115
cp pseudodata_test_fit_n3fit_241115/filter.yml pseudodata_test_fit_n3fit_241115.yml

and launched the fit with

n3fit pseudodata_test_fit_n3fit_241115.yml 10

This gives me a fitted replica, but then what do I commit to make sure the tests don't fail anymore?

scarlehoff · 2024-11-15T15:29:30Z

Upload the new fit to the server (so that it is available) and change the name of the fit in conftest. If you then run pytest for that test it should work.

jacoterh · 2024-11-15T17:03:37Z

Thanks, I ran the fit on 20 replicas, evolved them, but now postfit filters out 90% of them:
[INFO]: 2 replicas pass IntegNumber_3.
Should I just fit more replicas, i.e. is this expected? 90% seems a bit harsh.

And if it's just the pseudo data that we need, we don't even care right?

scarlehoff · 2024-11-15T22:18:30Z

Yes, you don't even need to finish the fit. That's why, if you use vp-fitrename, you get the previous fit and only need the new data csvs.

scarlehoff · 2024-11-22T20:41:37Z

Let me know whether I can merge this.

jacoterh · 2024-11-25T12:14:50Z

For future reference, this is the (hopefully) final validphys comparison: https://vp.nnpdf.science/PceSQWJaSLCrCp7wHzrsWg==/

If the tests pass, this branch should be ready for merging

scarlehoff reviewed Oct 22, 2024

scarlehoff added the data toolchain label Nov 1, 2024

scarlehoff mentioned this pull request Nov 1, 2024

Reimplement ATLAS_Z0_7TEV_49FB_HIMASS #2178

Merged

This was referenced Nov 11, 2024

Re-implementation of ATLAS single top #2189

Merged

Re-implementation of CMS single top #2200

Merged

scarlehoff reviewed Nov 12, 2024

scarlehoff added the "Done" label (PRs that are done but waiting on something else to merge/approve) Nov 12, 2024

scarlehoff marked this pull request as ready for review November 13, 2024 15:40

Re-implementing Tevatron inc DY in new commondata format

e0edfd8

jacoterh force-pushed the tevatron_inc_DY branch from 6fd5029 to e0edfd8 November 25, 2024 11:15

scarlehoff reviewed Nov 25, 2024

nnpdf_data/nnpdf_data/commondata/D0_WPWM_1P96TEV/metadata.yaml (resolved)

nnpdf_data/nnpdf_data/commondata/D0_Z0_1P96TEV/metadata.yaml (resolved)

nnpdf_data/nnpdf_data/commondata/CDF_Z0_1P96TEV/metadata.yaml (outdated, resolved)

implementing last comments

bd140a9

scarlehoff reviewed Nov 25, 2024

nnpdf_data/nnpdf_data/commondata/CDF_Z0_1P96TEV/metadata.yaml (outdated, resolved)

scarlehoff reviewed Nov 25, 2024

nnpdf_data/nnpdf_data/commondata/D0_WPWM_1P96TEV/metadata.yaml (outdated, resolved)

scarlehoff reviewed Nov 25, 2024

nnpdf_data/nnpdf_data/commondata/D0_Z0_1P96TEV/metadata.yaml (outdated, resolved)

updating version

c9808b9

Merge branch 'master' into tevatron_inc_DY

05b77d3

scarlehoff approved these changes Nov 25, 2024

scarlehoff merged commit 676e70f into master Nov 25, 2024
6 checks passed

scarlehoff deleted the tevatron_inc_DY branch November 25, 2024 14:17
