Dump FK Table comparison #8

felixhekhorn · 2021-10-20T12:41:36Z

rename metadata results to results_grid
add metadata results_fk with comparison w/ against w/o evolution

will need NNPDF/pineappl#74 to read metadata

The text was updated successfully, but these errors were encountered:

cschwan · 2021-10-20T12:44:07Z

* rename metadata `results` to  `results_grid`

What's the idea here?

felixhekhorn · 2021-10-20T12:47:06Z

we can also just leave it - but then there will be results and results_fk so it's a bit inconsistent (since now there are two results ...)

alecandido · 2021-10-20T13:00:41Z

What's the idea here?

The point is that in rr we store the comparison at the grid level, that is the main and only comparison there, but when you come to pineko the new one is the FkTable.

For sure we don't want to change rr because there will pineappl at some point, since rr is upstream, and independent of what happens afterwards. Then it makes sense to change it afterwards.

cschwan · 2021-10-20T14:09:55Z

we can also just leave it - but then there will be results and results_fk so it's a bit inconsistent (since now there are two results ...)

I'd argue for that, because results are the true results (remember there are two entries MC and PineAPPL), and results_fk are the results with errors from the evolution. It also makes sense when you compare a grid at the process scale and an FK table; both results should be the same. And in the FK table there should be both, of course.

Moreover, results_fk should show a different table:

scale variations, which should be shown in results don't make any sense in results_fk
instead you should compare pre- and post-evolution results and just show the difference in per mille.

So the table should look as follows:

------------------------------------
    Pineko       PineAPPL   central
                             1/1000
------------------------------------
 7.218832e-01  7.218831e-01   0.0000
 7.142003e-01  7.142003e-01   0.0000
 7.928799e-01  7.928799e-01   0.0000
 7.850020e-01  7.850020e-01   0.0000
 7.728221e-01  7.728221e-01   0.0000
 8.654099e-01  8.654099e-01   0.0000

alecandido · 2021-10-20T14:55:12Z

Fine for me, I was thinking that in some sense results is the main focus, and it is a bit weird for the FkTable. Nevertheless, we can keep also as it is, it's not going to hurt anything.

felixhekhorn · 2021-10-20T15:04:22Z

So the table should look as follows:

and in fact, it does (quasi):

$ python -mpineko compare data/mydis.pineappl data/myfktable.mydis.pineappl.lz4  CT14llo_NF4
LHAPDF 6.3.0 loading /home/felix/local/share/LHAPDF/CT14llo_NF4/CT14llo_NF4_0000.dat
CT14llo_NF4 PDF set, member #0, version 1; LHAPDF ID = 13207
       O0 left    O0 right       O1 left      O1 right   PineAPPL    FkTable  percent_error
0    50.000000   50.000000  2.000000e-07  2.000000e-07  12.826575  12.826106      -0.003654
1    50.000000   50.000000  6.984209e-07  6.984209e-07   9.550602   9.549435      -0.012216
2    50.000000   50.000000  2.438943e-06  2.438943e-06   7.047279   7.046536      -0.010548
3    50.000000   50.000000  8.516807e-06  8.516807e-06   5.152585   5.152081      -0.009781
4    50.000000   50.000000  2.973850e-05  2.973850e-05   3.728779   3.728443      -0.008991
5    50.000000   50.000000  1.038117e-04  1.038117e-04   2.665475   2.665247      -0.008558
6    50.000000   50.000000  3.620545e-04  3.620545e-04   1.874623   1.874467      -0.008317
7    50.000000   50.000000  1.258680e-03  1.258680e-03   1.290291   1.290190      -0.007889
8    50.000000   50.000000  4.328501e-03  4.328501e-03   0.875369   0.875307      -0.007070
9    50.000000   50.000000  1.437507e-02  1.437507e-02   0.615639   0.615608      -0.005093
10   50.000000   50.000000  4.341492e-02  4.341492e-02   0.463775   0.463782       0.001509
11   50.000000   50.000000  1.091438e-01  1.091438e-01   0.361635   0.361633      -0.000531
12   50.000000   50.000000  2.195041e-01  2.195041e-01   0.288607   0.288606      -0.000214
13   50.000000   50.000000  3.668753e-01  3.668753e-01   0.195669   0.195669       0.000099
14   50.000000   50.000000  5.397572e-01  5.397572e-01   0.089400   0.089399      -0.001851
15   50.000000   50.000000  7.295868e-01  7.295868e-01   0.018904   0.018904       0.003388
16   50.000000   50.000000  9.309441e-01  9.309441e-01   0.000235   0.000236       0.515855
17   46.538363   46.538363  1.091438e-01  1.091438e-01   0.361349   0.361348      -0.000401
18   85.950595   85.950595  1.091438e-01  1.091438e-01   0.363406   0.363405      -0.000486
19  158.740105  158.740105  1.091438e-01  1.091438e-01   0.364716   0.364714      -0.000363

Note that the bin informations are also present here - something I also wish for results because otherwise I can't compare to the data from @scarlehoff who does apply cuts, and thus, e.g., presents only numbers for bin 6 onwards ...

cschwan · 2021-10-25T13:11:05Z

This looks good! However, I'd suggest to change a few details, namely

to print the differences in per mille instead of in per cent
instead of O0 left and O0 right you could print
- the actual observable names that are in the metadata,
- and print the observable names for both columns and leave out left and right

cschwan · 2021-10-25T13:12:52Z

@alecandido @felixhekhorn another completely unrelated question that also came up during the PC: in DIS the left and right bin boundaries are the same, but I wonder is this really the case also in the data? Don't we have bins in this case?

alecandido · 2021-10-25T13:39:49Z

the actual observable names that are in the metadata,

unfortunately to do this from python we still need to merge NNPDF/pineappl#74 (we do not have access in the python interface to metadata, but we are going to have)

@alecandido @felixhekhorn another completely unrelated question that also came up during the PC: in DIS the left and right bin boundaries are the same, but I wonder is this really the case also in the data? Don't we have bins in this case?

This I don't know, we should check commondata, with which I'm not very familiar... (yet)

cschwan · 2021-10-25T13:49:00Z

the actual observable names that are in the metadata,

unfortunately to do this from python we still need to merge N3PDF/pineappl#74 (we do not have access in the python interface to metadata, but we are going to have)

I see, I'll prioritize this Issue now.

felixhekhorn · 2021-10-25T15:26:40Z

in DIS the left and right bin boundaries are the same, but I wonder is this really the case also in the data? Don't we have bins in this case?

I don't think so - at least for the HERA fully inclusive stuff: https://www.hepdata.net/record/ins1377206?version=1&table=Table%202 (I didn't check the paper fully though ... and I know in practice this is rather required)

cschwan · 2021-12-28T20:07:00Z

Given what we've discussed last PC, I think we want the following information in the metadata that's not yet in there:

MC uncertainty
total error (difference between MC result and FK table result using the same PDF). This might require a standardized form for the results metadata and a new results_pdfset entry

alecandido · 2021-12-29T10:49:33Z

total error (difference between MC result and FK table result using the same PDF). This might require a standardized form for the results metadata and a new results_pdfset entry

If we're going to have a new standardized form results entry, I would make it in whatever standard format (even csv, with whatever sep) in order to make as easy as possible to machine parse it.
Now they are more optimized for human reading, but in case we need to read them frequently I would make a printer for it. In case it is more generally needed, I'd include the printer in pineappl_cli.

cschwan · 2021-12-29T11:11:33Z

@alecandido agreed, but I think we can have it both machine and human readable while only deviating a little from what we already do:

table headers get prefixed with "# " (comment character)
content left as is

The resulting table should be easily readable with numpy.loadtxt.

alecandido · 2022-03-01T10:31:16Z

@felixhekhorn if you can, try to address this as well during your pineko transition.

Since you're moving comparison from fkutils (if it were not already here), now the information should be available locally, so it's just a matter of dumping it in the metadata during convolution.

felixhekhorn · 2022-03-09T11:38:58Z

This looks good! However, I'd suggest to change a few details, namely

* to print the differences in per mille instead of in per cent

* instead of `O0 left` and `O0 right` you could print
  
  * the actual observable names that are in the metadata,
  * and print the observable names for both columns and leave out `left` and `right`

@cschwan here is what I currently have:

$ pineappl info --get results_fk data/fktables/213/LHCB_DY_13TEV_DIELECTRON.pineappl.lz4 
    yll left  yll right    PineAPPL     FkTable  permille_error
0      2.000      2.125   10.337989   10.330937       -0.682085
1      2.125      2.250   30.353310   30.334170       -0.630583
2      2.250      2.375   49.379146   49.355890       -0.470961
3      2.375      2.500   66.899600   66.879018       -0.307656
4      2.500      2.625   82.655479   82.635160       -0.245826
5      2.625      2.750   96.121822   96.097904       -0.248829
6      2.750      2.875  107.081167  107.053141       -0.261728
7      2.875      3.000  115.135111  115.104099       -0.269357
8      3.000      3.125  120.077337  120.044468       -0.273733
9      3.125      3.250  121.560013  121.526957       -0.271931
10     3.250      3.375  113.520790  113.489930       -0.271838
11     3.375      3.500   96.972783   96.946347       -0.272608
12     3.500      3.625   79.585250   79.563615       -0.271837
13     3.625      3.750   62.180007   62.162765       -0.277289
14     3.750      3.875   45.516016   45.502772       -0.290960
15     3.875      4.000   30.629132   30.619719       -0.307349
16     4.000      4.250   13.755959   13.751858       -0.298148

$ pineappl info --get results_fk_pdfset data/fktables/213/LHCB_DY_13TEV_DIELECTRON.pineappl.lz4 
NNPDF40_lo_as_01180

I can not drop "left/right" since the columns in data frames have to have unique names - alternatives are "min/max" (3 letters) or "l/r" (cut to 1 letter)

felixhekhorn · 2022-03-09T11:40:32Z

@alecandido agreed, but I think we can have it both machine and human readable while only deviating a little from what we already do:
* table headers get prefixed with "# " (comment character)

* content left as is
The resulting table should be easily readable with numpy.loadtxt.

adding "#" is not necessary, I think, as np.loadtxt explicitly exposes a skiprows argument

alecandido · 2022-03-09T11:44:17Z

Just to tell you: if your table is not so small, always consider using pd.read_csv, since it might be much faster than np.loadtxt (a couple of years ago it was, and I didn't expect anything to have changed in the meanwhile).

cschwan · 2022-03-09T12:19:23Z

adding "#" is not necessary, I think, as np.loadtxt explicitly exposes a skiprows argument

It's not necessary, but it would be nice to use the functions out-of-the-box; also note that we might consider changing the header at some point (the number of lines, add some other piece of information), or that we'd like to support other libraries. To have some flexibility I'd therefore strongly suggest to use a comment character. This should be mirrored by the results metadata field, of course.

To implement it you could set the header of the left-most column to # and left-align it.

felixhekhorn referenced this issue in NNPDF/pineappl Mar 9, 2022

Expose set_key_value to FK

10009e7

felixhekhorn linked a pull request Mar 24, 2022 that will close this issue

Move permanent part of fkutils here #12

Merged

1 task

scarlehoff closed this as completed in #12 Mar 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dump FK Table comparison #8

Dump FK Table comparison #8

felixhekhorn commented Oct 20, 2021

cschwan commented Oct 20, 2021

felixhekhorn commented Oct 20, 2021

alecandido commented Oct 20, 2021

cschwan commented Oct 20, 2021 •

edited

Loading

alecandido commented Oct 20, 2021

felixhekhorn commented Oct 20, 2021

cschwan commented Oct 25, 2021

cschwan commented Oct 25, 2021

alecandido commented Oct 25, 2021

cschwan commented Oct 25, 2021

felixhekhorn commented Oct 25, 2021

cschwan commented Dec 28, 2021

alecandido commented Dec 29, 2021

cschwan commented Dec 29, 2021

alecandido commented Mar 1, 2022

felixhekhorn commented Mar 9, 2022

felixhekhorn commented Mar 9, 2022

alecandido commented Mar 9, 2022

cschwan commented Mar 9, 2022 •

edited

Loading

Dump FK Table comparison #8

Dump FK Table comparison #8

Comments

felixhekhorn commented Oct 20, 2021

cschwan commented Oct 20, 2021

felixhekhorn commented Oct 20, 2021

alecandido commented Oct 20, 2021

cschwan commented Oct 20, 2021 • edited Loading

alecandido commented Oct 20, 2021

felixhekhorn commented Oct 20, 2021

cschwan commented Oct 25, 2021

cschwan commented Oct 25, 2021

alecandido commented Oct 25, 2021

cschwan commented Oct 25, 2021

felixhekhorn commented Oct 25, 2021

cschwan commented Dec 28, 2021

alecandido commented Dec 29, 2021

cschwan commented Dec 29, 2021

alecandido commented Mar 1, 2022

felixhekhorn commented Mar 9, 2022

felixhekhorn commented Mar 9, 2022

alecandido commented Mar 9, 2022

cschwan commented Mar 9, 2022 • edited Loading

cschwan commented Oct 20, 2021 •

edited

Loading

cschwan commented Mar 9, 2022 •

edited

Loading