Adding data keyword to validphys #651
Changes from 135 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
# Data Specification | ||
|
||
## Specifying a dataset | ||
|
||
In a validphys runcard a single dataset is specified using a `dataset_input`. This | ||
is a dictionary which minimally specifies the name of the dataset, but can also | ||
control behaviour such as contributions to the covariance matrix for the dataset | ||
and NNLO cfactors. | ||
|
||
Here is an example dataset input: | ||
|
||
```yaml | ||
dataset_input: | ||
    dataset: CMSZDIFF12 | ||
    cfac: [QCD,NRM] | ||
    sys: 10 | ||
``` | ||
|
||
This particular example is for the `CMSZDIFF12` dataset. The user has specified | ||
some cfactors with `cfac`, and `sys: 10`, which corresponds to an additional | ||
contribution to the covariance matrix accounting for statistical fluctuations in | ||
the cfactors. These settings correspond to NNLO predictions, and so presumably | ||
elsewhere in the runcard the user would have specified an NNLO theory, such as | ||
theory 53. | ||
|
||
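As a rough illustration of the structure described above, a `dataset_input` can be pictured as a plain mapping with one required key. This is a minimal sketch, not validphys code: `check_dataset_input` and `ALLOWED_KEYS` are hypothetical names, and the allowed keys are only those that appear in the example above.

```python
# Hypothetical sketch, not validphys code: the allowed keys are only those
# that appear in the example above.
ALLOWED_KEYS = {"dataset", "cfac", "sys"}


def check_dataset_input(dataset_input):
    """Return a copy of ``dataset_input`` after basic sanity checks."""
    # The dataset name is the only setting the text describes as mandatory.
    if "dataset" not in dataset_input:
        raise ValueError("a dataset_input must at least name the dataset")
    unknown = set(dataset_input) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unrecognised keys: {sorted(unknown)}")
    return dict(dataset_input)


example = check_dataset_input(
    {"dataset": "CMSZDIFF12", "cfac": ["QCD", "NRM"], "sys": 10}
)
```

A check of this kind only catches structural mistakes; it cannot tell whether the chosen cfactors are appropriate for the theory in use.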
There is considerable room for error when entering a `dataset_input` manually, | ||
so there is a [project](https://github.com/NNPDF/nnpdf/issues/226) which aims to provide a stable way of filling in many of | ||
these settings with correct default values. | ||
|
||
## Specifying multiple datasets | ||
|
||
Multiple datasets are specified using the `dataset_inputs` key: a list where | ||
each element of the list is a valid `dataset_input`. For example: | ||
|
||
```yaml | ||
dataset_inputs: | ||
- { dataset: NMC } | ||
- { dataset: ATLASTTBARTOT, cfac: [QCD] } | ||
- { dataset: CMSZDIFF12, cfac: [QCD,NRM], sys: 10 } | ||
``` | ||
|
||
Multiple datasets are entered as a flat list, with no hierarchy that splits | ||
them into experiments or process types. | ||
The grouping of datasets is done internally according to the metadata of the | ||
datasets and is controlled by the `metadata_group` key. This can be any key which | ||
is present in the `PLOTTING` file of each dataset, for example `experiment` or | ||
`nnpdf31_process`. | ||
|
||
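The grouping step described above can be sketched as follows. This is not the validphys implementation: `group_dataset_inputs` is a hypothetical helper, and the `METADATA` table is a placeholder standing in for the values that would be read from each dataset's `PLOTTING` file, so the group labels are illustrative assumptions.

```python
from collections import defaultdict

# Placeholder metadata standing in for what would be read from each
# dataset's PLOTTING file; the values are illustrative assumptions.
METADATA = {
    "NMC": {"experiment": "NMC", "nnpdf31_process": "DIS NC"},
    "ATLASTTBARTOT": {"experiment": "ATLAS", "nnpdf31_process": "TOP"},
    "CMSZDIFF12": {"experiment": "CMS", "nnpdf31_process": "DY"},
}


def group_dataset_inputs(dataset_inputs, metadata_group, metadata=METADATA):
    """Group a flat list of dataset inputs by one metadata key."""
    groups = defaultdict(list)
    for dataset_input in dataset_inputs:
        name = dataset_input["dataset"]
        # The value stored under ``metadata_group`` labels the group.
        groups[metadata[name][metadata_group]].append(name)
    return dict(groups)


dataset_inputs = [
    {"dataset": "NMC"},
    {"dataset": "ATLASTTBARTOT", "cfac": ["QCD"]},
    {"dataset": "CMSZDIFF12", "cfac": ["QCD", "NRM"], "sys": 10},
]
by_experiment = group_dataset_inputs(dataset_inputs, "experiment")
by_process = group_dataset_inputs(dataset_inputs, "nnpdf31_process")
```

Because the runcard list stays flat, changing the grouping only requires changing the key passed as `metadata_group`; the list of datasets itself is untouched.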
If `metadata_group` is not specified in the runcard then it takes its default | ||
value from `data_grouping`. By default `data_grouping` is set to | ||
`standard_report`, which corresponds to `metadata_group` defaulting to `experiment`. | ||
This paragraph is confusing: |
||
|
||
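The fallback from `metadata_group` to `data_grouping` can be pictured with a short sketch. It assumes the runcard is available as a plain dictionary. `resolve_metadata_group` and `DATA_GROUPING_DEFAULTS` are hypothetical names, and only the `standard_report` entry is taken from the text above.

```python
# Hypothetical lookup table: only the ``standard_report`` entry is described
# in the text above.
DATA_GROUPING_DEFAULTS = {"standard_report": "experiment"}


def resolve_metadata_group(runcard):
    """Return the metadata key used to group datasets for this runcard."""
    # An explicit ``metadata_group`` always takes precedence.
    if "metadata_group" in runcard:
        return runcard["metadata_group"]
    # Otherwise fall back to the default associated with ``data_grouping``.
    data_grouping = runcard.get("data_grouping", "standard_report")
    return DATA_GROUPING_DEFAULTS[data_grouping]


default_group = resolve_metadata_group({})
explicit_group = resolve_metadata_group({"metadata_group": "nnpdf31_process"})
```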
For example, the following runcard produces a single-column table whose rows contain | ||
the chi2 of the specified datasets, grouped by `experiment` | ||
(the default grouping when nothing is specified). | ||
|
||
```yaml | ||
dataset_inputs: | ||
- { dataset: NMC } | ||
- { dataset: ATLASTTBARTOT, cfac: [QCD] } | ||
- { dataset: CMSZDIFF12, cfac: [QCD,NRM], sys: 10 } | ||
|
||
theoryid: 53 | ||
|
||
dataspecs: | ||
- pdf: NNPDF31_nnlo_as_0118 | ||
|
||
use_cuts: internal | ||
|
||
actions_: | ||
- dataspecs_groups_chi2_table | ||
``` | ||
|
||
If we add `metadata_group` to the runcard to choose a different grouping: | ||
|
||
```yaml | ||
metadata_group: nnpdf31_process | ||
|
||
dataset_inputs: | ||
- { dataset: NMC } | ||
- { dataset: ATLASTTBARTOT, cfac: [QCD] } | ||
- { dataset: CMSZDIFF12, cfac: [QCD,NRM], sys: 10 } | ||
|
||
theoryid: 53 | ||
|
||
dataspecs: | ||
- pdf: NNPDF31_nnlo_as_0118 | ||
|
||
use_cuts: internal | ||
|
||
actions_: | ||
- dataspecs_groups_chi2_table | ||
``` | ||
|
||
then we still get a single-column table, but with the datasets grouped by | ||
process type, according to the [theory uncertainties paper](https://arxiv.org/abs/1906.10698). | ||
|
||
## Backwards compatibility | ||
|
||
Most old validphys runcards which used the `experiments` key to specify a | ||
multi-levelled list of datasets should still work within the new framework. This | ||
is because if `dataset_inputs` is not present in the runcard, `validphys` | ||
attempts to find an `experiments` key and infer `dataset_inputs` from it. |
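The inference described above can be sketched as a simple flattening step. This is only an illustration, not the validphys implementation: `infer_dataset_inputs` is a hypothetical name, and the sketch assumes the old layout is a list of mappings, each with an `experiment` name and a `datasets` list.

```python
def infer_dataset_inputs(runcard):
    """Return a flat list of dataset inputs from a new or old style runcard."""
    # New-style runcards already provide the flat list.
    if "dataset_inputs" in runcard:
        return list(runcard["dataset_inputs"])
    # Assumed old layout: each experiment carries its own ``datasets`` list.
    dataset_inputs = []
    for experiment in runcard.get("experiments", []):
        dataset_inputs.extend(experiment["datasets"])
    return dataset_inputs


old_runcard = {
    "experiments": [
        {"experiment": "NMC", "datasets": [{"dataset": "NMC"}]},
        {
            "experiment": "CMS",
            "datasets": [
                {"dataset": "CMSZDIFF12", "cfac": ["QCD", "NRM"], "sys": 10}
            ],
        },
    ]
}
flat = infer_dataset_inputs(old_runcard)
```

The experiment names in the old layout are discarded here, since in the new framework the grouping comes from the dataset metadata rather than from the runcard hierarchy.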
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,5 +13,6 @@ vp-guide | |
./upload.md | ||
./nnprofile.md | ||
./scripts.md | ||
./dataspecification.md | ||
./theorycov/index | ||
./pydataobjs.rst |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -42,6 +42,8 @@ | |
'validphys.theorycovariance.tests', | ||
'validphys.replica_selector', | ||
'validphys.closuretest', | ||
# currently broken - will fix in NNPDF/nnpdf#511 | ||
actually I can just remove this because the module doesn't even exist
Is this ready to go now? |
||
# 'validphys.closure', | ||
'validphys.mc_gen_checks', | ||
'validphys.theoryinfo', | ||
'validphys.pseudodata', | ||
|
This should describe how the covariance matrices are constructed, or link to a document that does.
It also could use a minimal introduction "what can you do with a dataset".