Adding data keyword to validphys #651

Merged: 167 commits merged on Jul 16, 2020
Commits
24c526f
add data keyword and backwards compatibility at the level of FitSpec
wilsonmr Feb 17, 2020
fd45465
Replacing instances of experiments in vp-setupfit
Feb 21, 2020
b539fbf
Creating ExperimentSpec to make closure replicas from
Feb 21, 2020
bd27d61
changing produce_data to take as input also experimemts
Feb 22, 2020
94c9b8a
closure data now correlates across datasets. Corrected backwards comp…
wilsonmr Feb 26, 2020
75e4267
produce_fitcontext
Mar 2, 2020
a3ad2fd
produce_fitinputcontext
Mar 2, 2020
f233a5f
produce_matched_datasets_from_dataspecs
Mar 2, 2020
ff5683b
changed data_input to just use dataset_inputs, handle errors better i…
wilsonmr Mar 3, 2020
83b2f6b
produce_fit_data_groupby_experiment
Mar 4, 2020
989107c
to do
Mar 4, 2020
95ce685
added basic grouping functionality and bridge between old and new dat…
wilsonmr Mar 4, 2020
c9c1638
renaming ExperimentSpec to DataGroupSpec
Mar 6, 2020
9f32645
changing naming conventions in dataplots.py
Mar 6, 2020
47503be
updated results to use data
wilsonmr Mar 9, 2020
d738478
updating vp-comparefit template
Mar 9, 2020
a96fc54
debugging names of functions
Mar 9, 2020
b0a05a8
some refactoring of collects to fit expected loop structure
wilsonmr Mar 9, 2020
747f12e
more refactoring of collects - will clean up commites later
wilsonmr Mar 11, 2020
6f8fe01
fix phi table
wilsonmr Mar 11, 2020
40e221b
group_result_table_no_table
Mar 12, 2020
387f5f3
new naming convention in kinematics.py
Mar 16, 2020
04fe3ca
Merge branch 'add-data-kw' of https://github.com/NNPDF/nnpdf into add…
Mar 16, 2020
c904402
fixing typo
Mar 18, 2020
fc475c0
working on test_plots
Mar 18, 2020
fbf71eb
test_plots and test_fitdata
Mar 18, 2020
6e372e0
other typo
Mar 18, 2020
87e7bfc
plot_xq2
Mar 18, 2020
17f2152
test_plots and typo
Mar 18, 2020
d7a5d95
rename experiments_covmat
Mar 23, 2020
ad6bc4f
group_result_table
Mar 23, 2020
75716ca
group_result_table_68cl
Mar 23, 2020
ebedb6e
experiment covmats -> group covmats
Mar 23, 2020
9138960
groups results.py
Mar 23, 2020
87336cd
groups_chi2_table
Mar 24, 2020
f12f77b
correct bug in groups_chi2_table
Mar 24, 2020
8677db0
groups_central_value_no_table
Mar 24, 2020
daa724a
groups_central_values
Mar 24, 2020
a330546
fits_chi2_data
Mar 24, 2020
c6aebad
fits_total_chi2_for_groups
Mar 24, 2020
4152d92
working on dataspecs objs
Mar 25, 2020
2e9e291
Merge branch 'add-data-kw' of https://github.com/NNPDF/nnpdf into add…
Mar 25, 2020
24686a7
working on compatibility between results.py and dataplots.py
Mar 25, 2020
298c1f1
removing dataset_index_by_process
Mar 25, 2020
a384527
merge
Mar 25, 2020
ddfc598
theorycov construction first round of changes - prior to restructure
Mar 25, 2020
9d3cc53
theorycov output first round of changes - prior to restructure
Mar 25, 2020
2969ed1
removing dataset_index_by_process from import
Mar 25, 2020
c0d2268
plot_dataset_inputs_phi_dist
Mar 25, 2020
4003ff3
Merge branch 'add-data-kw' of github.com:NNPDF/nnpdf into add-data-kw
Mar 25, 2020
305e95e
remove dataset_index_byprocess -> groups_index
Mar 25, 2020
d998712
bug in collect fn
Mar 26, 2020
284baa7
dataset_inputs -> data_input in collect
Mar 26, 2020
dc7217d
added default grouping if not specified
wilsonmr Mar 26, 2020
35c1186
fix behaviour when reading default from lockfile
wilsonmr Mar 26, 2020
24459a3
fix keys so there are no conflicts in lockfile
wilsonmr Mar 26, 2020
f6b6b47
propogate change through to production rule
wilsonmr Mar 26, 2020
f5a9d63
make the lockfile support for data grouping make more sense - have a …
wilsonmr Mar 30, 2020
c922a30
rebasing on master
Apr 1, 2020
d8aa451
rebase on master again
Apr 1, 2020
d9325ac
ExperimentSpec -> DataGroupSpec in loader.py
Apr 1, 2020
0d66584
made sure we are looping over datasets not dataset inputs
wilsonmr Apr 2, 2020
e426df2
fixing groups_normcovmat
Apr 3, 2020
3bbc266
Merge branch 'add-data-kw' of github.com:NNPDF/nnpdf into add-data-kw
Apr 3, 2020
764a1e0
fixed matrix_plot_labels to account for index changed to index with cuts
Apr 3, 2020
8bad539
add data keyword and backwards compatibility at the level of FitSpec
wilsonmr Feb 17, 2020
354a30a
Replacing instances of experiments in vp-setupfit
Feb 21, 2020
e47ec6c
Creating ExperimentSpec to make closure replicas from
Feb 21, 2020
19f728b
changing produce_data to take as input also experimemts
Feb 22, 2020
66d8c72
closure data now correlates across datasets. Corrected backwards comp…
wilsonmr Feb 26, 2020
b2683d7
produce_fitcontext
Mar 2, 2020
bdf12f7
produce_fitinputcontext
Mar 2, 2020
2be9ce5
produce_matched_datasets_from_dataspecs
Mar 2, 2020
1a5cb15
changed data_input to just use dataset_inputs, handle errors better i…
wilsonmr Mar 3, 2020
bf41492
produce_fit_data_groupby_experiment
Mar 4, 2020
c09dfa4
to do
Mar 4, 2020
a79adc4
added basic grouping functionality and bridge between old and new dat…
wilsonmr Mar 4, 2020
ae6e954
renaming ExperimentSpec to DataGroupSpec
Mar 6, 2020
6cc8451
changing naming conventions in dataplots.py
Mar 6, 2020
9d820a1
updated results to use data
wilsonmr Mar 9, 2020
84fce39
updating vp-comparefit template
Mar 9, 2020
bacfbc6
debugging names of functions
Mar 9, 2020
6acf084
some refactoring of collects to fit expected loop structure
wilsonmr Mar 9, 2020
a97bfc1
more refactoring of collects - will clean up commites later
wilsonmr Mar 11, 2020
95ffd31
fix phi table
wilsonmr Mar 11, 2020
2f38340
group_result_table_no_table
Mar 12, 2020
3575d1d
new naming convention in kinematics.py
Mar 16, 2020
5e6a033
fixing typo
Mar 18, 2020
5d98e25
working on test_plots
Mar 18, 2020
af57e0e
test_plots and test_fitdata
Mar 18, 2020
4ea0d19
other typo
Mar 18, 2020
f252512
plot_xq2
Mar 18, 2020
3111c7a
test_plots and typo
Mar 18, 2020
0c036b1
rename experiments_covmat
Mar 23, 2020
821ad0f
group_result_table
Mar 23, 2020
1979388
group_result_table_68cl
Mar 23, 2020
804d711
experiment covmats -> group covmats
Mar 23, 2020
7080b42
groups results.py
Mar 23, 2020
74bb291
groups_chi2_table
Mar 24, 2020
0ac397d
correct bug in groups_chi2_table
Mar 24, 2020
6356269
groups_central_value_no_table
Mar 24, 2020
16e9235
groups_central_values
Mar 24, 2020
24c57da
fits_chi2_data
Mar 24, 2020
8c055b1
fits_total_chi2_for_groups
Mar 24, 2020
f45b73d
working on dataspecs objs
Mar 25, 2020
d0a0911
working on compatibility between results.py and dataplots.py
Mar 25, 2020
a7192c5
removing dataset_index_by_process
Mar 25, 2020
0072b07
theorycov construction first round of changes - prior to restructure
Mar 25, 2020
0317df5
theorycov output first round of changes - prior to restructure
Mar 25, 2020
7732f36
removing dataset_index_by_process from import
Mar 25, 2020
cbece0a
plot_dataset_inputs_phi_dist
Mar 25, 2020
2797cfe
remove dataset_index_byprocess -> groups_index
Mar 25, 2020
355cdec
bug in collect fn
Mar 26, 2020
8b99510
dataset_inputs -> data_input in collect
Mar 26, 2020
e0cfd44
added default grouping if not specified
wilsonmr Mar 26, 2020
ac7c515
fix behaviour when reading default from lockfile
wilsonmr Mar 26, 2020
5432516
fix keys so there are no conflicts in lockfile
wilsonmr Mar 26, 2020
ade8b40
propogate change through to production rule
wilsonmr Mar 26, 2020
f7f1b30
make the lockfile support for data grouping make more sense - have a …
wilsonmr Mar 30, 2020
d955e5d
ExperimentSpec -> DataGroupSpec in loader.py
Apr 1, 2020
01160fb
made sure we are looping over datasets not dataset inputs
wilsonmr Apr 2, 2020
6726c4a
fixing groups_normcovmat
Apr 3, 2020
ab7ee14
fixed matrix_plot_labels to account for index changed to index with cuts
Apr 3, 2020
9c9dfa5
fix covmat regularization tests - needed to update tables to change m…
wilsonmr Apr 7, 2020
24d8efd
fix almost all test_regressions except one where action was deleted …
wilsonmr Apr 7, 2020
f3b0fce
fix test including pdf error in chi2
wilsonmr Apr 7, 2020
13e7c9b
updated tables for regression tests - tests passed when comparing jus…
wilsonmr Apr 7, 2020
0f35367
merge
Apr 8, 2020
3a10cff
changing collect function debug
Apr 8, 2020
fa0ab2e
add base documentation for data specification
wilsonmr Apr 8, 2020
81d332e
debug comparison plot
Apr 8, 2020
ba32b7b
Merge branch 'add-data-kw' of github.com:NNPDF/nnpdf into add-data-kw
Apr 8, 2020
6322e2e
fixing collect for each_dataset_results
Apr 8, 2020
efb0b08
theorycov tests module
Apr 20, 2020
6b0a552
fixing conflicts from master merge
Apr 27, 2020
2e256ad
add temporary ExperimentSpec because of cross imports with validphys …
wilsonmr May 1, 2020
a3642b1
add revert previous commit and just import datagroupspec in n3fit
wilsonmr May 1, 2020
334e1fd
updated documentation to explain more explicitly the core objects and…
wilsonmr May 5, 2020
4aeadab
removed redundant function and added warning when experiments is used
wilsonmr May 6, 2020
accc6ed
minor corections on documentation
wilsonmr May 8, 2020
618c53c
formatting documentation with better grammar
wilsonmr May 11, 2020
094e292
cleaned up config removing redundant code
wilsonmr May 11, 2020
5033e01
removing more redudancy from config
wilsonmr May 11, 2020
64719ca
Add spaces in construction.py
RosalynLP May 12, 2020
06c8aa5
Add space in output.py
RosalynLP May 12, 2020
41348b4
review changes - formatting docs and exposing final grouping in repor…
wilsonmr May 12, 2020
67c83a3
merge review changes - updating docs and exposing final grouping in t…
wilsonmr May 12, 2020
fda7ae6
updated fitinputcontext and comparefits reports to be agnostic on new…
wilsonmr May 12, 2020
75b4404
Rename groupres as group_res, fix some formatting and typos
voisey May 20, 2020
3ed3bca
explain naming conventions in docs and rename covariance_matrix accor…
wilsonmr May 26, 2020
773fc28
expand backwards compat section to be more correct and transparent
wilsonmr May 26, 2020
1b33526
make key explicit
wilsonmr May 26, 2020
5dbb9e1
backwards compatibility theorycov doc
RosalynLP May 26, 2020
a697270
Merge branch 'master' into add-data-kw
Jul 9, 2020
018cdd5
updating theorycov docs
May 27, 2020
50a5b7d
Edits of dataspecification docs
voisey May 27, 2020
43d6fa1
remove incorrect key in backwards compat example
wilsonmr May 27, 2020
298027a
Clarify usage of theory covariance matrix runcards in docs
voisey May 27, 2020
43047ac
Fix yaml block in data section of docs
voisey Jun 1, 2020
a4e84a0
Edit data docs
voisey Jun 1, 2020
7a0b057
Reformat data specification docs by wrapping at 80 characters
voisey Jun 1, 2020
49216a5
updating docs with essential changes
Jun 29, 2020
aab4f8a
allowing important experiments functions not to be deprecated
Jul 8, 2020
9cf580f
fixing syntax bug
Jul 8, 2020
4b9fd2f
allowing experiments actions for future use
Jul 8, 2020
6e6c2a3
Fixing tests
Jul 9, 2020
5c7fc55
Merge branch 'master' into add-data-kw
RosalynLP Jul 16, 2020
103 changes: 103 additions & 0 deletions doc/sphinx/source/vp/dataspecification.md
@@ -0,0 +1,103 @@
# Data Specification
Review comment (Contributor):
This should describe how the covariance matrices are constructed, or link to a document that does.

Review comment (Contributor):
It also could use a minimal introduction "what can you do with a dataset".


## Specifying a dataset

In a validphys runcard a single dataset is specified using a `dataset_input`.
This is a dictionary which minimally specifies the name of the dataset, but can
also control behaviour such as contributions to the covariance matrix for the
dataset and NNLO C-factors.

Here is an example `dataset_input`:

```yaml
dataset_input:
    dataset: CMSZDIFF12
    cfac: [QCD,NRM]
    sys: 10
```

This example is for the `CMSZDIFF12` dataset. The user has specified which
C-factors to apply with `cfac`, and has set `sys: 10`, which corresponds to an
additional contribution to the covariance matrix accounting for statistical
fluctuations in the C-factors. These settings correspond to NNLO predictions, so
presumably elsewhere in the runcard the user would also have specified an NNLO
theory, such as theory 53.
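To make the structure of a `dataset_input` concrete, here is a minimal Python sketch that validates the mapping from the YAML example above. It is not validphys's own parsing code: the names `DatasetInputSketch` and `parse_dataset_input` are hypothetical, only the `dataset`, `cfac` and `sys` keys from the example are handled, and rejecting any other key is a design choice of this sketch (validphys may accept further options).

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Tuple

# Hypothetical: only the keys used in the example above are accepted.
ALLOWED_KEYS = {"dataset", "cfac", "sys"}


@dataclass(frozen=True)
class DatasetInputSketch:
    """Hypothetical stand-in for a parsed `dataset_input` (not the validphys class)."""

    name: str
    cfac: Tuple[str, ...] = ()
    sys: Optional[int] = None


def parse_dataset_input(spec: Mapping[str, Any]) -> DatasetInputSketch:
    """Validate a mapping like {'dataset': 'CMSZDIFF12', 'cfac': [...], 'sys': 10}."""
    if "dataset" not in spec:
        # The dataset name is the only mandatory key.
        raise KeyError("a dataset_input must at least specify 'dataset'")
    unknown = set(spec) - ALLOWED_KEYS
    if unknown:
        # Rejecting unknown keys catches typos such as 'cfactor' instead of 'cfac'.
        raise ValueError(f"unsupported keys in dataset_input: {sorted(unknown)}")
    sys_value = spec.get("sys")
    if sys_value is not None and (isinstance(sys_value, bool) or not isinstance(sys_value, int)):
        raise TypeError("'sys' must be an integer")
    return DatasetInputSketch(
        name=str(spec["dataset"]),
        cfac=tuple(spec.get("cfac", ())),
        sys=sys_value,
    )


# The dataset_input from the YAML example above, written as a Python dict.
example = parse_dataset_input({"dataset": "CMSZDIFF12", "cfac": ["QCD", "NRM"], "sys": 10})
```

Strict validation of this kind is one way to reduce the margin for error when entering settings by hand.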

Manually entering `dataset_input` leaves a large margin for error, so there is a
[project](https://github.com/NNPDF/nnpdf/issues/226) which aims to provide a stable way of
filling in many of these settings with correct default values.

## Specifying multiple datasets

Multiple datasets are specified using the `dataset_inputs` key: a list in which
each element is a valid `dataset_input`. For example:

```yaml
dataset_inputs:
- { dataset: NMC }
- { dataset: ATLASTTBARTOT, cfac: [QCD] }
- { dataset: CMSZDIFF12, cfac: [QCD,NRM], sys: 10 }
```

Multiple datasets are given as a flat list: there is no hierarchy splitting them
into experiments or process types. Instead, datasets are grouped internally
according to their metadata, and the grouping is controlled by the
`metadata_group` key. This can be any key present in the `PLOTTING` file of each
dataset, for example `experiment` or `nnpdf31_process`.
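The grouping step can be sketched in Python as follows. The metadata values below are hypothetical placeholders, not the real contents of any `PLOTTING` file, and `group_datasets` is an illustrative helper rather than a validphys function. The point is that the same flat list yields different groups depending on which metadata key is chosen.

```python
from collections import defaultdict
from typing import Dict, List, Mapping

# Hypothetical placeholder metadata standing in for values read from each
# dataset's PLOTTING file. These are NOT the real values for any dataset.
PLOTTING_METADATA: Dict[str, Dict[str, str]] = {
    "DATASET_A": {"experiment": "EXP1", "nnpdf31_process": "PROC_X"},
    "DATASET_B": {"experiment": "EXP1", "nnpdf31_process": "PROC_Y"},
    "DATASET_C": {"experiment": "EXP2", "nnpdf31_process": "PROC_X"},
}


def group_datasets(
    dataset_names: List[str],
    metadata_group: str,
    metadata: Mapping[str, Mapping[str, str]] = PLOTTING_METADATA,
) -> Dict[str, List[str]]:
    """Group a flat list of dataset names by the value of one metadata key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in dataset_names:
        try:
            group_value = metadata[name][metadata_group]
        except KeyError as err:
            raise KeyError(f"no {metadata_group!r} metadata for {name}") from err
        # Datasets keep their runcard order within each group.
        groups[group_value].append(name)
    return dict(groups)


by_experiment = group_datasets(["DATASET_A", "DATASET_B", "DATASET_C"], "experiment")
# {'EXP1': ['DATASET_A', 'DATASET_B'], 'EXP2': ['DATASET_C']}
```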

If `metadata_group` is not specified in the runcard, its default value is set by
the `data_grouping` key. `data_grouping` itself defaults to `standard_report`,
for which `metadata_group` defaults to `experiment`.
Review comment (Contributor):
This paragraph is confusing: data_grouping and standard_report have not been introduced previously.
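The order of precedence described in the paragraph above can be illustrated with a minimal Python sketch. The lookup table is hypothetical: the only pairing stated here is that `standard_report` uses `experiment`, and `resolve_metadata_group` is an illustrative name, not a validphys function.

```python
from typing import Mapping

# Hypothetical table: the docs only state that `standard_report`
# corresponds to grouping by `experiment`.
DEFAULT_METADATA_GROUP: Mapping[str, str] = {"standard_report": "experiment"}


def resolve_metadata_group(runcard: Mapping[str, object]) -> str:
    """Return the metadata key used to group datasets for a given runcard."""
    explicit = runcard.get("metadata_group")
    if explicit is not None:
        # An explicit metadata_group always takes precedence.
        return str(explicit)
    # Otherwise fall back on data_grouping, which itself defaults to standard_report.
    data_grouping = str(runcard.get("data_grouping", "standard_report"))
    try:
        return DEFAULT_METADATA_GROUP[data_grouping]
    except KeyError as err:
        raise ValueError(f"unknown data_grouping: {data_grouping!r}") from err
```

For a runcard with neither key, this returns `"experiment"`; adding `metadata_group: nnpdf31_process` overrides the default.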


For example, the following runcard produces a single-column table containing
the chi2 of the specified datasets, grouped by `experiment` (the default
grouping when nothing is specified).

```yaml
dataset_inputs:
- { dataset: NMC }
- { dataset: ATLASTTBARTOT, cfac: [QCD] }
- { dataset: CMSZDIFF12, cfac: [QCD,NRM], sys: 10 }

theoryid: 53

dataspecs:
- pdf: NNPDF31_nnlo_as_0118

use_cuts: internal

actions_:
- dataspecs_groups_chi2_table
```
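For reference, a group chi2 is conventionally the correlated chi2 over all data points in the group, computed with the group's covariance matrix (which may include correlations between datasets in the same group). The Python sketch below assumes data, theory predictions and covariance matrix are already available as arrays; `group_chi2` is a hypothetical helper, not the validphys implementation, and whether a given table reports the total or the value per data point is not covered here.

```python
import numpy as np
import numpy.typing as npt


def group_chi2(data: npt.ArrayLike, theory: npt.ArrayLike, covmat: npt.ArrayLike) -> float:
    """Correlated chi2 = (D - T)^T C^{-1} (D - T) over all points in one group."""
    residuals = np.asarray(data, dtype=float) - np.asarray(theory, dtype=float)
    cov = np.asarray(covmat, dtype=float)
    if cov.shape != (residuals.size, residuals.size):
        raise ValueError("covariance matrix shape does not match the number of data points")
    # Cholesky factor C = L L^T (raises LinAlgError if C is not positive definite).
    chol = np.linalg.cholesky(cov)
    # Solving L y = r gives chi2 = y . y without forming C^{-1} explicitly.
    y = np.linalg.solve(chol, residuals)
    return float(y @ y)


# Toy numbers: two uncorrelated points with variances 1 and 4.
chi2 = group_chi2([1.0, 3.0], [0.0, 1.0], [[1.0, 0.0], [0.0, 4.0]])
print(chi2, chi2 / 2)  # total chi2 and chi2 per data point
```

Using the Cholesky factor rather than an explicit inverse is a common choice for numerical stability.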

If we instead add a `metadata_group` key to the runcard to choose a different grouping:

```yaml
metadata_group: nnpdf31_process

dataset_inputs:
- { dataset: NMC }
- { dataset: ATLASTTBARTOT, cfac: [QCD] }
- { dataset: CMSZDIFF12, cfac: [QCD,NRM], sys: 10 }

theoryid: 53

dataspecs:
- pdf: NNPDF31_nnlo_as_0118

use_cuts: internal

actions_:
- dataspecs_groups_chi2_table
```

then we again get a single-column table, but with the datasets grouped by
process type as defined in the [theory uncertainties paper](https://arxiv.org/abs/1906.10698).

## Backwards compatibility

Most old validphys runcards which used the `experiments` key to specify a
multi-level list of datasets should still work within the new framework. This
is because, if `dataset_inputs` is not present in the runcard, `validphys`
attempts to find an `experiments` key and infer `dataset_inputs` from it.
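A minimal Python sketch of this fallback, assuming the old layout in which each entry of `experiments` has an `experiment` name and a `datasets` list of `dataset_input` mappings. The function name `infer_dataset_inputs` and the warning it emits are assumptions for illustration. The sketch simply flattens the datasets in order and discards the experiment names; it does not reproduce how validphys groups the inferred datasets.

```python
import warnings
from typing import Any, Dict, List, Mapping


def infer_dataset_inputs(runcard: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Return a flat dataset_inputs list, falling back on an old-style `experiments` key."""
    if "dataset_inputs" in runcard:
        return [dict(ds) for ds in runcard["dataset_inputs"]]
    if "experiments" not in runcard:
        raise KeyError("runcard specifies neither 'dataset_inputs' nor 'experiments'")
    # Hypothetical warning nudging users towards the new key.
    warnings.warn("'experiments' is an old-style key; prefer a flat 'dataset_inputs' list")
    # Old runcards nest datasets inside experiments: flatten them, keeping their order.
    return [
        dict(ds)
        for experiment in runcard["experiments"]
        for ds in experiment["datasets"]
    ]
```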
1 change: 1 addition & 0 deletions doc/sphinx/source/vp/index.rst
@@ -13,5 +13,6 @@ vp-guide
    ./upload.md
    ./nnprofile.md
    ./scripts.md
+   ./dataspecification.md
    ./theorycov/index
    ./pydataobjs.rst
2 changes: 2 additions & 0 deletions validphys2/src/validphys/app.py
@@ -42,6 +42,8 @@
     'validphys.theorycovariance.tests',
     'validphys.replica_selector',
     'validphys.closuretest',
+    # currently broken - will fix in NNPDF/nnpdf#511
Review comment (Contributor Author):
actually I can just remove this because the module doesn't even exist

Review comment (Contributor):
Is this ready to go now?

+    # 'validphys.closure',
     'validphys.mc_gen_checks',
     'validphys.theoryinfo',
     'validphys.pseudodata',
6 changes: 3 additions & 3 deletions validphys2/src/validphys/checks.py
@@ -118,9 +118,9 @@ def check_dataset_cuts_match_theorycovmat(dataset, fitthcovmat):


 @make_argcheck
-def check_experiment_cuts_match_theorycovmat(
-        experiment, fitthcovmat):
-    for dataset in experiment.datasets:
+def check_data_cuts_match_theorycovmat(
+        data, fitthcovmat):
+    for dataset in data.datasets:
         if fitthcovmat:
             ds_index = fitthcovmat.load().index.get_level_values(1)
             ncovmat = (ds_index == dataset.name).sum()
10 changes: 5 additions & 5 deletions validphys2/src/validphys/comparefittemplates/report.md
@@ -42,9 +42,9 @@ Training validation
 -------------------
 {@fits plot_training_validation@}

-$\chi^2$ by experiment
+$\chi^2$ by group
 ----------------------
-{@plot_fits_experiments_chi2@}
+{@plot_fits_groups_data_chi2@}
voisey marked this conversation as resolved.

 $\chi^2$ by dataset comparisons
 -------------------------------
@@ -53,11 +53,11 @@ $\chi^2$ by dataset comparisons
 ### Table
 {@fits_chi2_table(show_total=true)@}

-$\phi$ by experiment
+$\phi$ by group
 --------------------
-{@plot_fits_experiments_phi@}
+{@plot_fits_groups_data_phi@}

-Experiment plots
+Group plots
wilsonmr marked this conversation as resolved.
 ---------------
 {@with matched_datasets_from_dataspecs@}
 [Detailed plots for dataset ' {@dataset_name@} ']({@dataset_report report@})