Adding data keyword to validphys #651

Merged · 167 commits merged into master from add-data-kw on Jul 16, 2020

Conversation

wilsonmr (Contributor)

Adding the data keyword to validphys; supersedes #515.

The idea is discussed at length in PR #515: instead of introducing the experiment hierarchy at the level of the runcard, the input is a flat list of datasets, and arbitrary groupings can then be done with metadata.

Defaults will be filled in; this is waiting on lockfiles.

We think the current implementation is fully backwards compatible: all actions will now depend on data, and all fits which do not have a data key are given one in as_input from their experiment specification. We then just need to be careful that all action dependencies are resolved correctly.

To do:

  • parse data from the runcard as a simple object (a list of dataset_inputs)
  • produce data as a richer object (currently abusing ExperimentSpec)
  • add backwards compatibility at the level of FitSpec
  • interface with the lockfile/defaults when available, so the user can input just a list of dataset names
  • find a sensible way to group data in various ways
  • check we didn't break anything
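As a sketch of what the flat runcard input could look like (the dataset names and options below are illustrative placeholders, not taken from this PR):

```yaml
# Hypothetical runcard fragment: a flat list of dataset inputs replaces
# the nested `experiments` hierarchy; grouping is left to metadata.
dataset_inputs:
  - {dataset: NMC}
  - {dataset: ATLASWZRAP36PB, cfac: [QCD]}
```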

"""Filter closure test data."""
total_data_points = 0
total_cut_data_points = 0
fakeset = fakepdfset.load()
# Load experiments
for experiment in experiments:
for dataset in data:
wilsonmr (Contributor Author)

I think this needs to be slightly changed. We want more like:

loaded_data = data.load.__wrapped__(data)  # I assume this bypasses the lru_cache?
loaded_data.MakeClosure(fakeset, fakenoise)
for j, dataset in enumerate(data.datasets):
    path = filter_path / dataset.name
    nfull, ncut = _write_ds_cut_data(path, dataset)
    total_data_points += nfull
    total_cut_data_points += ncut
    loaded_ds = loaded_data.GetSet(j)
    if errorsize != 1.0:
        loaded_ds.RescaleErrors(errorsize)
    loaded_ds.Export(str(path))

because we need the level 1 shift to account for correlated systematics
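To illustrate the point about correlated systematics with a plain NumPy toy (nothing here is validphys code): if the level-1 shift were drawn per dataset, the correlation between datasets would be lost, which is why the shift has to be generated for the loaded group as a whole.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy covariance: two datasets of one point each, strongly correlated
# through a shared systematic.
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])

# Correct: one draw from the joint covariance keeps the correlation.
joint_shifts = rng.multivariate_normal(np.zeros(2), cov, size=10000)
joint_corr = np.corrcoef(joint_shifts.T)[0, 1]

# Wrong: independent per-dataset draws lose the cross-correlation.
indep_shifts = np.stack(
    [rng.normal(0, 1, 10000), rng.normal(0, 1, 10000)], axis=1
)
indep_corr = np.corrcoef(indep_shifts.T)[0, 1]
```

With the joint draw the empirical correlation stays near 0.9; with independent draws it is near zero.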

Contributor

Ah OK, I thought you had to put it back inside an experiment again to get it to work

# keep backwards compatibility
old_experiments_input = d.get("experiments")
if old_experiments_input:
d["data"] = data_from_experiment(old_experiments_input)
wilsonmr (Contributor Author)

I need to change this so it adds data_input to the runcard, not data.

@wilsonmr (Contributor Author)

OK, so I think we have a minimal working example, including vp-setupfit. Provided there are no issues with this structure, we then need to address all the places in validphys which use these functions.

@tgiani (Contributor)

tgiani commented Feb 26, 2020

@wilsonmr @RosalynLP ok so if I've understood correctly, now we should

* go through any vp2 action which takes `experiments` and change it so that it can take `data_input` as well

* work on grouping datasets according to some chosen criterion, as started in #481

Correct?

@RosalynLP (Contributor)

@wilsonmr @RosalynLP ok so if I ve understood correctly now we should

* go through any vp2 action which takes `experiments` and change it so that it can take `data_input` as well

* work on grouping datasets according to some chosen criterion, as started in #481

Correct?

Yes that's what I thought, perhaps we can divvy up who's doing what tomorrow?

@tgiani (Contributor)

tgiani commented Mar 2, 2020

@wilsonmr have a look at whether we are doing the right thing.
@RosalynLP and I have done the following

  1. For produce_fitcontext and produce_fitinputcontext not much needs to change: we just changed experiments -> data_input, and the previous change in core.py should do the rest.

  2. Regarding produce_matched_datasets_from_dataspecs, the old runcard looks like
    https://vp.nnpdf.science/24VZ1rh6QEm79AkcuGjZkA==/input/ and we want to:

  • keep backward compatibility
  • put the keyword data_input in the runcard
  • change the tuple (datasets, dsinputs) to just dsinputs (datasets is no longer needed and no longer exists in runcards)
  • change the labelling from (dataset name, experiment name, process) to (process, dataset name) because we don't want the experiments keyword anymore
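The labelling change in the last bullet amounts to dropping the experiment element and reordering, something like this sketch (the label values are made up for illustration):

```python
# Old matched-dataspec labels: (dataset name, experiment name, process).
old_labels = [
    ("NMC", "NMC experiment", "DIS"),
    ("ATLASWZRAP36PB", "ATLAS", "DY"),
]

# New labels: (process, dataset name); the experiment level is dropped.
new_labels = [(process, dataset) for dataset, _experiment, process in old_labels]
```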

try:
    _, data_input = self.parse_from_(
        None, 'data_input', write=False)
except:
wilsonmr (Contributor Author)

probably want to except something specific here

Contributor

I guess the idea was to keep backward compatibility with old runcards using produce_matched_datasets_from_dataspecs

Contributor

By "except something specific", what do you mean exactly, sorry?

wilsonmr (Contributor Author)

so except the exact error you get when you should be parsing experiments instead.

A bare except catches all exceptions, which could be bad if, say, an IOError is raised.

wilsonmr (Contributor Author)

I guess it's probably a ConfigError but I'm not sure
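The suggested fix, sketched with a stand-in exception class (whether validphys actually raises a ConfigError here is, as the comment says, not certain):

```python
class ConfigError(Exception):
    """Stand-in for the error raised when a runcard key is missing."""

def parse_from_(key):
    # Simulates a parse that fails because `data_input` is absent.
    raise ConfigError(f"No such key: {key}")

try:
    data_input = parse_from_("data_input")
except ConfigError:
    # Catch only the missing-key error, not a bare `except:` which would
    # also swallow unrelated failures (IOError, typos, etc.).
    data_input = "from experiments"  # fall back to the old key
```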

@wilsonmr (Contributor Author)

wilsonmr commented Mar 2, 2020

I think it looks good so far

with self.set_context(ns=self._curr_ns.new_child({'theoryid': thid})):
    _, experiments = self.parse_from_('fit', 'experiments', write=False)
    _, data_input = self.parse_from_('fit', 'dataset_inputs', write=False)
wilsonmr (Contributor Author)

btw I don't think we need to set context anymore, since dataset_inputs doesn't require theoryid :)

@wilsonmr (Contributor Author)

wilsonmr commented Mar 4, 2020

ok so we now have the pieces required to get results.py moving. In particular we can do the following:

def group_results(data, ...):
    """Like results but for a group of data"""
    ...

# requires user to specify `metadata_group_key` in runcard
metadata_groups_results = collect(
    "group_results",
    ("group_data_by_metadata",)
)

# `metadata_group_key` is set to `experiments` by the production rule
fits_experiments_results = collect(
    "metadata_groups_results",
    ("fits", "fitcontext", "groupby_experiment")
)

which works for both new and old fits, and for both new and old runcards. The idea is to change all functions which take an experiment to take data, so that by default they act on all of the data, and then to use the grouping production rule(s) to make them act on subsets of the data instead.

We made the grouping function act on the simplest objects possible, namely dataset_inputs, so it doesn't explicitly require theoryid or cuts.
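A sketch of the kind of grouping function described, acting directly on plain dataset_inputs mappings so that neither theoryid nor cuts are needed (the key names and dataset names are illustrative, not validphys's actual API):

```python
from collections import defaultdict

def group_dataset_inputs(dataset_inputs, metadata_group_key):
    """Group plain dataset-input mappings by a metadata key (a sketch)."""
    groups = defaultdict(list)
    for ds in dataset_inputs:
        groups[ds.get(metadata_group_key, "unknown")].append(ds)
    return dict(groups)

# Illustrative inputs (dataset and process names are placeholders):
inputs = [
    {"dataset": "NMC", "process": "DIS"},
    {"dataset": "ATLASWZRAP36PB", "process": "DY"},
    {"dataset": "SLAC", "process": "DIS"},
]
by_process = group_dataset_inputs(inputs, "process")
```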

@Zaharid (Contributor)

Zaharid commented Mar 4, 2020

Seems like the right direction.

@RosalynLP (Contributor)

Right, I tested that and it gives the desired output.

@RosalynLP (Contributor)

I tried to do a rebase and totally messed it up though

@siranipour (Contributor)

siranipour commented Jul 9, 2020

Sometimes it's easier with --rebase-merges

@RosalynLP (Contributor)

Sometimes it's easier with --rebase-merges

Omg this is a gem, never came across this before thanks

@scarrazza (Member)

@RosalynLP could you please fix the conflict?

@RosalynLP (Contributor)

@scarrazza conflicts fixed

@RosalynLP RosalynLP dismissed stale reviews from siranipour and voisey via 5c7fc55 July 16, 2020 09:40
@voisey (Contributor) left a comment

I guess this can now be merged

@Zaharid Zaharid merged commit 6130eb7 into master Jul 16, 2020
@Zaharid Zaharid deleted the add-data-kw branch July 16, 2020 12:28
@siranipour (Contributor)

Should we tag this merge commit?

@Zaharid (Contributor)

Zaharid commented Jul 16, 2020 via email

@RosalynLP (Contributor)

I actually had a dream about merging this PR the other day so I'm happy about this 🎉. Let's hope it doesn't break too much 🙃


7 participants