Adding data keyword to validphys #651
Conversation
validphys2/src/validphys/filters.py
Outdated
"""Filter closure test data.""" | ||
total_data_points = 0 | ||
total_cut_data_points = 0 | ||
fakeset = fakepdfset.load() | ||
# Load experiments | ||
for experiment in experiments: | ||
for dataset in data: |
I think this needs to be slightly changed. We want more like:
```python
loaded_data = data.load.__wrapped__(experiment)  # I assume this bypasses the lru_cache?
loaded_data.MakeClosure(fakeset, fakenoise)
for dataset in data.datasets:
    path = filter_path / dataset.name
    nfull, ncut = _write_ds_cut_data(path, dataset)
    total_data_points += nfull
    total_cut_data_points += ncut
    loaded_ds = loaded_exp.GetSet(j)
    if errorsize != 1.0:
        loaded_ds.RescaleErrors(errorsize)
    loaded_ds.Export(str(path))
```
because we need the level 1 shift to account for correlated systematics
Ah yes, it bypasses the cache: https://docs.python.org/3/library/functools.html#functools.lru_cache
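For illustration, a small self-contained example of the behaviour being pointed at (nothing here is validphys code; it just shows that functools.lru_cache exposes the undecorated function as `__wrapped__`):

```python
from functools import lru_cache

@lru_cache()
def load(name):
    print(f"actually loading {name}")
    return name

load("DS1")              # runs the function and caches the result
load("DS1")              # served from the cache, no print
load.__wrapped__("DS1")  # calls the undecorated function, bypassing the cache
```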
Ah OK, I thought you had to put it back inside an experiment again to get it to work
validphys2/src/validphys/core.py
Outdated
```python
# keep backwards compatibility
old_experiments_input = d.get("experiments")
if old_experiments_input:
    d["data"] = data_from_experiment(old_experiments_input)
```
I need to change this so it adds `data_input` to the runcard, not `data`.
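A hedged sketch of the change described here, reusing the helper name from the diff above (whether that helper is also renamed is not specified):

```python
# keep backwards compatibility: old runcards specify `experiments`,
# which is converted and stored under `data_input` rather than `data`
old_experiments_input = d.get("experiments")
if old_experiments_input:
    d["data_input"] = data_from_experiment(old_experiments_input)
```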
OK, so I think we have a minimal working example including vp-setupfit. Provided there are no issues with this structure, I guess we then need to address all the places in validphys which use these functions.
@wilsonmr @RosalynLP OK, so if I've understood correctly, now we should:
Correct?
Yes, that's what I thought, perhaps we can divvy up who's doing what tomorrow?
@wilsonmr have a look if we are doing the right thing.
validphys2/src/validphys/config.py
Outdated
```python
try:
    _, data_input = self.parse_from_(
        None, 'data_input', write=False)
except:
```
probably want to except something specific here
I guess the idea was to keep backward compatibility with old runcards using produce_matched_datasets_from_dataspecs
Sorry, what exactly do you mean by "except something specific"?
So, except the exact error you get when you should be parsing experiments instead.
A bare except catches all exceptions, which could be bad if it swallows an IO error or something.
I guess it's probably a ConfigError, but I'm not sure.
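For illustration, a sketch of the more specific pattern being discussed, assuming ConfigError is indeed what `parse_from_` raises when the key is absent (the import path and the fallback branch are assumptions, not the final implementation):

```python
from reportengine.configparser import ConfigError  # import path assumed

try:
    _, data_input = self.parse_from_(None, 'data_input', write=False)
except ConfigError:
    # fall back to the old `experiments` specification for backwards compatibility
    _, experiments = self.parse_from_(None, 'experiments', write=False)
```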
I think it looks good so far.
…n Config when dataset_inputs not found
validphys2/src/validphys/config.py
Outdated
```python
with self.set_context(ns=self._curr_ns.new_child({'theoryid': thid})):
    _, experiments = self.parse_from_('fit', 'experiments', write=False)
    _, data_input = self.parse_from_('fit', 'dataset_inputs', write=False)
```
btw I don't think we need to set context anymore, since dataset_inputs doesn't require theoryid :)
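A minimal sketch of the simplification being suggested, assuming nothing else in this rule needs the theoryid namespace:

```python
# dataset_inputs does not depend on theoryid, so the set_context wrapper
# can be dropped and we can parse directly from the fit
_, data_input = self.parse_from_('fit', 'dataset_inputs', write=False)
```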
OK, so we now have the stuff required to get results.py moving. In particular we can do the following:

```python
def group_results(data, ...):
    """Like results but for a group of data"""
    ...

# requires user to specify `metadata_group_key` in runcard
metadata_groups_results = collect(
    "group_results",
    ("group_data_by_metadata",)
)

# `metadata_group_key` is set to `experiments` by the production rule
fits_experiments_results = collect(
    "metadata_groups_results",
    ("fits", "fitcontext", "groupby_experiment")
)
```

which works for both new and old fits, and also for new and old runcards. The idea is to change all functions which take an experiment to take data, so that by default they act on all of the data, and then to use the grouping production rule(s) to make them act on subsets of the data instead. For the grouping function we tried to act on the simplest objects possible, which are dataset_inputs, and so don't explicitly require theoryid or cuts.
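For context, a purely illustrative sketch of what such a production rule could look like, assuming (as the comment in the snippet above suggests) that its only job is to fix `metadata_group_key`; the name and body are hypothetical, not the actual validphys implementation:

```python
# hypothetical production rule: pin the grouping key so that the collect over
# ("fits", "fitcontext", "groupby_experiment") reproduces per-experiment grouping
def produce_groupby_experiment(self):
    return {"metadata_group_key": "experiments"}
```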
Seems like the right direction.
Right, I tested that and it gives the desired output.
I tried to do a rebase and totally messed it up though.
Sometimes it's easier with
Omg this is a gem, never came across this before, thanks
@RosalynLP could you please fix the conflict?
@scarrazza conflicts fixed
I guess this can now be merged
Should we tag this merge commit?
Good point. I added a tag. Now version 3.5 has the data keyword merged and 3.4 does not.
I actually had a dream about merging this PR the other day so I'm happy about this 🎉. Let's hope it doesn't break too much 🙃
Adding the data keyword to validphys, supersedes #515
the idea here is discussed at length in PR #515, but in short it is to have a flat input of datasets rather than introduce the experiment hierarchy at the level of the runcard - instead, arbitrary groupings can be done with metadata
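To make the contrast concrete, here is a hedged illustration written as parsed-runcard Python dicts rather than YAML; the dataset names are placeholders, and the exact key (`data` here, `dataset_inputs` elsewhere in this PR) changed during the discussion:

```python
# old style: datasets nested inside an explicit experiment hierarchy
old_runcard = {
    "experiments": [
        {"experiment": "EXP1", "datasets": [{"dataset": "DS1"}, {"dataset": "DS2"}]},
    ]
}

# new style: a flat list of datasets; groupings are applied afterwards via metadata
new_runcard = {
    "data": [
        {"dataset": "DS1"},
        {"dataset": "DS2"},
    ]
}
```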
defaults will be filled in - waiting on lockfiles for this
we think that the current implementation is fully backwards compatible - all actions will now depend on data, and all fits which do not have a `data` key are given one in as_input from their experiment specification. Then we just need to be careful to make sure that the action dependencies are all sorted out correctly.

To do: