Action to return pseudodata in n3fit #650

Merged
merged 24 commits into from
Apr 22, 2020

siranipour
Contributor

So far I've refactored seed generation. But the goal is to move towards #519 and close #639 along the way.

Also would it make more sense to have this action in vp or n3fit?

Very much a WIP; I'll let you know when it's ready for review.

@siranipour siranipour added the n3fit Issues and PRs related to n3fit label Feb 13, 2020
@Zaharid
Contributor

Zaharid commented Feb 13, 2020

There is the question of whether we want to make these things be proper vp actions. I have no strong feelings at the moment. Would need to sleep on it

@siranipour siranipour changed the title [WIP] Action to return pseudodata [WIP] Action to return pseudodata in n3fit Feb 13, 2020
@scarlehoff
Member

> There is the question of whether we want to make these things be proper vp actions. I have no strong feelings at the moment. Would need to sleep on it

Since vp knows about the replicas and about the seeds it might indeed make sense that what n3fit receives is already the list of seeds.

for replica_number in replica:
    np.random.seed(trvalseed)
    for i in range(replica_number):
        trvalseed = np.random.randint(0, pow(2, 31))
Contributor

I am wondering if we should be using some sort of hmac instead of this old funny contortion, @scarrazza @scarlehoff.

Member

We could; I think we do so for the replicas (with emphasis on the "some sort"). That said, it is more complication than we need, but if someone wants to have a go it can be a fun short PR.

Contributor

I'd argue that now is a good time on the grounds that we don't want to ever change the random number generation after we start using it for production.
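
For reference, one possible shape of the hmac idea, as a sketch only: derive each replica's seed directly from a base seed and the replica number with a keyed hash, rather than advancing the global NumPy state `replica_number` times. The helper name `derive_seed`, the `label` argument and the truncation to 31 bits are assumptions made for illustration; this is not what the PR implements.

```python
import hashlib
import hmac


def derive_seed(base_seed: int, replica_number: int, label: str = "trval") -> int:
    """Derive a reproducible seed for one replica (hypothetical helper)."""
    key = str(base_seed).encode()
    message = f"{label}:{replica_number}".encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    # Keep the first 4 bytes and clear the top bit so the value lies in [0, 2**31).
    return int.from_bytes(digest[:4], "big") & 0x7FFFFFFF


# Seeds for replicas 1, 2 and 3, each computed independently of the others.
seeds = [derive_seed(42, replica) for replica in range(1, 4)]
```

With this approach the seed for a given replica does not depend on how many other replicas were generated before it, and different labels could give independent seeds for different purposes.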

@Zaharid
Contributor

Zaharid commented Feb 14, 2020

This is an interesting read, and one that raises the question of whether we should be trading some sort of random state object instead:

https://numpy.org/doc/1.18/reference/random/index.html

@scarlehoff Could you please remind us what the general status of reproducible random seeds in tensorflow is?

btw this is somewhat connected to #77
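
As a rough illustration of what passing around a random state object could look like with the NumPy API linked above (a sketch under assumptions, not a proposal for the actual n3fit interface): a single `SeedSequence` is spawned into one independent child per replica, and each child feeds its own `Generator`. The values of `base_seed` and `n_replicas` are placeholders, and this only covers draws made through NumPy.

```python
import numpy as np

base_seed = 12345  # placeholder value
n_replicas = 3     # placeholder value

# One independent child sequence per replica, all derived from the same base seed.
children = np.random.SeedSequence(base_seed).spawn(n_replicas)
generators = [np.random.default_rng(child) for child in children]

# Each replica draws from its own stream; the same base seed reproduces the draws.
trvalseeds = [int(rng.integers(0, 2**31)) for rng in generators]
```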

@scarlehoff
Member

Unchanged. If we were fully on TF 2 the seeds should not be an issue, but since not all systems are on TF 2.1 (or even TF 2.0) yet, we are still using the compatibility library.

RE the random state: ok, but I'm not so sure how to implement it (other than using it to generate seeds, of course), because there are several things that we offload, such as the replica generation (libnnpdf) or the initialization of the NN (keras).

@siranipour siranipour force-pushed the n3fit_replicas branch 2 times, most recently from 9ac3c5e to 6970c94 Compare February 24, 2020 14:12
@siranipour
Contributor Author

siranipour commented Feb 24, 2020

Ok, as it stands, I think this function finds the correct pseudodata. I'm going to work more on cleaning up the output.

Also it is unbelievably slow, is this normal?

@siranipour siranipour force-pushed the n3fit_replicas branch 2 times, most recently from b5518c9 to b8f0387 Compare March 11, 2020 18:43
@siranipour
Contributor Author

@Zaharid @scarlehoff can you please check b8f0387 is equivalent to the previous implementation? I'm working on making this function slightly faster, but I think it does what it's supposed to do

@scarlehoff
Member

If the travis check passes it should be!

@siranipour siranipour force-pushed the n3fit_replicas branch 4 times, most recently from a776097 to ec7fc54 Compare March 25, 2020 14:58
@siranipour
Contributor Author

Ok @wilsonmr the Linux test passed this time, looks like a mac issue

@siranipour
Contributor Author

Gonna move this function to validphys.results unless someone has major objections

@siranipour
Contributor Author

Behold parallel code in python. This now works on the order of minutes. @scarlehoff @Zaharid ready for review. LGTM :)

@siranipour siranipour changed the title [WIP] Action to return pseudodata in n3fit Action to return pseudodata in n3fit Mar 31, 2020
Member

@scarlehoff scarlehoff left a comment

I think that when adding new functionality it would be good to add an example of usage (and maybe motivation and whatnot) to the documentation. For instance, linking the example from the docstring somewhere down https://docs.nnpdf.science/vp/api.html would be very nice, so someone just has to go there and have a quick look (instead of having to swim through the automodules documentation).

with mp.Manager() as manager:
    L = manager.list()

    NPROC = mp.cpu_count()
Member

I think it would be a good idea to use the --parallel flag from validphys to choose whether to use multiprocessing.
In any case, using all cores as default without an easy way for turning that off might have unintended consequences.

Contributor Author

Hahaha, I was thinking this too. Could get some angry emails from cluster admins. Does the --parallel flag apply here though? I thought that it parallelized dependencies. In either case, I could have it as a function argument if you prefer?

Member

I guess not, but if you have access to the flag (don't know, @Zaharid ?) you can use it to know whether you are allowed to run in parallel or not.

(and yes, I was also thinking about angry cluster admins :__ )
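
A minimal sketch of the function-argument option discussed in this thread, assuming a hypothetical `nproc` parameter: the work runs serially unless the caller explicitly asks for worker processes, so nothing grabs every core by default. `_square` is only a stand-in for the per-replica work, and `compute_all` is an illustrative name, not the function in this PR.

```python
import multiprocessing as mp


def _square(x):
    # Stand-in for the per-replica computation (hypothetical).
    return x * x


def compute_all(items, nproc=None):
    """Run serially unless the caller requests more than one process."""
    items = list(items)
    if not nproc or nproc <= 1:
        return [_square(item) for item in items]
    # Never request more workers than the machine reports.
    nproc = min(nproc, mp.cpu_count())
    with mp.Pool(processes=nproc) as pool:
        return pool.map(_square, items)
```

Calling `compute_all(range(5))` stays in the current process, while `compute_all(range(5), nproc=2)` uses two worker processes. On platforms that spawn rather than fork, the parallel call should sit under an `if __name__ == "__main__":` guard.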

@siranipour
Contributor Author

Ok good shout on adding to docs. Will add this in the next commit.

P.S. How difficult would it be to do something similar for old-style NNFIT fits?

@siranipour
Contributor Author

Ok @Zaharid @scarlehoff, I ran a quick 10 replica fit, of which only 1 survived postfit. I then pickled the all_exp_infos dictionary, which we use for the test, and indeed this action returns the correct data as per test_pseudodata.py. Let me know what you think.
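
The general pattern described here (pickle a reference object once, then compare freshly computed output against it in a test) could look roughly like the sketch below. The dictionary layout, the experiment names and the helper names are stand-ins chosen for illustration; the real structure of all_exp_infos is not shown in this thread.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def save_reference(obj, path):
    """Write the reference object to disk once."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def matches_reference(obj, path):
    """Compare a dict of arrays against the pickled reference."""
    with open(path, "rb") as handle:
        reference = pickle.load(handle)
    if obj.keys() != reference.keys():
        return False
    return all(np.allclose(obj[key], reference[key]) for key in reference)


# Stand-in data keyed by (made-up) experiment names.
pseudodata = {"EXP_A": np.array([1.0, 2.0]), "EXP_B": np.array([3.0])}
with tempfile.TemporaryDirectory() as tmp:
    reference_path = Path(tmp) / "reference.pickle"
    save_reference(pseudodata, reference_path)
    round_trip_ok = matches_reference(pseudodata, reference_path)
```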

@siranipour siranipour force-pushed the n3fit_replicas branch 3 times, most recently from 7fbb332 to 9c8e338 Compare April 20, 2020 09:21
@siranipour
Contributor Author

I'm not too sure why the test is raising a FileNotFoundError when trying to load in the pickled file, it passes on my local machine when doing pytest --pyargs validphys.

Any ideas? Stack Exchange said to use importlib.resources, but even that hasn't fixed the problem.

@wilsonmr
Contributor

If it passes locally but not when doing the build, then it's probably not installing the file correctly. I think that because it's a .pickle file you will need to add to

'':['*.template', '*.mplstyle', '*.csv', '*.yaml', '*.md', '*.png'],

another item for pickle files so that the installation installs the test file
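
As a sketch of that suggestion: the pattern list quoted above, extended with `'*.pickle'`, in the form it would take as the `package_data` argument of `setuptools.setup`. The `is_packaged` helper is hypothetical and uses `fnmatch` only to approximate which file names the patterns would pick up; it is not part of setuptools.

```python
from fnmatch import fnmatch

# Patterns from the existing line, plus one for pickle files.
package_data = {
    "": ["*.template", "*.mplstyle", "*.csv", "*.yaml", "*.md", "*.png", "*.pickle"],
}


def is_packaged(filename, patterns=package_data[""]):
    """Return True if the file name matches one of the patterns (approximation)."""
    return any(fnmatch(filename, pattern) for pattern in patterns)
```

With this list a file such as `example.pickle` matches, whereas one named `example.pkl` would still be left out of the installation.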

@siranipour
Contributor Author

siranipour commented Apr 20, 2020

Ahh yes, of course! Thanks Mikey! Hopefully it passes this time round.

Edit: @wilsonmr actually, I guess I can add the pickle file to tests/regressions, or is that directory reserved for something in particular?

@siranipour siranipour force-pushed the n3fit_replicas branch 4 times, most recently from 4eab452 to ebaf0df Compare April 20, 2020 11:30
@scarrazza scarrazza merged commit c23377c into master Apr 22, 2020
@scarrazza scarrazza deleted the n3fit_replicas branch April 22, 2020 15:33
@wilsonmr wilsonmr mentioned this pull request Apr 28, 2020
6 tasks
Labels
n3fit Issues and PRs related to n3fit
Development

Successfully merging this pull request may close these issues.

Reproducing pseudodata replicas in n3fit
6 participants