Saving pseudodata of replicas with multiple fits #2138

Draft
wants to merge 3 commits into master

Conversation

@achiefa (Contributor) commented Jul 31, 2024

When using parallel models, pseudodata are not saved for each replica, and running a multi-replica fit with n3fit fails with the following error:

```
Cannot request that multiple replicas are fitted and that pseudodata is saved. Either set `fitting::savepseudodata` to `false` or fit replicas one at a time.
```

This branch ensures that pseudodata for each replica are saved, even when `parallel_models=true` is set.
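For context, a minimal sketch of the runcard combination this branch enables; the key placement follows the error message above, but this is a hypothetical fragment rather than a complete n3fit runcard:

```yaml
# Hypothetical fragment, not a full n3fit runcard
fitting:
  savepseudodata: true  # write each replica's pseudodata to the fit output
parallel_models: true   # fit all requested replicas in one parallel model
```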

@achiefa achiefa added the n3fit Issues and PRs related to n3fit label Jul 31, 2024
@achiefa achiefa self-assigned this Jul 31, 2024
@achiefa achiefa marked this pull request as draft July 31, 2024 09:23
@scarlehoff (Member)

Do you actually need this? We disabled this for multiple replicas due to issues with reproducibility.

And the (main) issue is not that easy to solve, namely that in parallel replicas, datasets with a single point must all enter training or validation. So if you generate the data in parallel, you cannot reproduce it with the vp functions meant to do so.

If you need it, we need to find a way to tag the data as having been generated in parallel.
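A rough sketch of what such a tag could look like, assuming a hypothetical metadata side file; the file name and key below are made up, not existing n3fit or vp API:

```python
# Hypothetical sketch: mark saved pseudodata as parallel-generated so that
# downstream tooling can detect (and refuse or adapt to) it.
import json
from pathlib import Path

def tag_pseudodata_generation(replica_dir: Path, parallel: bool) -> None:
    """Write a small metadata file next to the saved pseudodata."""
    meta = {"pseudodata_generated_in_parallel": parallel}  # made-up key name
    (replica_dir / "pseudodata_meta.json").write_text(json.dumps(meta))
```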

@RoyStegeman (Member)

Thanks for starting this. I actually meant doing something along the lines of `replicas_training_pseudodata = collect("training_pseudodata", ("replicas",))` in `n3fit_data.py`.
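For illustration, this is roughly how that line could sit in `n3fit_data.py`, using reportengine's `collect`; only the quoted line comes from this thread, the surrounding placement is assumed:

```python
from reportengine import collect

# Gather the per-replica training pseudodata into one list, with one entry
# per replica in the ("replicas",) namespace specification.
replicas_training_pseudodata = collect("training_pseudodata", ("replicas",))
```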

I suspect this will save all the pseudodata in the same folder, since the output folder is probably the `nnfit/replica_1` folder (it uses the `table_folder` from `N3FitEnvironment`), but perhaps there is a way to solve this.

@RoyStegeman (Member)

> And the (main) issue is not that easy to solve, namely that in parallel replicas, datasets with a single point must all enter training or validation. So if you generate the data in parallel, you cannot reproduce it with the vp functions meant to do so.

This is easy to solve by always including datasets with a single point in the training data.

But indeed, we do need it: otherwise, doing parameter determinations on GPU will be sketchy with the CRM and impossible with the TCM.
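A minimal sketch of the "always put single-point datasets in training" rule proposed above, with illustrative names rather than the actual n3fit masking code:

```python
import numpy as np

def trvl_mask(ndata: int, frac: float, rng: np.random.Generator) -> np.ndarray:
    """Boolean training mask for one dataset; True marks a training point."""
    if ndata == 1:
        # Single-point datasets always enter training, so the split no longer
        # depends on whether replicas are generated sequentially or in parallel.
        return np.ones(1, dtype=bool)
    mask = np.zeros(ndata, dtype=bool)
    n_tr = int(frac * ndata)
    mask[rng.choice(ndata, size=n_tr, replace=False)] = True
    return mask
```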

@scarlehoff (Member)

> This is easy to solve by always including datasets with a single point in the training data.

I would not do that. I'd rather go for the tag.

In any case, before continuing, could you check that the sequential and the parallel fit (leaving aside the single-point datasets) produce exactly the same pseudodata and training/validation (trvl) masks? (If that works we can think about the rest.)
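One way to run that check, as a rough sketch; the per-replica file layout and names below are placeholders, not the actual fit output format:

```python
from pathlib import Path
import pandas as pd

def same_pseudodata(fit_a: Path, fit_b: Path,
                    fname: str = "training_pseudodata.csv") -> bool:
    """Compare the saved pseudodata of matching replicas in two fit folders."""
    for rep_a in sorted((fit_a / "nnfit").glob("replica_*")):
        rep_b = fit_b / "nnfit" / rep_a.name
        if not pd.read_csv(rep_a / fname).equals(pd.read_csv(rep_b / fname)):
            return False
    return True

# e.g. same_pseudodata(Path("sequential_fit"), Path("parallel_fit"))
```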

@RoyStegeman (Member)

> I would not do that. I'd rather go for the tag.

Why? If we're serious about using GPU as well as CPU fits, then I'd say they should be as close as possible in behaviour, no? The choice of what to do with those datasets was a bit arbitrary and has been changed over time anyway.

@scarlehoff (Member) commented Jul 31, 2024

Then we should go for an actual solution instead of changing the behaviour every time it becomes an inconvenience.

But first let's make sure that the rest works the same; then I'll take care of masking the single-point datasets in the same way in both modes, eventually.
