Eko from theory #1601

Merged
merged 4 commits into evolfit_w_eko from eko_from_theory on Sep 21, 2022

Conversation

@andreab1997 (Contributor)

We want to allow evolven3fit to look inside the relevant theory and, if possible, load the eko needed for the evolution from there. This should actually be the default behaviour.
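A minimal sketch of the lookup described above, assuming a hypothetical on-disk layout (a theory_<id> folder containing an eko.tar); the paths and the helper name find_theory_eko are placeholders, not the actual evolven3fit interface.

```python
# Hypothetical sketch: look for an eko shipped inside the theory folder and
# fall back to the previous behaviour otherwise. Paths and the helper name
# are placeholders, not the real evolven3fit API.
from pathlib import Path
from typing import Optional


def find_theory_eko(theories_root: Path, theory_id: int) -> Optional[Path]:
    """Return the eko stored with the theory, if it exists."""
    candidate = theories_root / f"theory_{theory_id}" / "eko.tar"  # assumed location
    return candidate if candidate.exists() else None


eko_path = find_theory_eko(Path("theories"), 200)
if eko_path is None:
    # no eko inside the theory: keep the old behaviour (compute or download it)
    print("No eko found inside the theory, falling back to the previous behaviour")
```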

@andreab1997 changed the base branch from master to evolfit_w_eko on September 16, 2022 09:09
andreab1997 and others added 2 commits September 16, 2022 11:12
Co-authored-by: Juan M. Cruz-Martinez <juacrumar@lairen.eu>
@scarlehoff added the run-fit-bot label (Starts fit bot from a PR) on Sep 16, 2022
@scarlehoff (Member)

@felixhekhorn any idea how to get the memory consumption of the convolution down? I guess in this case it is ok even if it is a bit slower.

@Zaharid (Contributor) commented Sep 16, 2022

I get the impression a lot of the things here should go in loader.py and be generally accessible from vp as providers.

@felixhekhorn (Contributor)

> @felixhekhorn any idea how to get the memory consumption of the convolution down? I guess in this case it is ok even if it is a bit slower.

This exact problem should be addressed here: NNPDF/eko#105, which allows loading a single evolution operator (for a fixed final scale) at a time, instead of loading all of them right at the beginning (as is done at the moment)

(Unfortunately @alecandido is quite busy at the moment so there is not much progress - and we want to delay this until after the final acceptance of the eko paper)

@scarlehoff (Member) commented Sep 16, 2022

> I get the impression a lot of the things here should go in loader.py and be generally accessible from vp as providers.

We can later on have a validphys provider that provides the eko from the theory, but let's first focus on having a usable implementation. This is not less integrated in vp than any of the previous evolution scripts.

> This exact problem should be addressed here: NNPDF/eko#105, which allows loading a single evolution operator (for a fixed final scale) at a time, instead of loading all of them right at the beginning (as is done at the moment)

I'm confused, how would this address the problem? Here we only have one initial scale but many final scales, so it is the opposite situation.

(Are there not any other tricks that we can play? I believe this might only be going beyond the limit of 7 GB by a bit, so any reduction would help)

@felixhekhorn (Contributor)

> I'm confused, how would this address the problem? Here we only have one initial scale but many final scales, so it is the opposite situation.

I believe it is what you want: say you have two final scales 10 and 100

  • at the moment eko will load both dim-4 objects, keep them in memory, do the algebra on the first and then on the second, and only then unload - effectively a dim-5 object is in memory all the time
  • while with the PR you first load one dim-4 object (say 10), do the algebra, unload the operator, then load the second dim-4 object, do the algebra, and unload the operator - so there is only a single dim-4 operator in memory at a time (see the sketch below)
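A toy contrast between the two strategies just described, assuming a made-up layout with one .npy file per final scale and a (flavour, x) PDF array; the file layout and the einsum contraction are illustrative only, not the real eko storage format or API.

```python
# Toy sketch: keep-everything-in-memory vs load-one-operator-at-a-time.
from pathlib import Path

import numpy as np


def evolve_eager(opdir: Path, pdf_init: np.ndarray) -> dict:
    """Current behaviour: load every dim-4 operator up front, then do the algebra."""
    ops = {float(f.stem): np.load(f) for f in opdir.glob("*.npy")}  # all operators in memory
    return {q2: np.einsum("aibj,bj->ai", op, pdf_init) for q2, op in ops.items()}


def evolve_lazy(opdir: Path, pdf_init: np.ndarray) -> dict:
    """NNPDF/eko#105-style behaviour: one dim-4 operator in memory at a time."""
    evolved = {}
    for f in sorted(opdir.glob("*.npy")):
        op = np.load(f)                                    # load a single operator
        evolved[float(f.stem)] = np.einsum("aibj,bj->ai", op, pdf_init)
        del op                                             # drop it before loading the next one
    return evolved
```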

@scarlehoff (Member)

> while with the PR you first load one dim-4 object (say 10), do the algebra, unload the operator, then load the second dim-4 object, do the algebra, and unload the operator - so there is only a single dim-4 operator in memory at a time

Then for the evolution from 1.65 to 1e5 with 50 steps in Q, do we have all 50 operators in memory at any given time?

@felixhekhorn (Contributor)

> while with the PR you first load one dim-4 object (say 10), do the algebra, unload the operator, then load the second dim-4 object, do the algebra, and unload the operator - so there is only a single dim-4 operator in memory at a time

> Then for the evolution from 1.65 to 1e5 with 50 steps in Q, do we have all 50 operators in memory at any given time?

in current master, yes - after NNPDF/eko#105 becomes available, no

@Zaharid (Contributor) commented Sep 16, 2022

> We can later on have a validphys provider that provides the eko from the theory, but let's first focus on having a usable implementation. This is not less integrated in vp than any of the previous evolution scripts.

I guess the difference is that here we are already using things like the loader, albeit in a cumbersome way. So since it is being worked on, we might as well do it more nicely.

@scarlehoff (Member)

Implementing things inside validphys and using things from validphys are two different things.

In the particular case you mention (the loader, for the theory) it is not even clear how it should be used (see the chicken-and-egg problem in the other PR), since it would try to download a theory that hasn't been built yet!

@scarlehoff (Member)

@andreab1997 @felixhekhorn @alecandido I've noticed it does go beyond 7 GB (getting all the way to 13 GB on my computer), but only for a few moments, and then it quickly goes back to 5 GB, where it stays.

Do you guys have any idea why this peak could be produced? Maybe there is a way to remove that peak easily.
Otherwise I'd say we have to put this on hold until that is fixed so it can be used.

@alecandido (Member)

With yadism I knew the cause: there was a point at which the grid was dumped to PineAPPL. But since the operation is performed by PineAPPLpy, the cross sections are first copied in memory into the new data structure and then dumped.
So there is a point at which they are doubled in memory, but right after the dump the Grid is dropped from memory, so it is just a spike.

I guess something similar might happen for EKO, most likely with YAML. It will be solved by the usual NNPDF/eko#105
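A toy illustration of the "copy, then dump" spike described above: the serialised (and compressed) copies of an operator coexist in memory with the original array until the write finishes. The file name and the lz4.frame usage are illustrative, not the exact eko code path.

```python
# Toy version of the transient doubling during a dump: extra copies of the
# operator live in memory only until the file is written.
import io

import lz4.frame
import numpy as np

op = np.ones((14, 50, 14, 50))              # a dim-4 operator, ~4 MB of float64
buf = io.BytesIO()
np.save(buf, op)                            # serialised copy, roughly another 4 MB
blob = lz4.frame.compress(buf.getvalue())   # compressed copy, alive only briefly
with open("operator.npy.lz4", "wb") as fd:
    fd.write(blob)
del buf, blob                               # after the dump, only `op` remains
```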

@scarlehoff (Member) commented Sep 19, 2022

> I guess something similar might happen for EKO, most likely with YAML. It will be solved by the usual NNPDF/eko#105

Ok, as soon as the computation of the operators for all theories finishes I'll try installing that branch to check whether the spike disappears. That branch is only waiting for the paper publication, right?

@alecandido (Member) commented Sep 19, 2022

> Ok, as soon as the computation of the operators for all theories finishes I'll try installing that branch to check whether the spike disappears. That branch is only waiting for the paper publication, right?

Nope, it is definitely more complicated.
That branch is only the first step: it introduces the new data structure, massively changing everything, but it doesn't use it. So the problem would be exactly the same.

(The new data structure manages an EKO partially on disk, i.e. it can offload operators as soon as they are computed. But the computation part still uses the huge dictionary with all the operators in memory...)

@alecandido (Member) commented Sep 19, 2022

You can track full progress in NNPDF/eko#138

However, the moment we merge that one, we will immediately write the new "runner". Most likely it will not take too long, since most of the operations will be exactly the same; we just need to save each computed piece right after computation :)

@scarlehoff (Member)

What do you mean by runner?
As far as I can see the spike is contained here: https://github.com/N3PDF/eko/blob/01efee1d27d215e895dd055debff8905af3d548c/src/eko/output.py#L379

I wonder whether there's a way of rewriting this function so that it is a bit less heavy in memory. If not, we just wait for NNPDF/eko#138

@felixhekhorn (Contributor)

> As far as I can see the spike is contained here: https://github.com/N3PDF/eko/blob/01efee1d27d215e895dd055debff8905af3d548c/src/eko/output.py#L379

exactly - that's the very function we're trying to replace; to be more precise, we want to avoid the loop for fp in innerdir.glob("*.npy.lz4"):

> I wonder whether there's a way of rewriting this function so that it is a bit less heavy in memory. If not, we just wait for N3PDF/eko#138

I'm afraid we need the full PR (and as you can see that one is highly non-trivial)

> What do you mean by runner?

It's the place where the actual computation happens (meaning where everything is glued together). However, I guess in your case the concern is more about what is currently called evolve_pdfs (which at the moment calls load_tar) - is that correct? Are you worried about computing the operator or computing the PDF with the operator?

@alecandido (Member)

You are right, the spike is there because it is (most likely) doubling the representation. But the idea is that you'd never worry about the doubling if the object you want to save is small enough in the first place.

The current "runner" (the object running the full computation and storage) first computes all the operators and then saves them all at once.
The goal would be to replace this two-step process, breaking the computation into pieces, i.e. "compute->store->compute->store->compute->..." (sketched below).
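A schematic sketch of the two runners under discussion; compute_operator is a stub standing in for the actual (expensive) eko computation, and the per-scale .npy files are only a stand-in for the real storage format.

```python
# Schematic contrast: compute-everything-then-store vs compute->store interleaved.
from pathlib import Path

import numpy as np


def compute_operator(q2: float) -> np.ndarray:
    """Placeholder for the actual operator computation at a final scale q2."""
    return np.zeros((14, 50, 14, 50))


def run_all_then_store(q2grid, outdir: Path) -> None:
    """Current runner: compute all operators, keep them all, dump at the end."""
    operators = {q2: compute_operator(q2) for q2 in q2grid}  # full EKO in memory here
    for q2, op in operators.items():
        np.save(outdir / f"{q2}.npy", op)


def run_compute_then_store(q2grid, outdir: Path) -> None:
    """Proposed runner: compute -> store -> compute -> store -> ..."""
    for q2 in q2grid:
        op = compute_operator(q2)             # only one operator alive at a time
        np.save(outdir / f"{q2}.npy", op)     # offload to disk immediately
        del op
```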

@scarlehoff (Member)

> It's the place where the actual computation happens (meaning where everything is glued together). However, I guess in your case the concern is more about what is currently called evolve_pdfs (which at the moment calls load_tar) - is that correct? Are you worried about computing the operator or computing the PDF with the operator?

No, the spike seems to happen - at least on montblanc - during the direct call to load_tar

eko_op = output.Output.load_tar(theory_eko_path)
and it is finished after that.

@alecandido (Member) commented Sep 19, 2022

Ok, sorry. Then it's just the complementary problem.

But, actually, if you call load_tar a net increase in memory usage is expected: you are loading an object into memory...
Moreover, streams used for loading the operators should be dropped immediately by the garbage collector.

Can you try to explicitly erase the stream? (just manually modify the package in your environment)
Or just try to profile the function (see the sketch below)...
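A minimal, standard-library-only way to profile the load_tar call quoted above; the theory_eko_path value here is a placeholder. tracemalloc reports the peak allocation seen during the call (numpy data allocations are reported to it as well).

```python
# Minimal memory profiling of the load_tar call, using only the standard library.
import tracemalloc

from eko import output

theory_eko_path = "theory_200/eko.tar"  # placeholder: use the actual path from the fit
tracemalloc.start()
eko_op = output.Output.load_tar(theory_eko_path)
current, peak = tracemalloc.get_traced_memory()
print(f"after load_tar: {current / 1e9:.2f} GB held, {peak / 1e9:.2f} GB peak")
tracemalloc.stop()
```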

@scarlehoff (Member) commented Sep 19, 2022

> Moreover, streams used for loading the operators should be dropped immediately by the garbage collector.

They are, once the function is finished. It is really during the function lifetime that the spike occurs.

I'll play around with it once all ekos are done.

Of course, it might well be a giant red herring since I don't know what actually killed the CI, but I think there's a good chance it was this.

@alecandido (Member)

> They are, once the function is finished. It is really during the function lifetime that the spike occurs.

Nope, I mean they should be dropped during function execution: there is a single variable for all the streams, so before loading a new one the former should go out of scope, and thus be dropped.
Then, the exact moment at which it gets dropped is up to the Python gc, but I'd be really surprised if it only did that at the end of the function... (a toy check is sketched below)
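A toy check of that claim (generic CPython behaviour, not eko-specific): rebinding a single name inside a loop releases the previous object right away, so the peak stays around one or two arrays' worth rather than the whole set.

```python
# Toy check: rebinding one name in a loop drops the previous object at once
# under CPython reference counting; the peak stays near one or two arrays.
import tracemalloc

import numpy as np

tracemalloc.start()
for _ in range(5):
    stream = np.ones((1000, 1000))   # ~8 MB; the previous array is freed on rebind
_, peak = tracemalloc.get_traced_memory()
print(f"peak: {peak / 1e6:.1f} MB")  # roughly 8-16 MB, not 5 x 8 MB
tracemalloc.stop()
```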

@scarlehoff removed the run-fit-bot label (Starts fit bot from a PR) on Sep 21, 2022
@scarlehoff (Member)

While this cannot be used as the default evolution code for the PDF, I think it would still be good to have it,

so I would merge this into the other branch (because it works) and then try to 1. keep the other branch as close to master as possible and 2. maybe even change the name to evolven3fit_eko so that if someone wants to use it they can...

@andreab1997 (Contributor, Author)

> While this cannot be used as the default evolution code for the PDF, I think it would still be good to have it,
>
> so I would merge this into the other branch (because it works) and then try to 1. keep the other branch as close to master as possible and 2. maybe even change the name to evolven3fit_eko so that if someone wants to use it they can...

Ok I am merging this. Thank you!

@andreab1997 merged commit 68cbe3f into evolfit_w_eko on Sep 21, 2022
@andreab1997 deleted the eko_from_theory branch on September 21, 2022 13:41