Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Composite Output #105

Merged
merged 152 commits into from
Oct 10, 2022
Merged

Composite Output #105

merged 152 commits into from
Oct 10, 2022

Conversation

alecandido
Copy link
Member

@alecandido alecandido commented Mar 13, 2022

This should close #96.

  • split the 5-dim array into 4-dim ones in the operators folder

In this PR I'd like to provide even a CLI in ekobox:

  • kickstart eko CLI

Further:

  • add product subcommand (log operations into history.yaml and runcards to {n+1}.theory.yaml and {n+1}.operator.yaml) -> postponed to Improve EKOs product #107
  • autoload operators on __getitem__
  • unload (replace with None) and dump operators on __delitem__ (if already None, do nothing)
  • call __delitem__ for each operator on EKO.close, and call this last in EKO.__del__, for auto unloading
    • notice that

      It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.

      so when the user program ends, EKO.close has to be called explicitly

  • make an iterator for EKO that loads operator one by one, and when loading the next one will dump the former (useful for applications: manipulation, consumption)
    • pay attention to dump even the last element before closing iteration
    • make a generator, yield elements one by one, raise StopIteration manually at the end
  • while manipulating, instead of skipping unloaded operators, load them one by one and manipulate (rely on the former)
  • context manager for item retrieval, with automated drop
  • provide docstrings (at the very least for the content of struct.py)
  • try to remove PathLike (in favor of the native os.PathLike)
  • implement deep copy (since it has to copy even the disk part, it has to request an explicit path eko.deepcopy(path))
  • fix Composite Output #105 (comment)
  • fix Composite Output #105 (comment) and Composite Output #105 (comment)
  • create compatibility layer Composite Output #105 (comment)

@alecandido alecandido linked an issue Mar 13, 2022 that may be closed by this pull request
8 tasks
@codecov
Copy link

codecov bot commented Mar 13, 2022

Codecov Report

Merging #105 (47e30e5) into develop (bb84999) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##           develop      #105      +/-   ##
============================================
+ Coverage    99.97%   100.00%   +0.02%     
============================================
  Files           78        97      +19     
  Lines         3721      4510     +789     
============================================
+ Hits          3720      4510     +790     
+ Misses           1         0       -1     
Flag Coverage Δ
isobench 54.41% <55.55%> (?)
unittests 100.00% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/eko/member.py 100.00% <ø> (ø)
src/ekobox/genpdf/flavors.py 100.00% <ø> (ø)
src/eko/compatibility.py 100.00% <100.00%> (ø)
src/eko/couplings.py 100.00% <100.00%> (ø)
src/eko/evolution_operator/__init__.py 100.00% <100.00%> (+0.56%) ⬆️
src/eko/evolution_operator/grid.py 100.00% <100.00%> (ø)
src/eko/interpolation.py 100.00% <100.00%> (ø)
src/eko/output/__init__.py 100.00% <100.00%> (ø)
src/eko/output/legacy.py 100.00% <100.00%> (ø)
src/eko/output/manipulate.py 100.00% <100.00%> (ø)
... and 29 more

@alecandido
Copy link
Member Author

alecandido commented Mar 17, 2022

In this PR I definitely found out two things:

  1. eko is too coupled: changes in output module and runcards format have propagated everywhere
    • a good way to reduce coupling would be to avoid passing around runcards to the from_dict method of everything, and maybe just to kill those methods: not only runner, but potentially everything is aware of the structure of the input runcards (or part of it)
    • we can think on more ways to decouple
  2. Test suite is definitely useful, even for refactoring: I changed a lot (and also tests had to change a bit), but I'm pretty sure everything is still working
    • we're not running runner-based benchmarks in the CI, so they are potentially broken
    • we're lacking tests for ekobox and ekomark, so they are potentially broken

@giacomomagni perhaps the moment you finish with #83 can you have a look at the benchmarks? If they can run with poe bench-run (i.e. with pytest benchmarks -m 'not isolated') it takes me nothing to cook the workflow.
@felixhekhorn can you help me to provide a few tests for ekobox sooner or later? I'd like to check it's still working before merging.

@felixhekhorn
Copy link
Contributor

1. `eko` is too coupled: changes in `output` module and runcards format have propagated everywhere

well, the interpolation e.g. is only triggered deep inside - but if we can we should decouple

2. Test suite is definitely useful, even for refactoring: I changed a lot (and also tests had to change a bit), but I'm pretty sure everything is still working

we do run the isolated benchmark - which are failing at the moment ...

@felixhekhorn can you help me to provide a few tests for ekobox sooner or later? I'd like to check it's still working before merging.

you mean unit tests?

actually, should we do the same as in #95 and do a prepare-PR first before attempting the real thing?

@felixhekhorn felixhekhorn added enhancement New feature or request refactor Refactor code labels Mar 17, 2022
@alecandido
Copy link
Member Author

actually, should we do the same as in #95 and do a prepare-PR first before attempting the real thing?

This could have been a good idea, but at this point I almost have everything in place to sketch the actual output.
Maybe I can stop at output management: I make everything to create the folder and manage output, but I'll skip the

  • recipes runners
  • parts combiner
  • whole runner rework

you mean unit tests?

Yes, ekomark is not urgent (the benchmark part), but ekobox will be already used in evolve_n3fit, so we don't want to break it.

we do run the isolated benchmark - which are failing at the moment ...

This I know, and it's the next thing I'm going to fix (after all, I want the green tick back ;D). But as they are broken, if the runner-based ones will be (if nothing else, because of the runcards structure), but if there is no easy way to run them, it will take ages for me to recover.

@alecandido
Copy link
Member Author

alecandido commented Mar 17, 2022

@andreab1997 please help 🆘 🙏

I fixed all the old ekobox tests, but for the product.
While investigating it, I tried to fix it by simplifying (thanks to the new EKO class and related sugar). After the first few fixes it was just numerically failing the operator comparison, and after the heavy simplification is still numerically failing the same way...

Maybe you'll be able to quickly detect the error, if not I'll just try harder...

@alecandido
Copy link
Member Author

alecandido commented Mar 18, 2022

@andreab1997 yesterday evening I was too tired, it looks like...

Actually, the problem has been solved in 6af70f0: the product code (most likely after my update) was recomputing an operator that was already part of the "initial" one.

In practice, the example was the following:

eko1: 25 -> [60, 80, 100]
eko2: 100 -> [80, 100, 120]

eko_prod = eko1 @ eko2

so actually eko2 is quite special/degenerate, it contains:

  • a forward patch 100 -> 120
  • a backward patch 100 -> 80
  • a null patch (up to matching) 100 -> 100

When combined with eko1 generated the following:

eko_prod: 25 -> [60, 80, 100, 120, 80*, 100*]

The 80* is 25 -> 100 -> 80 (and similarly 100* = 25 -> 100 -> 100).
They were replacing the original objects in eko1, and most likely because of some numerical instability (or maybe some error), the comparison with the original one was failing.

@alecandido alecandido mentioned this pull request Mar 18, 2022
4 tasks
@alecandido
Copy link
Member Author

Since we should improve the product to guarantee that nothing weird will happen, I opened a dedicated issue #107.

This was referenced Mar 18, 2022
@alecandido
Copy link
Member Author

alecandido commented Apr 5, 2022

I left intentionally a couple of pdb statement, because I'm trying to bisect the current issue in "isolated benchmarks".

However, it looks like the problem is an actual one of current evolution: xgrid_reshape happens to be trivial but non-trivial (i.e. the targetgrid, xgrid, and inputgrid are all the same, but considered as different, I guess because of xgrid somehow), and manipulate.xgrid_reshape is flushing the correct content of the operator with 0, so at the final EKO is just a container of 0s, and the evolved PDF set is 0 as well.
That is clearly a bug, so I want to solve:

  • manipulate.xgrid_reshape, it should behave as expected
  • do not apply it, if targetgrid and inputgrid not explicitly specified

@alecandido
Copy link
Member Author

alecandido commented Apr 8, 2022

@felixhekhorn I removed the usage of unittest in e7be4bb for two reasons:

  1. it's true that unittest is standard library, nevertheless it is a further testing framework; since we're already using pytest, let's be as much consistent as possible and always use only that one (the thing it was tested by unittest, the mode in which the file was opened, it was unnecessarily detailed... if you wish we can improve current status by testing file existence after writing, that is the interesting thing for the end user)
  2. since it was mocking a lot, it was even breaking the usage of simple functions like tarfile.is_tarfile() (I'm using to test if the associated file is reliable, required for the new output, since it's half on-disk), I guess because being a stream it was not possible to reread it, but it would have been a nightmare to fix it, and just for the test

@felixhekhorn
Copy link
Contributor

@felixhekhorn I removed the usage of unittest in e7be4bb for two reasons:

fine

1. it's true that `unittest` is standard library, nevertheless it is a further testing framework; since we're already using `pytest`, let's be as much consistent as possible and always use only that one (the thing it was tested by `unittest`, the mode in which the file was opened, it was unnecessarily detailed... if you wish we can improve current status by testing file existence after writing, that is the interesting thing for the end user)

that we should do

@felixhekhorn
Copy link
Contributor

As I realized in NNPDF/pineappl#138 it is stupid to store the alpha_s values along side (since you don't know the renomalization scale of the process), so if you wish you can stop propagating that information here ...

@alecandido
Copy link
Member Author

As I realized in N3PDF/pineappl#138 it is stupid to store the alpha_s values along side (since you don't know the renomalization scale of the process), so if you wish you can stop propagating that information here ...

Changing things add layers over layers of refactoring. I'm just out of a few, I'd like to close this one without any other big change, if possible.

Not that I don't want to do it, but simply open a separate issue for it. I'll do the same with the from_dict extermination campaign.

@felixhekhorn
Copy link
Contributor

From now on, we should refrain to address multiple big tasks in a single PR

while I agree with that in theory, in practice this is much more difficult ... (as typically problems appear along the way)

@alecandido
Copy link
Member Author

while I agree with that in theory, in practice this is much more difficult ... (as typically problems appear along the way)

In the case of this PR this is actual refactoring, not addition, and refactoring the ground of such a big project it's rather expected to be messy, since it's breaking compatibility of everything (in case you didn't properly designed your internal interfaces...).

Hopefully, I'll be able to merge right after #83

@alecandido alecandido force-pushed the feature/composite-output branch from 692e3d1 to 7974a56 Compare May 11, 2022 08:52
@alecandido
Copy link
Member Author

alecandido commented May 11, 2022

Ok, I finally rebased this branch, I expect everything to break now.

In any case, it was already broken (as usual...), so now the plan is:

@felixhekhorn
Copy link
Contributor

The only remaining lines are the lovely dead code you promised to take care of ...

@felixhekhorn
Copy link
Contributor

while 44b581c brings LHA LO back running, I'm afraid we're hit by #123 again, because now I'm trying to run NLO and since 10 mins nothing is happening (except my computer taking 100% energy) and I believe he is compiling matching ... @giacomomagni can you confirm that you can run NLO (e.g. on master meaning in your FK table generation)? this PR should have no impact on compiling/matching ...

@felixhekhorn
Copy link
Contributor

  • I didn't recheck everything, but I'm relying on the tests, benchmarks, and I rerun the lha benchmark (eventually discovering another inconsistency)
  • since the output tutorial is now outdated we need to rewrite it
  • we also may want to delay the merge to develop until the resubmission of the paper which may trigger a 0.10.3

@felixhekhorn
Copy link
Contributor

  • since the output tutorial is now outdated we need to rewrite it

47e30e5 makes the tutorial back running with a minimal presentation

@felixhekhorn felixhekhorn merged commit 1c27efc into develop Oct 10, 2022
@felixhekhorn felixhekhorn deleted the feature/composite-output branch October 10, 2022 13:11
@alecandido alecandido mentioned this pull request Oct 12, 2022
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request output Output format and management refactor Refactor code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory saving output
3 participants