Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend fill grid python interface #107

Merged
merged 6 commits into from
Feb 11, 2022
Merged

Extend fill grid python interface #107

merged 6 commits into from
Feb 11, 2022

Conversation

alecandido
Copy link
Member

@alecandido alecandido commented Jan 29, 2022

This should close #81

  • docs
  • test

I'm not sure that getting as input a 2 dimensional numpy array is the best idea ever:
https://github.com/N3PDF/pineappl/blob/cf7a8828e73d77a8e4b0c9a05a369617932e95d2/pineappl_py/src/grid.rs#L150-L156
but at the end of the day NTuple is a meaningful concept in pineappl, and it is more or less a named array of f64 elements, so it's not that bad.
The only other option would be to have 4 matching arrays x1s, x2s, q2s, weights:

  • advantage: things are named, no confusion about sorting
  • disadvantage: it's messier, and you want them to match, that is something intrinsic to a 2 dimensional array

@alecandido
Copy link
Member Author

I'm doing something ugly in fill_with_array:
https://github.com/N3PDF/pineappl/blob/cf7a8828e73d77a8e4b0c9a05a369617932e95d2/pineappl_py/src/grid.rs#L158

Instead I'd like to:

  • assert length is 4
  • destructure into the variables

if possible in a single statement.

@cschwan any idea?

@alecandido alecandido marked this pull request as draft January 29, 2022 10:27
@alecandido alecandido requested a review from cschwan January 29, 2022 10:27
@codecov
Copy link

codecov bot commented Jan 29, 2022

Codecov Report

Merging #107 (92f802c) into master (70c710a) will increase coverage by 0.15%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #107      +/-   ##
==========================================
+ Coverage   88.43%   88.58%   +0.15%     
==========================================
  Files          31       31              
  Lines        3070     3111      +41     
==========================================
+ Hits         2715     2756      +41     
  Misses        355      355              
Impacted Files Coverage Δ
pineappl_cli/src/main.rs 100.00% <0.00%> (ø)
pineappl_cli/src/lumi.rs
pineappl_cli/src/obl.rs 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 70c710a...92f802c. Read the comment docs.

Copy link
Contributor

@cschwan cschwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zaharid
Copy link

Zaharid commented Jan 31, 2022

The interface looks ok to me. I think these would be the requisite primitives. ISTM it would be slightly simpler to pass separate arrays and typically faster since it would typically avoid having to construct a big array somewhere in Python.

@alecandido
Copy link
Member Author

Okay, so I'd say:

  1. the name was chosen at random, fill_with_array had the slight advantage of making explicit that I'm filling with an array, rather than filling an array, but I really don't care, so I'll put fill_array and we even preserve some compatibility
  2. I lacked context, and since observables have to be passed as an array as well (a 1-D sequence in general, that will be an array in practice) we completely loose the pineappl concept of N-tuple (in the sense that for each index you need more information than the tuple itself) and even @Zaharid's point about efficiency looks relevant to me, so we'll have 5 arrays instead
  3. about lumi and order I'd definitely go for separate calls: as you said, should not be too much of a overhead, and moreover, if you want to have a complicate setup of lumis, orders, and observables, you need to iterate over and flat the output in a 1-D array (while intrinsically it would be a higher dimensional one), and I see it as an unneeded complication, reducing transparency of the algorithm

In summary, I agree with all your points and I can take care of implementing them, it should take little.
Then I'll simply lack some simple tests before merging.

@alecandido
Copy link
Member Author

Hi @cschwan, are you aware of any rust pattern to avoid repeated instances of calls like this:
https://github.com/N3PDF/pineappl/blob/dd11579c979efba764a9181997190d7e07ee34b5/pineappl_py/src/grid.rs#L171-L177
I found this post about the topic, but it seems like it's not currently possible to implement my idea of:

  • make an array of arguments
  • apply a common transformation with a map call
  • (if needed) make it into tuple
  • unpack into arguments

@cschwan
Copy link
Contributor

cschwan commented Feb 4, 2022

@alecandido I believe that's the correct way to do it. I'm not aware of any 'better' way.

@alecandido
Copy link
Member Author

😞

@Zaharid
Copy link

Zaharid commented Feb 4, 2022

I suppose one could look into the izip macro and extend it, but doesn't seem worth it.

@alecandido
Copy link
Member Author

I suppose one could look into the izip macro and extend it, but doesn't seem worth it.

This for sure...

@alecandido alecandido marked this pull request as ready for review February 4, 2022 21:33
@alecandido
Copy link
Member Author

I added very basic tests, but unfortunately I can't do more without implementing a wider interface.

At the moment, we have a very small API for subgrids (I already pointed out this somewhere else, while I was investigating a grid in the interpreter), and, in particular, we can only set values, but never read.
Of course, there is the extra complication that there exists multiple kinds of subgrids...

Sooner or later (but not too late, since it's useful for debugging), I'd add methods like iter and stats, but definitely in another PR. I'm ready to merge.

@cschwan
Copy link
Contributor

cschwan commented Feb 5, 2022

I added very basic tests, but unfortunately I can't do more without implementing a wider interface.

At the moment, we have a very small API for subgrids (I already pointed out this somewhere else, while I was investigating a grid in the interpreter), and, in particular, we can only set values, but never read. Of course, there is the extra complication that there exists multiple kinds of subgrids...

Sooner or later (but not too late, since it's useful for debugging), I'd add methods like iter and stats, but definitely in another PR. I'm ready to merge.

Subgrid::iter() would be the proper way to directly 'read' a subgrid apart from Subgrid::convolute().

@cschwan
Copy link
Contributor

cschwan commented Feb 5, 2022

I added very basic tests, but unfortunately I can't do more without implementing a wider interface.

You could copy the example in https://github.com/N3PDF/pineappl/blob/master/examples/python/dyaa.py and make it a test!

* master:
  Add numpyfication in python layer, support generic sequences
  Update docs to numpy arrays usage
  Fix numpy arrays related tests
  Port bin to numpy arrays
  Port fktable to numpy arrays
  Port import subgrid to numpy arrays
  Correct `remap` help a bit
  Support native pathlike objects
  Convert lists to numpy arrays in grid interface
@cschwan
Copy link
Contributor

cschwan commented Feb 10, 2022

@alecandido do you think this is ready for merging? If yes, please go ahead!

@alecandido
Copy link
Member Author

alecandido commented Feb 10, 2022

Not yet, it's actually missing the test: I dismissed the change request because it was related to fill_array, and so outdated, but I want to implement the test first.

P.S.: scheduled today

@cschwan
Copy link
Contributor

cschwan commented Feb 10, 2022

P.S.: scheduled today

Great 👍! This is the last Issue for v0.5.0, after having it fixed we'll release a new Rust version, a new Python pip version, and now I'm having a look at the conda package: https://github.com/conda-forge/pineappl-feedstock. That'd mean we can announce it tomorrow and let people test it. @Zaharid might be interested in it.

Will NNPDF/runcards, N3PDF/pineko, NNPDF/fktables be affected by the Python changes?

@alecandido
Copy link
Member Author

Will NNPDF/runcards, N3PDF/pineko, NNPDF/fktables be affected by the Python changes?

This PR is extending the API, and #106 is backward compatible at the end. So maybe #107 is the only one that might break something.

However, runcards is managed by poetry, so versions are pinned, the moment I'll try to update I simply have to test everything is still working. The other two @felixhekhorn uses daily, so we'll find out soon :)

@alecandido
Copy link
Member Author

alecandido commented Feb 10, 2022

I raise a white flag:

  • in 92f802c I'm actually testing the result of the convolution
  • unfortunately, everything I put results in a nice 0. at the end (actually a couple of, since self.fake_grid() generates a grid with two bins)

Here I need a bit more of domain knowledge to put some meaningful number, and test something different from 0.

So, if you want to play a bit with numbers @cschwan do it, otherwise we can even merge (a test with 0 is not that meaningful, but at least we're testing the grid is not corrupted and can be convoluted...).

@cschwan
Copy link
Contributor

cschwan commented Feb 11, 2022

@alecandido thanks again, I think that's sufficient for the time being, everything else we can do in Issue #108.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Python interface for Grid::fill* methods
3 participants