Lazy loading for rust #97

Open
4 tasks
alecandido opened this issue Feb 23, 2022 · 7 comments
Labels
output (Output format and management), rust (Rust extension related)

Comments

@alecandido
Member

alecandido commented Feb 23, 2022

We'd like PineAPPL to be able to consume an EKO one Q2 point at a time as well, since evolving a grid that requests a huge number of Q2 points would otherwise require splitting the grid.

In order to do this, the easiest option is to directly provide and maintain a Rust crate that is able to manage the EKO output format.
Indeed, PyO3 would make it possible to run Python code from Rust, but it is more a handle on the interpreter itself than proper bindings in this direction (it might be used for this, but it can be painful).

So, the easiest thing is to make a standalone crate in this repository (unfortunately eko is already taken as a name, but we can still use something like "get eko", i.e. geko).
In order to do this, we don't really need many ingredients:

  • a tar library, for which the one linked is an obvious candidate
  • a yaml library, which is another easy choice
  • an npy library, and here it becomes difficult; the alternatives are
    1. npyz, not much maintained, but it should be complete enough
    2. ndarray-npy, slightly better maintained recently, but it does not seem to have a complete feature set
    3. npy, which is mentioned by the previous one and which number 1. claims to be a fork of, but it looks untouched since 2018
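
To illustrate what the tar ingredient buys us, here is a minimal sketch in Python (standard library only) of reading one archive member at a time. The member names and the one-file-per-Q2 layout are assumptions made up for the example, not the actual EKO output format, and the archive lives in memory only to keep the demo self-contained.

```python
import io
import tarfile


def build_demo_archive(payloads):
    """Pack {member name: bytes} into an in-memory tar archive (test fixture)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in payloads.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_members(raw):
    """Walk the tar headers only, without decoding any payload."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as archive:
        return archive.getnames()


def load_member(raw, name):
    """Extract a single member, so only one operator is decoded at a time."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as archive:
        member = archive.extractfile(name)
        if member is None:  # directories and special files carry no data
            raise KeyError(name)
        return member.read()


# Hypothetical layout: one payload per Q2 point (names are illustrative only).
demo = build_demo_archive(
    {"operators/q2_10.npy": b"first", "operators/q2_100.npy": b"second"}
)
```

A Rust loader would presumably follow the same pattern: enumerate the entries once, then decode a given operator only when it is requested.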

#242 loader

In light of the release of the new runner, and the associated new internal format, I'd speed up the implementation of a first version of the loader.

First iteration (strictly required):

Second iteration (nice-to-have):

  • load runcards
alecandido added the rust (Rust extension related) label Feb 23, 2022
@felixhekhorn
Contributor

as you already said: i. or ii. seem to be likely candidates ...

@alecandido
Member Author

alecandido commented Feb 24, 2022

Yes, but unfortunately none of them is well maintained, so we have to be ready to fix them ourselves (if needed).

We might want to have a look at the code: .npy is not a very complicated format, it is just the binary dump of an array plus some headers. In any case, the crates are a better starting point than writing everything from scratch; if we're lucky, they will completely fulfill our purpose.
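
To give an idea of how little there is to parse, here is a rough sketch in Python (standard library only) of a reader for the .npy layout: a magic string, a version, a little-endian header length, a Python-dict-literal header, then the raw data. It only handles C-ordered little-endian float64 arrays, and `make_demo_npy` is a hand-rolled fixture for the example, not a general writer.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"


def parse_npy_header(raw):
    """Return (header dict, offset of the raw array data) for a .npy blob."""
    if raw[:6] != MAGIC:
        raise ValueError("not a .npy file")
    major = raw[6]
    # Version 1.0 stores the header length as u16, later versions as u32,
    # both little-endian.
    if major == 1:
        (length,) = struct.unpack_from("<H", raw, 8)
        start = 10
    else:
        (length,) = struct.unpack_from("<I", raw, 8)
        start = 12
    header = ast.literal_eval(raw[start : start + length].decode("latin1"))
    return header, start + length


def read_f8_array(raw):
    """Decode a C-ordered '<f8' payload into (shape, flat list of floats)."""
    header, offset = parse_npy_header(raw)
    if header["descr"] != "<f8" or header["fortran_order"]:
        raise ValueError("this sketch only handles C-ordered '<f8' arrays")
    count = 1
    for dim in header["shape"]:
        count *= dim
    values = struct.unpack_from(f"<{count}d", raw, offset)
    return header["shape"], list(values)


def make_demo_npy(values, shape):
    """Hand-build a version 1.0 file (fixture for this example only)."""
    text = f"{{'descr': '<f8', 'fortran_order': False, 'shape': {shape!r}, }}"
    # Pad with spaces so that the data starts at a multiple of 64 bytes.
    pad = -(10 + len(text) + 1) % 64
    header = (text + " " * pad + "\n").encode("latin1")
    prefix = MAGIC + bytes([1, 0]) + struct.pack("<H", len(header))
    return prefix + header + struct.pack(f"<{len(values)}d", *values)
```

The real crates cover many more dtypes and both memory orders, which is exactly the part we would rather not rewrite.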

@felixhekhorn
Contributor

maybe we should provide the possibility to do the xgrid_reshape on get-time

@felixhekhorn
Contributor

@alecandido can you please remind me what you meant by "operators headers syncer" above?

@alecandido
Member Author

alecandido commented Aug 22, 2024

It's more than two years ago, but I guess it was:

eko/src/eko/io/inventory.py

Lines 186 to 203 in 3bfc89d

def sync(self):
    """Sync the headers in the cache with the content on disk.

    In particular, headers on disk that are missing in the :attr:`cache`
    are added to it, without loading actual operators in memory.

    Despite the name, the operation is non-destructive, so, even if cache
    has been abused, nothing will be deleted nor unloaded.
    """
    for path in self.path.iterdir():
        if path.suffix != HEADER_EXT:
            continue

        header = self.header_type(
            **yaml.safe_load(path.read_text(encoding="utf-8"))
        )
        self.cache[header] = None

and friends (e.g. __iter__)

The idea was that the inventory is partly in memory, and you can change it in memory without dumping on disk. But, every now and then, you want to persist to disk (even right before losing what you've done so far).
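
A stripped-down sketch of that idea, assuming a hypothetical `Header` type keyed by the scale and JSON headers standing in for the YAML ones (only to stay within the standard library). The file layout is made up for the example, and this sketch uses `setdefault` so that entries already in memory are kept across a sync.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path

HEADER_EXT = ".json"  # assumption: stands in for the YAML headers


@dataclass(frozen=True)
class Header:
    """Hypothetical header type: identifies one operator by its scale."""

    q2: float


class Inventory:
    def __init__(self, path):
        self.path = Path(path)
        self.cache = {}  # header -> operator, or None if not loaded yet

    def sync(self):
        """Register headers found on disk without loading any operator."""
        for path in self.path.iterdir():
            if path.suffix != HEADER_EXT:
                continue
            header = Header(**json.loads(path.read_text(encoding="utf-8")))
            # setdefault keeps operators that are already in memory
            self.cache.setdefault(header, None)

    def __getitem__(self, header):
        """Load the operator on first access, then serve it from the cache."""
        if self.cache.get(header) is None:
            self.cache[header] = (self.path / f"{header.q2}.bin").read_bytes()
        return self.cache[header]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for q2 in (10.0, 100.0):
            header_text = json.dumps({"q2": q2})
            (root / f"{q2}{HEADER_EXT}").write_text(header_text, encoding="utf-8")
            (root / f"{q2}.bin").write_bytes(f"operator at {q2}".encode())
        inventory = Inventory(root)
        inventory.sync()
        known = sorted(header.q2 for header in inventory.cache)
        unloaded = all(value is None for value in inventory.cache.values())
        first = inventory[Header(q2=10.0)]
        inventory.sync()  # a second sync does not unload anything
        still_loaded = inventory.cache[Header(q2=10.0)] is not None
        return known, unloaded, first, still_loaded
```

After `sync()` every header on disk is known, but an operator is read only when it is actually indexed.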

@felixhekhorn
Contributor

ok I see - so for the moment I'm not keeping a copy in memory in Rust: i.e. there is no Inventory::cache, just Inventory::load() (which, as I understand it, passes the ownership and thus the responsibility to the caller) - and now I wonder if we really need such a thing ... do we ever read the same 4d operator twice? and if we ever do, can't we rely on the user to keep it alive? I suggest delaying a cache (and any related features) until we have solid proof that we actually need it (which will make the implementation much easier for now)
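
For comparison, a cache-less variant could look roughly like the sketch below. It is written in Python, which has no ownership model, so it only mimics the behaviour: every `load()` reads from disk and returns a fresh object that the inventory does not retain. The class name `StatelessInventory` and the file name are made up for the example.

```python
import tempfile
from pathlib import Path


class StatelessInventory:
    """Cache-less variant: nothing is retained after load() returns."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self, name):
        # Each call reads from disk and hands a fresh object to the caller,
        # who is then responsible for keeping it alive as long as needed.
        return bytearray((self.path / name).read_bytes())


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "op.bin").write_bytes(b"\x01\x02")
        inventory = StatelessInventory(tmp)
        first = inventory.load("op.bin")
        second = inventory.load("op.bin")
        same_content = first == second
        same_object = first is second
        only_path_stored = set(vars(inventory)) == {"path"}
        return same_content, same_object, only_path_stored
```

Reading the same operator twice costs a second disk read, which is the trade-off being questioned here.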

@alecandido
Member Author

Well, I guess your main concern is not so much the cache implementation, which could be kept in Python, or wherever it is needed, but the write operation.
