
Split anomalous dimensions into a Rust crate #185

Closed
alecandido opened this issue Dec 24, 2022 · 12 comments · Fixed by #189

@alecandido
Member

The motivations for this issue are mainly:

  1. we are using inconsistent tooling in the theory pipeline to achieve performant computations: here we use Numba, while PineAPPL is implemented in Rust for the same reason
  2. Numba has some limitations in terms of supported features; from this point of view Rust is much more flexible (it is a full language with its own ecosystem, while Numba is bound to remain a "subset" of Python forever)
  3. Numba seems to be especially slow at compiling N3LO terms
  4. it is much more difficult to provide bindings to the anomalous dimensions for other languages, which might be relevant since we have some original pieces

I am currently trying to outline a suitable layout for the new Rust module; I am making some attempts in this repo:
https://github.com/AleCandido/atuin

I already asked the maturin authors for support, but a minimal skeleton (with no further crates for bindings) is almost working:
PyO3/maturin#1372

@alecandido alecandido self-assigned this Dec 24, 2022
@alecandido alecandido added enhancement New feature or request rust Rust extension related labels Dec 24, 2022
@alecandido
Member Author

The template works. As soon as we close the current main refactoring (i.e. #172), I will provide a PR with a working layout, though with almost nothing in it.

The proposal is to move the Numba-decorated content to a standalone crate (let's say ekspressions), while providing bindings to it in a separate crate.

In principle, we can also move the integration to the Rust side (such that all the heavy lifting happens there, and we do not cross the Rust-Python boundary during intensive operations). I would do this in the bindings crate, or in a separate one, keeping ekspressions (a silly name, feel free to suggest a better one) for the analytical expressions alone.
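The two-crate split could be sketched as a Cargo workspace. All names here are placeholders from this discussion (ekspressions, and a hypothetical ekspressions-py bindings crate); versions are illustrative:

```toml
# Cargo.toml at the repository root: one workspace, two crates.
[workspace]
members = ["ekspressions", "ekspressions-py"]

# ekspressions-py/Cargo.toml (hypothetical bindings crate):
# [lib]
# crate-type = ["cdylib"]
#
# [dependencies]
# ekspressions = { path = "../ekspressions" }
# pyo3 = { version = "0.19", features = ["extension-module"] }
```

The pure crate stays free of Python dependencies, so it can also be reused from other languages later (point 4. above).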

To integrate in Rust we can use bindings to the GSL: https://docs.rs/GSL/latest/rgsl/integration/index.html (i.e. essentially the same as SciPy, but we have to install the GSL ourselves).
I am still looking for better options, but this discourages me from attempting it in the short term...
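Just to illustrate that plain Rust quadrature is also an option, here is a toy composite Simpson rule with no external dependencies. This is a sketch, not the rgsl API, and `simpson` is a hypothetical helper:

```rust
// Toy composite Simpson's rule: a dependency-free stand-in for a real
// adaptive quadrature routine (e.g. GSL's QAG via the rgsl crate).
fn simpson<F: Fn(f64) -> f64>(f: F, a: f64, b: f64, n: usize) -> f64 {
    assert!(n % 2 == 0, "n must be even");
    let h = (b - a) / n as f64;
    // Endpoints weight 1; odd interior points weight 4; even ones weight 2.
    let mut sum = f(a) + f(b);
    for i in 1..n {
        let x = a + i as f64 * h;
        sum += if i % 2 == 1 { 4.0 * f(x) } else { 2.0 * f(x) };
    }
    sum * h / 3.0
}

fn main() {
    // Integrate x^2 on [0, 1]; the exact result is 1/3.
    let result = simpson(|x| x * x, 0.0, 1.0, 100);
    assert!((result - 1.0 / 3.0).abs() < 1e-9);
    println!("{:.6}", result);
}
```

A real Mellin inversion would of course need an adaptive method with error control, which is exactly what GSL provides.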

@felixhekhorn
Contributor

Compiling just took me about half an hour (!)

Matching: computing operators - 4/60 took: 1644.154465 s

I think we need to do something ...

@alecandido
Member Author

I think we need to do something ...

Agreed, and the first step is already in #189. But it is lower priority than other business, in particular FONLL.

However, if @giacomomagni wants to start having a look, we can discuss it. But for me and you it is off limits until everything else is working smoothly.

@alecandido alecandido added this to the post milestone Jan 27, 2023
@alecandido
Member Author

Rust supports incremental compilation (enabled by default only for debug builds, and reasonably left opt-in for release builds):
https://nnethercote.github.io/perf-book/compile-times.html
https://doc.rust-lang.org/cargo/reference/profiles.html#incremental
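Per the Cargo profiles documentation linked above, opting in for release builds is a one-line setting in Cargo.toml:

```toml
# Opt in to incremental compilation for release builds as well
# (it is already the default for dev builds).
[profile.release]
incremental = true
```

Whether this pays off depends on how often the expressions themselves change versus their callers.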

@felixhekhorn
Contributor

@scarlehoff said in NNPDF/pineko#105 :

my computer crashed after eko took every possible resource,

and also, discussing with @giacomomagni last week, we reached the same conclusion: we should do something on the compilation front ...

@scarlehoff
Member

To be more precise, my problem is that I have many cores but not so much memory so my scratch got completely filled (quite rapidly).

@alecandido
Member Author

To be more precise, my problem is that I have many cores but not so much memory so my scratch got completely filled (quite rapidly).

You're not alone, the same happened to me...

and also discussing with @giacomomagni last week, we found the same conclusion: we should do something on the compilation front ...

Yes, we should do something, but it won't be quick. The current workaround is to limit the number of processors Numba uses for compilation, but it still takes a long time...

@scarlehoff
Member

You're not alone, it happened the same to me...

And it happened again! This time while computing an eko for the evolution of a PDF. Not sure if it has gotten way worse lately (i.e., in one of the latest point releases) or whether I just didn't notice it.

@felixhekhorn
Contributor

I think you should set the default to 1 (instead of 8)

Not sure if it has gotten way worse lately (i.e., in one of the latest point releases) or whether I just didn't notice it.

the strategy and settings should not have changed ... (not since 0.13)

@scarlehoff
Member

scarlehoff commented Jul 11, 2023

It could've been that I just happened not to use the relevant computers while they were in swap, so I didn't notice.

8 seems to be safe (at least for now for me). I was going higher :P

@alecandido
Member Author

And it happened again! This time while computing an eko for the evolution of a PDF. Not sure if it has gotten way worse lately (i.e., in one of the latest point releases) or whether I just didn't notice it.

This instead should be quick to solve.

An EKO has many dimensions, in particular Q2 and x (the one to Mellin invert), and we can parallelize:

  1. by dataset
  2. top-level within a dataset, on Q2 (available after the jets rework)
  3. bottom-level, in the quadrature integration (won't be available for Python, but coming soon in Rust)
  4. mid-level, doing more integrals at the same time

At the beginning, only 1. and 4. were available, and 4. was needed by Giacomo (who implemented it). Unfortunately, this parallelization is handled by Python, with an enormous memory overhead (essentially, the involved objects are copied once per spawned thread, since threads are bound to new interpreter instances). Now that 2. is available, I would deprecate 4., setting the default number of threads to one, and eventually drop it to avoid similar issues in the future.
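To illustrate why 2. becomes cheap on the Rust side: threads share or move data instead of copying it per interpreter. A minimal std::thread sketch, with `compute_operator` as a hypothetical stand-in for the real per-Q2 work:

```rust
use std::thread;

// Placeholder for the expensive per-Q2 operator computation.
fn compute_operator(q2: f64) -> f64 {
    q2.ln() // stand-in for the real work
}

fn main() {
    let q2_grid = vec![10.0, 100.0, 1000.0];
    // One thread per Q2 point; each closure moves only the f64 it needs,
    // with no per-interpreter object copies as in Python multiprocessing.
    let handles: Vec<_> = q2_grid
        .into_iter()
        .map(|q2| thread::spawn(move || compute_operator(q2)))
        .collect();
    let results: Vec<f64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    assert_eq!(results.len(), 3);
    println!("{:?}", results);
}
```

In practice a thread pool bounded by the core count would replace the one-thread-per-point spawn.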

@giacomomagni
Collaborator

I think you should set the default to 1 (instead of 8)

I agree, let's set it to 1. In the end, you now want to parallelize only if you really need it.
