
Backend strategy #130

Closed
jonas-eschle opened this issue Sep 1, 2021 · 14 comments

Comments

@jonas-eschle

Hi all, first of all, great work! Numerical integration methods in compiled, gradient-supporting frameworks are really lacking.

My question is about backends. So far, it seems that only a small set of the torch methods is used. Since there are other libraries, notably JAX and TensorFlow, which both have a numpy-like API like PyTorch's, it seems nearly trivial to me to support these backends as well, at least at the project's current state.
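As a sketch of this point (illustrative only, not torchquad code): because torch, `jax.numpy` and `tensorflow.experimental.numpy` all mirror numpy's API, an integrator can take the array module itself as a parameter. Demonstrated here with plain numpy so the snippet stays self-contained.

```python
import numpy as np

def simpson(anp, f, a, b, n=101):
    """Composite Simpson's rule on [a, b] with n points (n must be odd).

    `anp` is any module exposing a numpy-like API (numpy, torch,
    jax.numpy, tensorflow.experimental.numpy, ...).
    """
    x = anp.linspace(a, b, n)
    y = f(x)
    h = (b - a) / (n - 1)
    # Simpson weights: endpoints 1, odd interior points 4, even interior points 2
    return h / 3 * (y[0] + y[-1] + 4 * anp.sum(y[1:-1:2]) + 2 * anp.sum(y[2:-1:2]))

# integral of exp over [0, 1] is e - 1 ≈ 1.71828
print(simpson(np, np.exp, 0.0, 1.0))
```

The same call would in principle work with `torch` or `jax.numpy` passed in place of `np` (with the integrand spelled in that framework's functions), which is what makes the backend idea cheap for the Newton-Cotes rules.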

There are already a few dedicated projects in this niche, such as vegasflow and a one-dimensional integrator.

I think the limitations of all frameworks are quite similar: they are great at vectorized operations and bad at adaptive/sequential methods. I am also aware, though, that this can potentially make more sophisticated integration methods harder to implement, for example when they rely on control flow.

Do you have any thoughts on this?

P.S.: The motivation comes from a likelihood model fitting library that we use in High Energy Physics, which currently uses TensorFlow (and probably JAX in the future). Efficiently integrating a function numerically to normalize it into a PDF is therefore key.

@gomezzz
Collaborator

gomezzz commented Sep 1, 2021

Hi @jonas-eschle !

Thank you for your kind words, glad you like the project! :)

> My question is about backends. So far, it seems that only a small set of the torch methods is used. Since there are other libraries, notably JAX and TensorFlow, which both have a numpy-like API like PyTorch's, it seems nearly trivial to me to support these backends as well, at least at the project's current state.

True. This would definitely be possible.

> I think the limitations of all frameworks are quite similar: they are great at vectorized operations and bad at adaptive/sequential methods. I am also aware, though, that this can potentially make more sophisticated integration methods harder to implement, for example when they rely on control flow.

Also true. I'm not sure about the performance vegasflow achieves, but VEGAS in torchquad has proven tricky to optimize. Its scaling can never compete with more naturally parallel methods such as vanilla Monte Carlo or Newton-Cotes rules. Maybe there are better Monte Carlo solvers in terms of achieving good scaling 🤔. Currently, there are no concrete plans to implement other complex integrators, although it would definitely be interesting and worthwhile (see e.g. #127).
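For context, a sketch of why vanilla Monte Carlo parallelizes so naturally (illustrative only, not torchquad's implementation): every sample is independent, so all N integrand evaluations can run as one batched call, whereas VEGAS interleaves sampling with sequential grid-adaptation passes.

```python
import numpy as np

def mc_integrate(f, dim, n, seed=0):
    """Plain Monte Carlo estimate of the integral of f over [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n, dim))     # all N samples drawn at once
    return float(np.mean(f(pts)))  # one batched evaluation; cube volume = 1

# f(x, y, z) = x^2 + y^2 + z^2 integrates to exactly 1.0 over [0, 1]^3
print(mc_integrate(lambda p: np.sum(p**2, axis=1), dim=3, n=100_000))
```

The error shrinks only as 1/sqrt(N), which is where adaptive schemes like VEGAS+ win on convergence even while losing on raw parallelism.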

I'm not entirely sure if supporting TensorFlow is ideal, as there are already other modules for it, and having to keep up with several changing APIs does, of course, increase complexity. I am not aware of any implementations for JAX, though, which may be interesting.

In general, I think it could be feasible to do something like in this project I recently came across:

This would also allow porting only some of the integrators to other frameworks (e.g. porting VEGAS may be more complex, and for TF not really worth it given that vegasflow exists?).

Is this somewhat what you had in mind? :)

@jonas-eschle
Author

jonas-eschle commented Sep 1, 2021

Yes, exactly, this is about what I had in mind. And yes, the more complicated the method, the harder (or even impossible) it gets to implement in a jittable and autograd-compatible way.

We actually also tried to get some more dynamic methods in, with mixed success: https://github.com/M-AlMinawi/Integration-of-single-variable-functions-using-TensorFlow

I think the backend class or similar should do the trick indeed!

For VEGAS, at least the VEGAS+ that vegasflow now implements seems considerably better than plain MC, but I am not too familiar with the internals there.

@gomezzz
Collaborator

gomezzz commented Sep 1, 2021

Would it be something you would be interested in implementing? I think, ideally, one could approach this first for one of the Newton-Cotes (or plain MC) methods to see how much it breaks the codebase and how complex it would be for VEGAS+.

> For VEGAS, at least the VEGAS+ that vegasflow now implements seems considerably better than plain MC, but I am not too familiar with the internals there.

In terms of convergence, I think it really depends on the integrand: it needs to be sufficiently volatile to profit from the adaptiveness. In terms of runtime, it can't scale as well as plain MC, I think.

I haven't had the opportunity to compare against the vegasflow implementation in terms of speed, but at the moment the VEGAS+ in torchquad is actually faster on CPU. I guess our implementation does not seem to be parallel enough yet to profit from GPUs.

What kind of dimensionality do the problems you investigate have?

@jonas-eschle
Author

I would surely be interested in helping, e.g. also with the backend (choices), but I can't commit a lot of time to it at the moment.

> In terms of convergence, I think it really depends on the integrand: it needs to be sufficiently volatile to profit from the adaptiveness. In terms of runtime, it can't scale as well as plain MC, I think.

Yes, we did some studies on this, and simply speaking it's sometimes useful and sometimes not.

> I haven't had the opportunity to compare against the vegasflow implementation in terms of speed, but at the moment the VEGAS+ in torchquad is actually faster on CPU. I guess our implementation does not seem to be parallel enough yet to profit from GPUs.

I wonder as well, indeed, but we're currently investigating this anyway; I can let you know once we have some results on it.

The dimensionality usually goes from 1-2D problems up to 5-6D (so still quite low-dimensional, but already tricky enough to integrate).

@gomezzz
Collaborator

gomezzz commented Sep 2, 2021

> I would surely be interested in helping, e.g. also with the backend (choices), but I can't commit a lot of time to it at the moment.

Unfortunately, I am also rather busy at the moment. But we can start with a little requirements engineering to see if it isn't fairly quick to do (see below).

> I wonder as well, indeed, but we're currently investigating this anyway; I can let you know once we have some results on it.

Sure! I'm curious.

> The dimensionality usually goes from 1-2D problems up to 5-6D (so still quite low-dimensional, but already tricky enough to integrate).

But currently you are using vegas, as far as I understand? I'm wondering if the deterministic methods aren't still competitive here given their better scaling. But I think you tried that in the thesis you linked before?

From your description I gather you would be most interested in starting with a TF backend?

Needed changes would be:

  1. Implement a backend class similar to this in torchquad/utils/backend.py, or torchquad/backend/ if it ends up being multiple files/classes. It has to provide access to all methods in TF and torch that are used in the selected integration method below. Additionally, one ought to check that the API is the same (e.g. torch.transpose and np.transpose behave differently :S, not sure about TF). This might warrant implementing a separate test for all functions in the backend to ensure consistency.
  2. Adapt one of the Newton-Cotes methods (e.g. trapezoid.py or simpson.py) to use the backend. This will likely also require integrating the backend into integration_grid.py.
  3. Add a way for the user to choose the backend. As a start, during the creation of the integrator (so in the constructor of the selected integration method) one can just pass an extra variable for it. This also makes it clear which integrators support this. In the integrator, one can have something like

         import tensorflow_backend as tf_backend

         (...)

         if selected_backend == "tensorflow":
             self.backend = tf_backend
         (...)

  4. Update environment.yml to include the framework (TF or such).
  5. Add some tests for it in the respective integrator test (e.g. `tests/trapezoid_test.py`).

So it should not be too much work. Personally, I suspect the most annoying part is ensuring the backends match for all the functions.
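A minimal sketch of how steps 1-3 could fit together (all names here, such as `_BACKENDS` and `get_backend`, are hypothetical, not torchquad's actual code); only numpy is registered so the snippet runs standalone:

```python
import numpy as np

# Step 1: registry of backend modules. In torchquad one would also register
# torch and a TF wrapper here, after verifying the APIs really match
# (careful: torch.transpose swaps two given dims, while np.transpose
# permutes or reverses all axes by default).
_BACKENDS = {"numpy": np}

def get_backend(name):
    try:
        return _BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown backend {name!r}, known: {list(_BACKENDS)}")

class Trapezoid:
    def __init__(self, backend="numpy"):
        # Step 3: the user picks the backend in the constructor
        self.backend = get_backend(backend)

    def integrate(self, f, a, b, n=101):
        # Step 2: the integration logic only touches the backend module
        anp = self.backend
        x = anp.linspace(a, b, n)
        y = f(x)
        h = (b - a) / (n - 1)
        return h * (anp.sum(y) - 0.5 * (y[0] + y[-1]))

print(Trapezoid("numpy").integrate(np.sin, 0.0, np.pi))  # close to 2.0
```

A consistency test (step 1's last point) would then call each registered function with the same inputs on every backend and compare the results.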

@jonas-eschle
Author

> But currently you are using vegas, as far as I understand? I'm wondering if the deterministic methods aren't still competitive here given their better scaling. But I think you tried that in the thesis you linked before?

Not yet, we're basically trying it out. And it seems to be quite a bit better than QMC methods.

> From your description I gather you would be most interested in starting with a TF backend?

Ish. TF now has a numpy-like backend, tensorflow.experimental.numpy, that could be used (modulo control flow, which needs to be wrapped somehow). So I would suggest using that.

Or maybe even better, something like autoray, which already wraps the low-level numpy API of multiple libraries (not the gradients or control flow yet), but it could already help a lot.
Btw, don't be fooled: they have a full numpy API, they just don't really advertise it.
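To make the autoray idea concrete, here is a toy re-implementation of the dispatch pattern it uses (this is not autoray's actual code; the real library is more robust and exposes it as `autoray.do(fn_name, *args, like=...)`): functions are looked up by name on the backend module inferred from the argument's type, or forced via a `like` argument.

```python
import importlib
import numpy as np

def do(fname, *args, like=None, **kwargs):
    """Dispatch `fname` to the backend module owning the first argument.

    `like` overrides the inference (e.g. like="numpy"); autoray's real
    `do` works the same way in spirit but handles many more cases.
    """
    if like is None:
        # e.g. a numpy.ndarray's type lives in the "numpy" module
        like = type(args[0]).__module__.split(".")[0]
    backend = importlib.import_module(like)
    return getattr(backend, fname)(*args, **kwargs)

x = do("linspace", 0.0, 1.0, 5, like="numpy")  # dispatches to np.linspace
print(do("sum", x))  # backend inferred from x's type; prints 2.5
```

Written this way, the integrator code never imports torch, numpy or TF directly, which is exactly what makes a multi-backend port tractable.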

> Needed changes would be:
>
> 1. Implement a backend class similar to this in torchquad/utils/backend.py, or torchquad/backend/ if it ends up being multiple files/classes. It has to provide access to all methods in TF and torch that are used in the selected integration method below. Additionally, one ought to check that the API is the same (e.g. torch.transpose and np.transpose behave differently :S, not sure about TF). This might warrant implementing a separate test for all functions in the backend to ensure consistency.

Yes, maybe this is already handled by autoray or similar.

> 2. Adapt one of the Newton-Cotes methods (e.g. trapezoid.py or simpson.py) to use the backend. This will likely also require integrating the backend into integration_grid.py.

Yes, this needs to be done.

> 3. Add a way for the user to choose the backend. As a start, during the creation of the integrator (so in the constructor of the selected integration method) one can just pass an extra variable for it. This also makes it clear which integrators support this.

Something like this, yes. But I would make that a second-order priority; I guess the main issue is actually getting it working.

> 4. Update environment.yml to include the framework (TF or such).
> 5. Add some tests for it in the respective integrator test (e.g. `tests/trapezoid_test.py`).

> So it should not be too much work. Personally, I suspect the most annoying part is ensuring the backends match for all the functions.

Yes, I agree. And since this work can be taken over by autoray, that may help a lot.

@gomezzz
Collaborator

gomezzz commented Sep 3, 2021

> Or maybe even better, something like autoray, which already wraps the low-level numpy API of multiple libraries (not the gradients or control flow yet), but it could already help a lot.
> Btw, don't be fooled: they have a full numpy API, they just don't really advertise it.

Interesting! Thanks for pointing out autoray, that seems like a very exciting project that might help a lot here. :)

I don't have time next week, but I think I will take a day to play around with this some time after. Or, if you want to try it out in torchquad, feel free to do so. For now, we can create a branch for this and start experimenting to see how complex it is in the end? I'll let you know when I find time. Feel free to mention it if you get to it sooner than me!

But, overall autoray makes me fairly confident that this should be quite doable.

@jonas-eschle
Author

Good, then we share the same view on this! I am also going to play around with it in zfit to get a feel for it, so yes, let's just start with an experimental branch and see how it goes.

@gomezzz
Collaborator

gomezzz commented Oct 13, 2021

@jonas-eschle There is now a master's student from TUM working on this as his thesis! He will be creating a branch in torchquad for this, and we hope to have something running some time soon. Then we can run some performance and usability trials and see where it takes us!

How did it go with zfit?

@jonas-eschle
Author

Hey, sorry I missed this post! We started as well, but some other design priorities came up that will help us understand our general API better. But I think we will give this a try around January; I am also currently looking for a student to work on it.

How is it going so far?

@gomezzz
Collaborator

gomezzz commented Dec 1, 2021

No worries. :) Glad to hear you are still on it as well!

We are actually looking into a total conversion now, as it is going really well. It is happening here: https://github.com/FHof/torchquad/

It's already quite functional, and we are now digging deeper into performance comparisons among the frameworks to see which parts are bottlenecks in which frameworks, comparing things like using XLA, JIT, etc. Progressing quite well!

@gomezzz
Collaborator

gomezzz commented Apr 12, 2022

@FHof has fully integrated TF, numpy and JAX support in #137 :) Just merged it and will create a separate release for it soon.

@FHof Any overall thoughts on the autoray integration that one ought to keep in mind when attempting something like this?

@FHof
Collaborator

FHof commented Apr 13, 2022

Yes, when I rewrote the VEGAS code, it was helpful for me to separate the changes into multiple commits as follows. I first replaced torch functions with those from autoray; these changes were often simple but touched many lines of code (23258a4). Then I changed the code so that it works with both numpy and torch (c0416f0). After that, I added details such as support for dtypes.

This made it easier to find bugs: after the first step, VEGAS executed the same torch operations (just wrapped by autoray), whereas after the second step it used different operations, for example reshape instead of unsqueeze. Therefore, bugs introduced in the first step would be caused by a misuse of autoray (e.g. forgotten `like` arguments), whereas bugs in the second step would be caused by a misunderstanding of differences between backends (e.g. differences between backend-agnostic type conversion and torch's .int() and .long() tensor methods).
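To illustrate the reshape-instead-of-unsqueeze class of difference mentioned above (a minimal sketch, not code from the PR): torch tensors have an `unsqueeze` method that numpy arrays lack, so backend-agnostic code has to spell the same operation with something both APIs share.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

# torch-only spelling (fails on numpy arrays): x.unsqueeze(0)
# backend-agnostic spellings that both numpy and torch support:
a = x.reshape(1, *x.shape)  # explicit reshape adding a leading axis
b = np.expand_dims(x, 0)    # numpy's function form; torch.unsqueeze(x, 0) in torch

print(a.shape, b.shape)  # (1, 2, 3) (1, 2, 3)
```

Bugs of this kind only surface in the second migration step, which is exactly why separating the two steps into different commits made them easier to track down.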

@gomezzz
Collaborator

gomezzz commented Apr 14, 2022

Thanks!

@gomezzz gomezzz closed this as completed Apr 14, 2022