tch-based Python module? #174
Comments
I am intrigued. Why use a Python wrapper for tch rather than PyTorch directly? |
An RNN cell is going to involve a for loop. When you implement it in Python, too much of the work ends up being done in the interpreter, and CPU inference speed for longer sequences can benefit quite a bit from moving the cell implementation to, say, C++. I suspect a similar speedup could be achieved by implementing the cell in Rust. Edit: I think a good starting place might be the examples here: https://github.com/PyO3/pyo3#examples. |
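For readers who want to see the loop in question, here is a minimal sketch of a plain Elman-style RNN cell written in pure Python with no PyTorch. The function names, weights, and sizes are made up for illustration; the only point is that the per-time-step loop in `run_sequence` runs in the interpreter.

```python
import math

def rnn_cell(x, h, w_xh, w_hh, b):
    """One Elman RNN step: h' = tanh(W_xh x + W_hh h + b)."""
    new_h = []
    for j in range(len(h)):
        acc = b[j]
        acc += sum(w_xh[j][i] * x[i] for i in range(len(x)))
        acc += sum(w_hh[j][k] * h[k] for k in range(len(h)))
        new_h.append(math.tanh(acc))
    return new_h

def run_sequence(xs, h0, w_xh, w_hh, b):
    """Apply the cell once per time step; this loop runs in the interpreter."""
    h = h0
    for x in xs:
        h = rnn_cell(x, h, w_xh, w_hh, b)
    return h

# Illustrative weights (input size 2, hidden size 2). The recurrent weights
# are zeroed so the final state depends only on the last input.
w_xh = [[0.5, -0.5], [0.25, 0.25]]
w_hh = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
final_h = run_sequence([[1.0, 1.0], [2.0, 0.0]], [0.0, 0.0], w_xh, w_hh, b)
```

In a real PyTorch model the arithmetic inside `rnn_cell` would be tensor operations, but the outer loop over time steps would still be driven from Python.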
I understand. The ideal setup would be to train the model in a high-performance language like Rust and export it for use from a scripting language like Python. Unfortunately, the tch library did not have model export functionality last time I checked. A Python wrapper could be a good compromise if it works. I will give it a try and share if I get somewhere. Thanks |
I don't know of any such example, and that indeed seems like a good use case. The C++ API has a tutorial about this, [Custom C++ and CUDA Extensions]; it would be very nice to write a Rust version of it. As you noted, PyO3 is likely to be a good starting point (tch-rs already uses the cpython crate for interfacing with a Python runtime in the reinforcement learning examples).
Funnily enough, I'm actually using the opposite setup: I experiment with models and train them in Python, as it's more flexible to play with and the heavy lifting takes place on the GPU anyway. When it comes to productionizing and deploying models, I use Rust, as I find it much better for building large, robust systems. |
I think we had this debate when I was enquiring about how to export a JIT model with tch =;] Using a tch model with cpython seems to work. Here is an example of an exported function:

    fn tch_train(_py: Python, xs: Vec<Vec<f64>>, ys: Vec<f64>) -> PyResult<f64> {
        let mut loss = 1f64;
        let vs = nn::VarStore::new(tch::Device::Cpu);
        // Single linear layer: 5 inputs, 1 output.
        let model = nn::seq().add(nn::linear(&vs.root(), 5, 1, Default::default()));
        let mut optim = RmsProp::default().build(&vs, 0.001).unwrap();
        // Keep iterating over the samples until the last per-sample loss is small.
        while loss > 1e-4 {
            for (x, y) in xs.iter().zip(ys.iter()) {
                let t_x = Tensor::of_slice(x.as_slice()).to_kind(Kind::Float).unsqueeze(0);
                let t_y = Tensor::of_slice(&[*y]).to_kind(Kind::Float).unsqueeze(0);
                let t_out = model.forward(&t_x).squeeze();
                // Squared error for this sample.
                let t_loss = (t_y - t_out).pow(2f64).sum(Kind::Float);
                optim.backward_step(&t_loss);
                loss = f64::from(t_loss);
            }
        }
        Ok(loss)
    }

and the Python code I ran for testing:

    import mymodule
    import numpy as np

    def my_func(x):
        return np.sum(x * np.array([5.0, -4.0, 3.0, -2.0, 1.0]))

    xs = np.random.random(100).reshape((20, 5)).tolist()
    ys = np.apply_along_axis(my_func, 1, xs).tolist()
    print(mymodule.tch_train(xs, ys))

The easy way to proceed is to have one wrapped function that trains the tch model and saves it to disk, and another that loads the model from disk and runs the estimation. That is obviously not ideal if estimations are requested at high frequency. Creating a PyClass that encapsulates the model could be a better solution, but I would not bet on it working. |
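The two calling patterns described above (reload from disk on every call versus keeping the model in an object) can be sketched in pure Python. This is only a stand-in for the Rust-backed functions: `train`, `predict_from_disk`, `LoadedModel`, and the JSON weight file are all hypothetical, and the model is a hand-rolled linear fit rather than a tch model.

```python
import json
import os
import tempfile

def train(xs, ys, path, lr=0.05, epochs=2000):
    """Fit a linear model with plain SGD and save the weights to `path`."""
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    with open(path, "w") as f:
        json.dump(w, f)

def predict_from_disk(path, x):
    """Reload the weights on every call: simple, but pays file I/O each time."""
    with open(path) as f:
        w = json.load(f)
    return sum(wi * xi for wi, xi in zip(w, x))

class LoadedModel:
    """Load the weights once and reuse them across calls."""
    def __init__(self, path):
        with open(path) as f:
            self.w = json.load(f)

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

# Toy data generated by the weights [5.0, -4.0].
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [5.0, -4.0, 1.0]
path = os.path.join(tempfile.mkdtemp(), "weights.json")
train(xs, ys, path)
model = LoadedModel(path)
```

Both `predict_from_disk(path, x)` and `model.predict(x)` give the same answer; the class version simply avoids repeating the load, which is the motivation for wrapping the model in a PyClass.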
That's a pretty cool example. |
In this example, I only used NumPy arrays to keep the code short: in a couple of lines, I could generate random numbers, arrange them in the right shape and calculate the output vector. Ultimately, what is passed to the wrapper function are Python lists, as they are seamlessly converted to Rust Vecs by cpython. I do not quite understand the dynamics in the GPU setting, as I have never really used CUDA. |
Just to clarify my use case, my first goal would be to get a speedup via Rust for inference on CPU. A speedup on training would be a bonus. It sounds like there is a pathway for passing a NumPy array in Python into a function written in Rust, where it's available as a PyArrayDyn, for example: |
Right, so in summary, the aim of this exercise is not only to access tch models from Python, but also to limit type casting during the data transfer. In that case, converting directly between PyTorch tensors and tch tensors would indeed be the most appropriate approach. Intuitively, if they are similar at the memory level, there should be an "unsafe" way of getting it done. |
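To make the copy-versus-share distinction concrete, here is a small sketch using only the Python standard library. It does not involve PyTorch or tch at all; it just contrasts converting data to a list (a copy, as with the list-to-Vec route above) with exposing the same memory through the buffer protocol, which NumPy arrays also support. Whether tensors can be shared this way between PyTorch and tch is the open question in this thread.

```python
from array import array

data = array("d", [1.0, 2.0, 3.0])

# Converting to a list copies every element into a new Python float object.
as_list = data.tolist()

# A memoryview exposes the same underlying buffer without copying it.
view = memoryview(data)

# Mutate the original buffer after both have been created.
data[0] = 42.0
```

After the assignment, `as_list[0]` still holds the old value while `view[0]` reflects the change, because only the view shares memory with `data`.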
I understand the motivation for speeding up the loop using Rust rather than a pure Python implementation, but I am unsure this is the most effective way to achieve greater speed. This discussion indicates that a pure translation from Python to a high-performance language would result in speed gains of about 10%. It points to a useful article describing techniques that are likely to offer more significant benefits, especially operator fusion. The cost of a loop over a sequence of, say, 100 elements is relatively low compared to the RNN operations within each iteration (especially for complex LSTM or GRU-like units). |
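As a rough back-of-the-envelope check of that last point, the sketch below counts the multiply-accumulate operations in the gate matrix products of one LSTM step. The sizes are arbitrary assumptions chosen for illustration, and the count ignores element-wise operations and any framework dispatch overhead, so treat it as an order-of-magnitude estimate only.

```python
def lstm_macs_per_step(input_size, hidden_size):
    """Multiply-accumulates for one LSTM step, counting only the gate matmuls.

    An LSTM has four gates; each applies a (hidden x input) matrix to the
    current input and a (hidden x hidden) matrix to the previous hidden state.
    """
    return 4 * hidden_size * (input_size + hidden_size)

# Illustrative sizes, not taken from the thread.
per_step = lstm_macs_per_step(input_size=128, hidden_size=256)
per_sequence = 100 * per_step
```

With these sizes, each of the 100 loop iterations carries a few hundred thousand multiply-accumulates, which is the sense in which the loop bookkeeping itself is comparatively cheap.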
I actually have another use case here, where I want to efficiently pass PyTorch tensors between Python and Rust. With AllenNLP, one of our main performance drags is data loading, especially when the dataset is too big to fit in memory, so that you have to load it lazily on the fly. If you can't load data fast enough, you can't keep the GPUs occupied. So we've been playing around with the idea of writing data loaders in Rust that would pass tensors off to Python. I would love to hear if anyone's gotten any further with this. |
Support for writing PyTorch extensions in Rust would be extremely useful, especially if you want to do more complicated operations (the same motivation as the official PyTorch C++/CUDA extension tutorial). However, pyo3 currently does not recognize the tch tensor type. Maybe we can start by adding tensor support for pyo3? |
@LaurentMazare Any thoughts on this? I'd like to help if needed. |
Sorry for being slow to come back. A decent first step would indeed be to get the underlying pointer from the Python tensor object and pass it through pyo3 so that we could build the same tensor on the Rust side. I guess most of the work would be in understanding how the Python API wraps things. I may look at it when I find some time, but it's unlikely to happen in the next couple of weeks or so. |
I've started trying to do this for my own project and ran into an exit code 139. I used this code to create a function in PyO3:
I added a from_ptr function to the Tensor class in the tensor.rs file of tch-rs:
But I got this error:
got past ct |
I haven't investigated this very deeply, so I may be way off, but I think you're passing a pointer to the Python tensor object and trying to interpret it as a C tensor pointer, which results in a segfault. |
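The distinction between the address of a Python wrapper object and the address of the data it owns can be shown with the standard library alone. This is only an analogy: it uses `array.array` instead of a PyTorch tensor, and it says nothing about how the PyTorch Python bindings lay out their objects. For PyTorch itself, `Tensor.data_ptr()` returns the address of the first element, which is likewise not the address of the Python object.

```python
import ctypes
from array import array

values = array("d", [1.0, 2.0, 3.0])

# In CPython, id() returns the address of the Python wrapper object.
wrapper_addr = id(values)

# buffer_info() returns the address and length of the underlying C buffer.
data_addr, length = values.buffer_info()

# Reading doubles from the data address recovers the elements.
raw = (ctypes.c_double * length).from_address(data_addr)
recovered = list(raw)
```

Reading doubles from `wrapper_addr` instead would interpret the object header as numeric data, which is the same kind of mistake as treating a Python tensor object pointer as a C tensor pointer.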
Wouldn't that copy the array but drop the gradient information and lose track of the computation graph? I would like this to work with autograd, but that might be difficult if we can't directly wrap the same C++ object being used by Python. |
I'm not sure about the gradient as I haven't thought about this deeply but I feel that it would be a first proof of concept. If this ends up not segfaulting, it will be possible to build up from there. |
Hi @jogardi, @l1an0. https://github.com/SunDoge/tch-to-pytorch-poc/blob/master/src/lib.rs |
@LaurentMazare Maybe we can create a user guide on how to create Python module with |
I'd also be very excited to see this happen. @SunDoge do gradients associated with the tensors pass between Python and Rust in your prototype? |
@Ejhfast nope, but it's possible. |
Hi, I had the same need for a Python-tch interface. It requires linking the pytorch_python library, as the functions are located in python_variable.h. As I understand it, THPVariable_Wrap is used to wrap the tensor and transfer ownership to Python, while THPVariable_Unpack gives you the pointer to the tensor object itself. The proof of concept is a bit of a hack right now: I would like to build a more polished version and open a PR if there is interest in such a thing. |
We've added a new |
Do you know of any examples of Python modules written in Rust using tch? I'm interested in implementing a custom RNN cell in Rust using tch and exposing it to be used in a PyTorch program.