Using PyMC3 on the GPU #1246

Closed
twiecki opened this issue Jul 18, 2016 · 15 comments

@twiecki
Member

twiecki commented Jul 18, 2016

Most input provided by @fhuszar.

It seems there are at least two blockers to using PyMC3 on the GPU. The first is an incompatibility with the float32 dtype. Here is an example model:

from pymc3 import Model, NUTS, sample
from pymc3.distributions import DensityDist
import pymc3 as pm
import theano
import numpy as np

theano.config.floatX = 'float32'
theano.config.compute_test_value = 'raise'
theano.config.exception_verbosity = 'high'
with Model() as denoising_model:
#     theano.config.compute_test_value = 'off'

    x = DensityDist('x',
            logp=lambda value: -(value**2).sum(),
            shape=(1, 1, 10, 10),
            testval=np.random.randn(1,1,10,10).astype('float32'),
            dtype='float32',
        )

    sampler = pm.Metropolis()
    trace = sample(10, sampler)

Output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/wiecki/miniconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    898             outputs =\
--> 899                 self.fn() if output_subset is None else\
    900                 self.fn(output_subset=output_subset)

TypeError: expected type_num 11 (NPY_FLOAT32) got 12

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-9a14b0f25285> in <module>()
     19 
     20     sampler = pm.Metropolis()
---> 21     trace = sample(10, sampler)

/home/wiecki/working/projects/pymc/pymc3/sampling.py in sample(draws, step, start, trace, chain, njobs, tune, progressbar, model, random_seed)
    148         sample_func = _sample
    149 
--> 150     return sample_func(**sample_args)
    151 
    152 

/home/wiecki/working/projects/pymc/pymc3/sampling.py in _sample(draws, step, start, trace, chain, tune, progressbar, model, random_seed)
    157     progress = progress_bar(draws)
    158     try:
--> 159         for i, strace in enumerate(sampling):
    160             if progressbar:
    161                 progress.update(i)

/home/wiecki/working/projects/pymc/pymc3/sampling.py in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    239         if i == tune:
    240             step = stop_tuning(step)
--> 241         point = step.step(point)
    242         strace.record(point)
    243         yield strace

/home/wiecki/working/projects/pymc/pymc3/step_methods/arraystep.py in step(self, point)
    125         bij = DictToArrayBijection(self.ordering, point)
    126 
--> 127         apoint = self.astep(bij.map(point))
    128         return bij.rmap(apoint)
    129 

/home/wiecki/working/projects/pymc/pymc3/step_methods/metropolis.py in astep(self, q0)
    125             q = q0 + delta
    126 
--> 127         q_new = metrop_select(self.delta_logp(q, q0), q, q0)
    128 
    129         if q_new is q:

/home/wiecki/miniconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    910                     node=self.fn.nodes[self.fn.position_of_error],
    911                     thunk=thunk,
--> 912                     storage_map=getattr(self.fn, 'storage_map', None))
    913             else:
    914                 # old-style linkers raise their own exceptions

/home/wiecki/miniconda3/lib/python3.5/site-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map)
    312         # extra long error message in that case.
    313         pass
--> 314     reraise(exc_type, exc_value, exc_trace)
    315 
    316 

/home/wiecki/miniconda3/lib/python3.5/site-packages/six.py in reraise(tp, value, tb)
    683             value = tp()
    684         if value.__traceback__ is not tb:
--> 685             raise value.with_traceback(tb)
    686         raise value
    687 

/home/wiecki/miniconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    897         try:
    898             outputs =\
--> 899                 self.fn() if output_subset is None else\
    900                 self.fn(output_subset=output_subset)
    901         except Exception:

TypeError: expected type_num 11 (NPY_FLOAT32) got 12
Apply node that caused the error: Elemwise{sqr,no_inplace}(Reshape{4}.0)
Toposort index: 5
Inputs types: [TensorType(float32, (True, True, False, False))]
Inputs shapes: [(1, 1, 10, 10)]
Inputs strides: [(800, 800, 80, 8)]
Inputs values: ['not shown']
Outputs clients: [[Sum{acc_dtype=float64}(Elemwise{sqr,no_inplace}.0)]]

Debugprint of the apply node: 
Elemwise{sqr,no_inplace} [id A] <TensorType(float32, (True, True, False, False))> ''   
 |Reshape{4} [id B] <TensorType(float32, (True, True, False, False))> ''   
   |Subtensor{int64:int64:} [id C] <TensorType(float32, vector)> ''   
   | |inarray1 [id D] <TensorType(float32, vector)>
   | |Constant{0} [id E] <int64>
   | |Constant{100} [id F] <int64>
   |TensorConstant{[ 1  1 10 10]} [id G] <TensorType(int64, vector)>

Storage map footprint:
 - inarray, Input, Shape: (100,), ElemSize: 8 Byte(s), TotalSize: 800 Byte(s)
 - Reshape{4}.0, Shape: (1, 1, 10, 10), ElemSize: 8 Byte(s), TotalSize: 800 Byte(s)
 - inarray1, Input, Shape: (100,), ElemSize: 8 Byte(s), TotalSize: 800 Byte(s)
 - TensorConstant{[ 1  1 10 10]}, Shape: (4,), ElemSize: 8 Byte(s), TotalSize: 32 Byte(s)
 - Constant{0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
 - Constant{100}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
 TotalSize: 2448.0 Byte(s) 0.000 GB
 TotalSize inputs: 1648.0 Byte(s) 0.000 GB

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.

Not sure where the float64 dtype comes in. @nouiz any ideas?

The second problem is related to #566 and should have a simple solution: only set the test-value behavior inside the model context.
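For illustration, a minimal sketch of confining the test-value behavior to the model context via a hypothetical helper that saves and restores the Theano flag (illustrative only, not the actual Model code):

import contextlib
import theano

@contextlib.contextmanager
def model_test_values(mode='raise'):
    # enable test values only while a model context is active,
    # then restore whatever the user had configured globally
    old = theano.config.compute_test_value
    theano.config.compute_test_value = mode
    try:
        yield
    finally:
        theano.config.compute_test_value = old

Inside "with model_test_values():" the flag is 'raise'; outside, the global setting is untouched.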

@nouiz
Contributor

nouiz commented Jul 18, 2016

This error is strange. Can you update Theano to the dev version?

It looks like the reshape is upcasting the float32 to float64 when it
shouldn't. This is just calling numpy reshape. This is all on the CPU.


@twiecki
Member Author

twiecki commented Jul 18, 2016

Just updated to '0.7.0.dev-7ba9c05257347024ea90eed2f464f26cb4242b93' and the error is the same.

@nouiz
Contributor

nouiz commented Jul 18, 2016

0.7.0.dev... is very old. Can you update to 0.9.0.dev2?


@twiecki
Member Author

twiecki commented Jul 18, 2016

Identical error on '0.9.0dev2.dev-d5944c965c453558ef834b439b671e0c01530b3c'.

@twiecki
Member Author

twiecki commented Jul 21, 2016

@nouiz Any idea what might be causing the error on the most recent Theano?

@fhuszar

fhuszar commented Jul 21, 2016

Isn't the problem that the Metropolis sampler updates some of the variables
in astep, and that this update happens outside Theano, so the float64s
come in? Perhaps somewhere a float64 numpy array is being added or
multiplied in; for example, what's the type of self.scaling?

A quick fix might just be adding allow_input_downcast=True every time you
call theano.function; for example, in NUTS: f = theano.function([q, p, e, q0, p0], [q1, p1, dE], profile=profile)
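For illustration, a minimal sketch of the suspected upcast in plain numpy (self.scaling here is a stand-in, not the actual attribute value):

import numpy as np

q0 = np.zeros(100, dtype='float32')       # float32 state vector from the bijection
scaling = 1.0                             # plain Python float, i.e. float64 to numpy
delta = scaling * np.random.randn(100)    # randn returns float64
q = q0 + delta                            # numpy upcasts the sum to float64
print(q.dtype)                            # float64 -> rejected by the compiled float32 function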


@twiecki
Member Author

twiecki commented Jul 21, 2016

@fhuszar Interesting, I'll look into that. I suppose proposals should take either the type of the RV or the type of floatX.
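For example, a minimal sketch of that casting (hypothetical helper; the real change would live in the Metropolis/NUTS step methods):

import numpy as np
import theano

def propose(q0, scaling, rng=np.random.standard_normal):
    # draw the proposal in whatever dtype the RNG returns (float64),
    # then cast the increment to the dtype of the random variable / floatX
    delta = scaling * rng(q0.shape)
    return q0 + delta.astype(q0.dtype)

q0 = np.zeros(100, dtype=theano.config.floatX)
print(propose(q0, 0.5).dtype)   # matches q0.dtype, so the compiled graph accepts it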

@fhuszar

fhuszar commented Jul 21, 2016

Yes, but I think if the Theano function has allow_input_downcast it should automatically figure that out.

@nouiz
Contributor

nouiz commented Jul 21, 2016

allow_input_downcast in theano.function() only works on the inputs to the
function, not on the inputs of each node. We can't easily lift that
restriction, and I think it would be a bad idea to do so.
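To make the distinction concrete, a small sketch with a toy graph (standard Theano API):

import numpy as np
import theano
import theano.tensor as tt

x = tt.fvector('x')                                  # declared float32
f = theano.function([x], x * 2, allow_input_downcast=True)
print(f(np.random.randn(3)).dtype)                   # float64 input is downcast at the function boundary

c = tt.constant(np.random.randn(3))                  # float64 constant sitting inside the graph
print((x * c).dtype)                                 # 'float64': internal upcasts are not affected by the flag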


@fhuszar

fhuszar commented Jul 21, 2016

If you use the GPU in .theanorc then all nodes in the computation graph will
consistently be of floatX type. I think it's only the input variables whose
type will be that of the numpy array, unless allow_input_downcast is set. I
agree that it would be ideal if the float64s were manually downcast, but
that may be hard to do.

Also, ideally the random number generation, leapfrog step, Metropolis
updates, and accept/reject would happen in Theano as well (perhaps best
implemented via the updates parameter of theano.function, as is done for
example in lasagne.updates), so there would be no back-and-forth data
transfer between CPU and GPU. If you want to sample images or large
parameter tensors for deep learning, significant time will be lost
transferring data between CPU and GPU memory.
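For illustration, a minimal sketch of a Metropolis step expressed entirely through shared variables and the updates parameter, so the chain state never has to leave the device (toy standard-normal target; none of these names are PyMC3 internals):

import numpy as np
import theano
import theano.tensor as tt
from theano.sandbox.rng_mrg import MRG_RandomStreams

floatX = theano.config.floatX
srng = MRG_RandomStreams(seed=42)

# the chain state lives in a shared variable, so it can stay in GPU memory
q = theano.shared(np.zeros(100, dtype=floatX), name='q')

def logp(v):
    return -0.5 * tt.sum(v ** 2)          # toy target: standard normal

step_size = np.asarray(0.5, dtype=floatX)
proposal = q + step_size * srng.normal(size=(100,), dtype=floatX)

# accept/reject is symbolic too, and applied through `updates`
accept = tt.log(srng.uniform(size=(1,), dtype=floatX)[0]) < logp(proposal) - logp(q)
mh_step = theano.function([], accept, updates=[(q, tt.switch(accept, proposal, q))])

for _ in range(1000):
    mh_step()                             # no per-step host/device transfer of q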


@twiecki
Member Author

twiecki commented Jul 22, 2016

@fhuszar You are certainly correct that that's the issue. I did some manual down-casting in Metropolis and NUTS and the above example works. Here is a branch that makes PyMC3 use the correct dtypes: #1253, can you have a look?

The leapfrog steps and Metropolis updates are already happening in Theano, I believe (but I could be wrong). The random number generation, however, is not; moving that would probably be a better fix. Could you be a bit more specific about what would need to change?

@magnushiie

magnushiie commented Dec 31, 2016

Another issue preventing running on the GPU is the _psi C function in distributions/special.py, which is missing the __device__ annotation. Theano has a similar _psi function with a DEVICE macro:

// For GPU support
#ifdef __CUDACC__
#define DEVICE __device__
#else
#define DEVICE
#endif

#ifndef _PSIFUNCDEFINED
#define _PSIFUNCDEFINED
DEVICE double _psi(double x){

Perhaps PyMC3 should use Theano's Psi instead?
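For reference, a minimal sketch of calling Theano's built-in psi (digamma) op directly, which already ships the GPU-aware C code quoted above (assuming theano.tensor.psi is the op in question):

import numpy as np
import theano
import theano.tensor as tt

x = tt.vector('x')
digamma = theano.function([x], tt.psi(x))   # Theano's own psi op

print(digamma(np.array([0.5, 1.0, 2.0], dtype=theano.config.floatX)))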

@twiecki
Member Author

twiecki commented Dec 31, 2016

@magnushiie Thanks for the pointer. Can you point to where in the Theano code that is? We should definitely use that.

@twiecki
Member Author

twiecki commented Dec 31, 2016

@twiecki
Member Author

twiecki commented Sep 16, 2021

Supported by JAX.
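For anyone landing here now, a minimal sketch of the JAX-based GPU path; this assumes a PyMC3 release (3.11+) that ships the experimental pymc3.sampling_jax module, so treat the exact module and function names as an assumption:

import pymc3 as pm
import pymc3.sampling_jax   # experimental JAX-based samplers (assumed available)

with pm.Model():
    x = pm.Normal('x', mu=0.0, sigma=1.0, shape=(10, 10))
    # NUTS via numpyro/JAX; runs on the GPU when JAX can see one
    trace = pm.sampling_jax.sample_numpyro_nuts(draws=1000, tune=1000)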
