
ENH: add cupy tensor space #1231

Open · wants to merge 36 commits into master

Conversation

kohr-h
Member

@kohr-h kohr-h commented Nov 13, 2017

No description provided.

@adler-j
Member

adler-j commented Nov 13, 2017

Looks like a trivial review; I'll try it tomorrow. Does this work with ASTRA data containers?

@kohr-h
Member Author

kohr-h commented Nov 13, 2017

Sure, go ahead. It's work in progress though.

@kohr-h
Member Author

kohr-h commented Nov 13, 2017

I haven't tried ASTRA interoperability yet.

@adler-j adler-j mentioned this pull request Nov 14, 2017
Member

@adler-j adler-j left a comment


Frankly only some minor stuff left, looks freaking magic to me!

else:
from pkg_resources import parse_version
if parse_version(cupy.__version__) < parse_version('2.0.0rc1'):
raise ImportError('cupy <2.0.0rc1 not supported')
Member

This should be a warning, we don't want to render ODL unusable due to a wrong version.

Member Author

The thing is that version 2.0.0rc1 adds support for complex arrays; I really don't want anything below. Actually we should go for 2.0.0, the latest one on PyPI. It wasn't out at the time of writing this.

Member Author

Okay, complex stuff is available in 2.0.0, I'll emit a warning for earlier versions.
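The agreed behavior (warn for old cupy instead of raising) could be sketched as follows; the function name and the hand-rolled version parsing are illustrative only, not code from the PR:

```python
import re
import warnings

def check_cupy_version(version_string, minimum=(2, 0, 0)):
    """Warn instead of raising when cupy is older than 2.0.0 (sketch).

    cupy 2.0.0 added complex dtype support, so older versions should
    trigger a warning rather than an ImportError that renders ODL
    unusable.  A pre-release suffix such as 'rc1' counts as "older".
    """
    match = re.match(r'(\d+)\.(\d+)\.(\d+)(.*)', version_string)
    parts = tuple(int(g) for g in match.groups()[:3])
    prerelease = match.group(4) != ''
    if parts < minimum or (parts == minimum and prerelease):
        warnings.warn('cupy {} is older than 2.0.0, complex dtypes '
                      'will not work'.format(version_string),
                      RuntimeWarning)
        return False
    return True
```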

# --- Space method implementations --- #


lico = cupy.ElementwiseKernel(in_params='T a, T x, T b, T y',
Member

Are these lazily evaluated? If not, we should be careful about the startup cost.

Member Author

They should be, since T is a type template, so the only alternative would be to compile for all types, which I cannot imagine. The good thing is that the kernels are cached on disk, so after the first call things run way faster.
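For illustration, a possible completion of that kernel, with a NumPy stand-in so the sketch runs without a GPU (everything beyond `in_params` is guessed, not taken from the diff):

```python
import numpy as np

try:
    import cupy

    # Hypothetical completion of the kernel from the diff: only
    # `in_params` appears there, the rest is an assumption
    lico = cupy.ElementwiseKernel(
        in_params='T a, T x, T b, T y',
        out_params='T z',
        operation='z = a * x + b * y',
        name='lico')
except ImportError:
    # CPU stand-in with the same call signature, for illustration
    def lico(a, x, b, y, z):
        np.multiply(x, a, out=z)  # z = a * x
        z += b * y                # z += b * y
        return z
```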

Member Author

I just checked. When I removed pkg_resources (bloody slow, we should avoid it whenever possible, at least during import odl), the import time of cupy_tensors.py is practically identical to the time of import cupy (~40 ms).

elif np.dtype(dtype) == 'float64':
prefix = 'd'
else:
raise ValueError('dtype {!r} not supported by cuBLAS'.format(dtype))
Member

Not even complex values?

Member Author

cuBLAS itself supports it, maybe the bindings are not implemented yet in cupy. I'll check.

Member Author

Supported in cupy 2.0.0


class CupyTensorSpace(TensorSpace):

"""Tensor space implemented with GPU arrays.
Member

This should mention what backend is used.

self.__weighting = CupyTensorSpaceCustomNorm(norm)
elif inner is not None:
self.__weighting = CupyTensorSpaceCustomInner(inner)
else:  # all None -> no weighting
Member

What happened to our no weighting vs const weighting debate? I.e. in several cases we expect stuff like this to be weighted:

space(weighting=n/10.0)  # "randomly" gives no weighting for n=10

I guess here it matters less than for the discretized spaces, but in them it really matters.

spc = odl.uniform_discr(0, 5, n)  # 'randomly' not weighted for n=5

this causes problems downstream for e.g. RayTransform which has special behaviour for non-weighted spaces.

Member Author

I'd still need a convincing argument why weighting with constant 1 would be fundamentally different from no weighting. I currently see it just as an optimization that scraps a multiplication with 1.

Member

@adler-j adler-j Nov 14, 2017

Is this not a very weird behaviour:

>>> spc = odl.uniform_discr([0, 0], [9.999, 9.999], [10, 10])
>>> rt = odl.tomo.RayTransform(spc, odl.tomo.parallel_beam_geometry(spc))
>>> rt.range
uniform_discr([  0.    , -14.1407], [  3.1416,  14.1407], (45, 31))

vs

>>> spc = odl.uniform_discr([0, 0], [10, 10], [10, 10])
>>> rt = odl.tomo.RayTransform(spc, odl.tomo.parallel_beam_geometry(spc))
>>> rt.range
uniform_discr([  0.    , -14.1421], [  3.1416,  14.1421], (45, 31), weighting=1.0)

I mean, wat? this has to be considered a rather severe bug, no?

Member Author

That seems quite wrong.

But part of the problem seems to me that we need to jump through hoops currently to make the adjoint correct for weighted vs. unweighted spaces. Doing it via something like #1177 will solve the issue very elegantly by keeping all weighting stuff completely outside the operator definitions.
And when such a system is in place, no weighting and weighting with 1 will be exactly equivalent.

Member

@adler-j adler-j Nov 14, 2017

Well this has nothing to do with the adjoint (in fact we don't even need the adjoint to have this problem); the problem here is that we want "the same weighting scheme" on the range as on the domain, but if we don't distinguish no weighting from 1.0 weighting, we can't do that, because there is no concept of "the same weighting".

The solution in this case would be to fully remove the is_weighted flag and remove the "unweighted" functionality for the ray transform, but I'm not sure if I'm happy about that.

Member Author

Oh right, I was totally forgetting about the fact that we infer the range. Okay now I got it. Hm. We do have an optional range parameter for the ray transform, don't we? I tend to think that the behavior should be

  • as currently when we have a TensorSpace with cell_volume weighting,
  • no weighting in any other case.

To make it easier to change this, we could have an asweighted() method on TensorSpace where a new space of the same type but with new weighting can be created.

Member

Well it's not quite obvious to me how we would figure out if we're using cell_volume weighting 1.0 or no weighting unless we expose this flag somehow. We previously did it by exposing the type of the weighting, but I guess we could fix this by exposing the argument used to create the weighting (somehow), i.e. space.weighting_type returns a string 'const' or whatever. Does that sound reasonable?

I feel adding yet another as_... should hopefully not be needed right now, but I guess in the long run.

Member

Has this been further addressed? Otherwise we need to solve it soon (perhaps not in this PR though).

raise RuntimeError("no conversion for dtype {}"
"".format(arr.dtype))
else:
space = type(self.space)(arr.shape, dtype=self.dtype,
Member

weighting?

Member Author

I'll port the current implementation in NumpyTensorSpace, the proper solution will come with #1238
Just in general: if you squash axes, there's no obvious way to propagate weightings.

Member

We need to settle what squashed axes mean. Imo it should set that dimension to "1" so to say, i.e. we keep space[0:1].weighting and space[0].weighting different.

Member Author

Well, as of now, i.e., before #909 is done (part of #1238) we can't know how to handle weighting constants when removing axes. If all axes stay intact, we can keep the constant. If an axis is removed, we have to fall back to default. Not so great, but yeah.
For array weighting, i.e., the weights have the same shape as the space, we can just index them the same way, so that's a bit easier.
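The rule described (index array weights like the space, keep a constant only if no axis is dropped) could look roughly like this; the names and the `None`-means-default convention are assumptions:

```python
import numpy as np

def index_weighting(weights, indices, drops_axis):
    """Propagate a weighting under indexing of the space (sketch).

    ``weights`` is a constant (scalar) or an array shaped like the
    space; ``drops_axis`` says whether the indexing removes an axis
    (integer index) instead of keeping it (slice).  ``None`` means
    "fall back to the default weighting".
    """
    if np.isscalar(weights):
        # A constant can only be kept if all axes stay intact
        return None if drops_axis else weights
    # Array weights are indexed exactly like the space elements
    return np.asarray(weights)[indices]
```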

Numpy implementation is used which causes significant overhead
due to data copies between host and device.
"""
# TODO: Test with some native ufuncs, then remove this attribute
Member

fix this todo

Member Author

This raises a question: If we have a cupy space element and run some numpy code on it that requires a cast to numpy array, should we just go ahead and do it, at the cost of hidden performance loss, or should we raise?
I have a slight preference but want to make sure we're on the same page.

Member

Well, as always, I'm in the "at least it works" boat. I guess it comes down to how feature-complete cupy is: if it has almost everything it is fine, but if you stumble on everything you do, it's going to be irritating to work with.

Member Author

@kohr-h kohr-h Nov 23, 2017

I'm in the same boat, so that's fine then. As far as I can see, they're pretty complete on the ufunc front, but methods like at and accumulate are largely not implemented. So we need some workaround to use the CUDA code for the obvious ones (sum, prod, cumsum, cumprod for add and multiply), but otherwise drop out to Numpy.

The cupy folks are a bit more worried about efficiency, probably since in neural nets, things are already buried deep down and hard to debug, so this kind of bottleneck danger would be pretty bad for them.
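The workaround might dispatch roughly like this; the lookup table and the function name are assumptions, not code from the PR:

```python
import numpy as np

# ufunc methods with a native counterpart in the array module,
# per the discussion above (dict contents are an assumption)
NATIVE_REDUCTIONS = {('add', 'reduce'): 'sum',
                     ('multiply', 'reduce'): 'prod',
                     ('add', 'accumulate'): 'cumsum',
                     ('multiply', 'accumulate'): 'cumprod'}

def apply_ufunc_method(xp, arr, ufunc_name, method):
    """Use the array module's own reduction when available, else
    round-trip through NumPy (the costly host fallback)."""
    native = NATIVE_REDUCTIONS.get((ufunc_name, method))
    if native is not None:
        return getattr(xp, native)(arr)
    # Fallback: copy to host, apply the NumPy ufunc method, copy back
    result = getattr(getattr(np, ufunc_name), method)(np.asarray(arr))
    return xp.asarray(result)
```

With `xp = cupy` the first branch stays on the device; with `xp = numpy` both branches are equivalent.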

Member

Go ahead with that then!

Member Author

I've implemented the manual fiddling.

newreal : `array-like` or scalar
The new real part for this tensor.
"""
self.real.data[:] = newreal
Member

I think self.data.real[:] would be slightly more efficient?

Member Author

Very possible.

real : `CupyTensor` view with real dtype
The real part of this tensor as an element of an `rn` space.
"""
# Only real dtypes currently
Member

Quite sure there were complex ones above?

Member Author

Needs update obviously.

if np.isscalar(weights):
weighting = CupyTensorSpaceConstWeighting(weights, exponent=exponent)
else:
# TODO: sequence of 1D array-likes
Member

Cite the issue number

# v. 2.0. If a copy of the MPL was not distributed with this file, You can
# obtain one at https://mozilla.org/MPL/2.0/.

"""Implementation of tensor spaces using ``pygpu``."""
Member Author

Obviously the docs need update :-)

>>> same_space == space
True
"""
return (super().__eq__(other) and
Member Author

Update

@adler-j
Member

adler-j commented Nov 15, 2017

Ok, so odlcuda is officially broken. I'll try to fix it, but getting this in would be a preferable solution. What is the ETA here? The code looks quite good.

@kohr-h
Member Author

kohr-h commented Nov 15, 2017

I'll give it a couple of days of focused work, but those days will be distributed a bit. I give it high prio at least.

@adler-j
Member

adler-j commented Nov 15, 2017

I'm fixing odlcuda anyway

@kohr-h
Member Author

kohr-h commented Nov 23, 2017

I'll look into this today. Also in view of #1246.

@kohr-h
Member Author

kohr-h commented Nov 23, 2017

Getting there. I fixed the complex stuff, and all the doctests go green. Unit tests next.

Member

@adler-j adler-j left a comment

Some further comments to take into account. This is looking great!

signature_string, indent)

try:
import cupy
Member

How long does this import take?

Member Author

40 ms, it's really fast.

Member

Great! No worries then

return lico(a, x, 1, y, y)


def _flat_inc(arr):
Member

what does inc mean here?

Member Author

It's the incx, incy stuff needed for the cuBLAS functions.



def _flat_inc(arr):
"""Compute the flat element stride for cuBLAS if possible, else raise."""
Member

A Parameters, Returns, Raises here would be great

Member Author

Was working on that this morning. The implementation was also not correct :-)

Member Author

I redid the cuBLAS handling, works as expected now. Also the lico fallback kernel takes arrays of different dtypes.
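For intuition, a NumPy-level sketch of such a flat-stride computation (the PR's actual implementation differs):

```python
import numpy as np

def flat_inc(arr):
    """Element stride usable as cuBLAS ``incx``, if one exists (sketch).

    cuBLAS vector routines walk a flat buffer with one constant step,
    so only arrays whose memory layout reduces to a single stride
    qualify; everything else needs a fallback kernel.
    """
    if arr.ndim == 0 or arr.size <= 1:
        return 1
    if arr.flags.c_contiguous or arr.flags.f_contiguous:
        return 1  # dense memory, step of one element
    if arr.ndim == 1:
        return arr.strides[0] // arr.itemsize  # strided 1D view
    raise ValueError('no single flat stride for this memory layout')
```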


rn(3, impl='cupy', weighting=[1, 2, 3])
>>> space = odl.tensor_space((2, 3), impl='cupy', dtype=int)
>>> space
tensor_space((2, 3), 'int', impl='cupy')
Member

Won't these doctests fail miserably for users without cupy, or do we exclude them somehow?

Member Author

At least not when running the file. I'll have to check how to configure pytest to exclude them as well.

Member

Well if you explicitly run this file, you should get errors IMO; it's harder with the global tests.

Member Author

Done
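One way to exclude such doctests when cupy is missing is a `conftest.py` along these lines; the `collect_ignore` hook is standard pytest, but the module path is assumed from the discussion:

```python
# conftest.py sketch; the module path is an assumption
collect_ignore = []
try:
    import cupy  # noqa: F401
except ImportError:
    # Without cupy, don't even collect the module, so its doctests
    # (which build cupy-backed spaces) are never run
    collect_ignore.append('odl/space/cupy_tensors.py')
```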

@property
def data_ptr(self):
"""A raw pointer to the data container.

Member

Returns
-------
data_ptr : int

to show the type of the returned value

Member Author

I prefer not to add Returns to @property docstrings since they do not "feel" like functions that return something. I'll add it somewhere else.

Member

Anything goes as long as the info is there!


Notes
-----
The element-by-element comparison is performed on the CPU,
Member

Really? Why not write a ReductionKernel for this? Should be rather simple

Member Author

Makes sense.

Member

@adler-j adler-j Dec 1, 2017

Forgot to update the doc?

Edit: that was indeed the case. I fixed the doc
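A device-side comparison along the suggested lines could be a `ReductionKernel` like this; the expression strings are guesses, and a host fallback keeps the sketch runnable without a GPU:

```python
import numpy as np

try:
    import cupy

    # Hypothetical device-side all-equal reduction; the expression
    # strings are guesses, not code from the PR
    _all_equal_kernel = cupy.ReductionKernel(
        in_params='T x, T y',
        out_params='int32 res',
        map_expr='x == y',
        reduce_expr='a & b',
        post_map_expr='res = a',
        identity='1',
        name='all_equal')

    def all_equal(x, y):
        return bool(_all_equal_kernel(x, y))
except ImportError:
    # Host fallback so the sketch also runs without a GPU
    def all_equal(x, y):
        return bool(np.array_equal(x, y))
```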

>>> x[::2]
rn(3, impl='cupy').element([ 1., 3., 5.])

The returned views are writable, so modifications alter the
Member

Use "view"

Member

Also mention that this is not always the case (i guess?)

Member Author

Not for index arrays and boolean arrays, right.
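The distinction in NumPy terms, which cupy mirrors:

```python
import numpy as np

x = np.arange(6.0)

view = x[::2]    # basic slicing: a writable view into x
view[0] = 100.0  # ... so this also changes x[0]

fancy = x[np.array([1, 3])]  # an index array produces a copy
fancy[0] = -1.0              # ... so x[1] stays untouched
```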




if CUPY_AVAILABLE:
dotw = cupy.ReductionKernel(in_params='T x, T y, W w',
Member

Are these compiled on the fly later? We don't want to create startup latency

Member Author

Yes, as I wrote somewhere else, the import time of this module is practically identical to the import time of cupy, around 40 ms. So these guys don't add overhead at creation time.

tmp = np.empty(out.shape, out.dtype, order=out.space.default_order)
with writable_array(out) as out_arr:
tmp = array_module(impl).empty(
out.shape, out.dtype, order=out.space.default_order)
Member Author

@kohr-h kohr-h Dec 1, 2017

This of course assumes that impl has the same interface as numpy.

Perhaps it's better to implement wrappers for the most common functions, like asarray (already done), empty, and whatever else is needed. That would be more extensible.

Member

That would be how e.g. Keras does it, but I say we leave this for now. Changing should be rather easy (just search replace basically).

Array into which the result should be written. Must be contiguous
and of the correct dtype.
impl : str, optional
Array backend for the output, used when ``out`` is not given.
Member Author

Should probably mention where the valid choices come from.

@@ -387,7 +387,7 @@ def __pow__(self, shape):
shape = tuple(shape)

pspace = self
for n in shape:
for n in reversed(shape):
Member Author

This was an actual bug.

def asarray(self, out=None):
"""Extract the data of this array as a ``numpy.ndarray``.
def asarray(self, out=None, impl='numpy'):
"""Extract the data of this array as an ndarray.
Member Author

Perhaps make a glossary term ndarray that explains the different flavors.

Member

Agree here

@@ -2266,7 +2288,7 @@ def norm(self, x):
if self.exponent == 2.0:
return float(np.sqrt(self.const) * _norm_default(x))
elif self.exponent == float('inf'):
return float(self.const * _pnorm_default(x, self.exponent))
return float(_pnorm_default(x, self.exponent))
Member Author

I've changed this because there are two reasons for it:

  1. The mathematical one: ||x||_{p, w} --> ||x||_{inf, w} should hold for p --> inf (we knew that).

  2. Also structure-wise, all unweighted p-norms follow the pattern

    norm = post_map( reduce( map(x) ) )
    

    e.g.

    • 1 <= p < inf: map(x) = abs(x) ** p, reduce(x) = sum(x), post_map(x) = x ** (1/p)
    • p = inf: map(x) = abs(x), reduce(x) = max(x), post_map(x) = x

    Weighted norms have the structure

    norm = post_map( reduce( weight * map(x) ) )
    

    and here it follows naturally that the weights play no role for p = inf since they do not change the max. Why treat this case completely differently?
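The structure in code (a sketch, not the PR's implementation):

```python
import numpy as np

def weighted_pnorm(x, w, p):
    """post_map( reduce( w * map(x) ) ) structure from the comment."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if p == float('inf'):
        # w_i ** (1/p) -> 1 as p -> inf, so the weights drop out;
        # reduce = max and post_map = identity
        return float(np.max(np.abs(x)))
    return float(np.sum(w * np.abs(x) ** p) ** (1.0 / p))
```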

Member

I guess you're right, lets do it like this for now.

@@ -500,7 +500,7 @@ def array(self):

def is_valid(self):
"""Return True if the array is a valid weight, i.e. positive."""
return np.all(np.greater(self.array, 0))
return (self.array > 0).all()
Member Author

Again assuming a certain API of the arrays. Perhaps better to have wrapper all, any and such. Dunno.

Member

Lets leave it like this until someone wants to add something that breaks it

for name, nin, nout, docstring in UFUNCS:
if nin == 2:
# Currently not supported
continue
Member Author

We had a bunch of dysfunctional ufunc ops created here.

Member

Good catch

if isinstance(iter2, CupyTensor):
iter2 = iter2.asarray()
elif isinstance(iter2, cupy.ndarray):
iter2 = cupy.asnumpy(iter2)
Member Author

Should use utility.asarray here I guess.

Member

yes

@@ -181,6 +203,14 @@ def is_subdict(subdict, dictionary):
return all(item in dictionary.items() for item in subdict.items())


def xfail_if(condition, reason=''):
"""Return a ``pytest.xfail`` object if ``condition`` is ``True``."""
Member Author

Explain usage.

@kohr-h kohr-h mentioned this pull request Dec 1, 2017
Member

@adler-j adler-j left a comment

Massive review. Mostly looking good, just some minor stuff.

With that said, the increase in test run-time is worrying. I guess we can live with it for now, but we must diagnose and improve the situation.

@@ -387,7 +387,7 @@ def __pow__(self, shape):
shape = tuple(shape)

pspace = self
for n in shape:
for n in reversed(shape):
Member

Why this? Wouldn't our users expect r2 ** (3, 4) == (r2 ** 3) ** 4?

Member Author

I don't think so. When doing space[i], the index slices along the outermost power, i.e., for space ** (3, 4) you expect the valid indices i to be 0, 1, 2, no? It's the same logic as (R^m)^n == R^(n x m).
In your example, I would expect that r2 ** (3, 4) has shape (3, 4, 2).
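A tiny helper (hypothetical, not from the PR) spelling out that shape logic:

```python
def power_shape(base_shape, shape):
    """Shape of elements of ``space ** shape`` (illustrative helper).

    ``r2 ** (3, 4)`` is read as ``(r2 ** 4) ** 3``: the power is built
    innermost-first, which is why ``__pow__`` iterates the shape in
    reverse.  The result has element shape ``(3, 4, 2)``.
    """
    result = tuple(base_shape)
    for n in reversed(shape):
        result = (n,) + result
    return result
```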

Member

I concur with your statement. It certainly makes sense.

be applied, a ``ValueError`` is raised, triggering a fallback
implementation.

For **1 array**, the conditions to be fulfilled are
Member

array 1?

return _cublas_scal(
x.data.device.cublas_handle, x.size, a, x.data.ptr, incx)

scal.__name__ = scal.__qualname__ = '_cublas_scal'
Member

Why not simply name it _cublas_scal?

Member

Renamed this.


If possible, a cuBLAS implementation is returned, otherwise a fallback.

In general, cuBLAS requires single or double precision float or complex
Member

This string is overly long for these, simply reference the function above.

This implementation is highly optimized, handling all special cases
of array alignment and the special scalar values 0 and 1 separately.
"""
scal1 = _get_scal(x1.data)
Member

Perhaps we should consider some type of cache for these? Or is that implemented downstream?

Member

Cache is an optimization that will have to come later. I'll focus on getting this in first.

Member Author

Agreed, we have the same thing on the Numpy spaces, we can add caches for both in one go.


@@ -436,6 +447,7 @@ def ufunc_factory(domain=RealNumbers()):
globals()[name] = ufunc_factory
__all__ += (name,)

np.seterr(**npy_err_old)
Member

We should honestly add some "we're so sorry" line above this :D


from odl.space.pspace import ProductSpace

if isinstance(space, ProductSpace) and not space.is_power_space:
raise ValueError('`space` cannot be a non-power product space')
Member

Why not? Part of the reason we use this function is that it supports these things (as compared to e.g. space.element(np.random.rand))

# Workaround for `shape` not using the base space shape of a
# power space
# TODO: remove when fixed, see
# https://github.com/odlgroup/odl/pull/1152
Member

I'll get this in ASAP

Member

This is now in, yay!

@adler-j
Member

adler-j commented Feb 7, 2018

So, I'm officially announcing that I'll "take over" this PR and make a new PR for it.

@kohr-h
Member Author

kohr-h commented Feb 7, 2018

So, I'm officially announcing that I'll "take over" this PR and make a new PR for it.

Go for it! Maybe close this one, since the page has become difficult to load?

ValueError

Slicing in the **fastest** axis is allowed as it results in a constant
stride in the flat memory. Slicing with stride in any other axis is
Member Author

Oops, "is" shouldn't be there

@adler-j
Member

adler-j commented Feb 7, 2018

I'll close this once I've fixed the outstanding issues

@mehrhardt
Contributor

Any news on this front?

@adler-j adler-j mentioned this pull request Jun 28, 2018
@n1kt0

n1kt0 commented May 16, 2019

How far along is the integration process?

Thanks in advance and best regards,

Nikita

@odlgroup odlgroup deleted a comment from n1kt0 May 16, 2019
@odlgroup odlgroup deleted a comment from n1kt0 May 16, 2019
@adler-j
Member

adler-j commented May 16, 2019

This has sadly stalled quite badly and I will not have time to focus on it in the coming months. If anyone wants to pick it up, I'd be more than happy.

@n1kt0

n1kt0 commented May 16, 2019 via email

@adler-j
Member

adler-j commented May 20, 2019

Sadly there is no summary, but the main issue is to get the tests running. With that said, this PR has become so old that it might be worth re-starting it with the newly updated spaces. Perhaps @kohr-h knows how much has changed that touches upon this?

@kohr-h
Member Author

kohr-h commented May 20, 2019

I'm hesitating to re-activate this one before we have settled on #1459 and #1458. Depending on the outcome of the latter in particular, getting in support for CuPy will be massively simpler (or not).
