[WIP] Allow pure numpy array (not dask array) as inputs #90

daxiongshu · 2020-10-29T06:11:03Z

Currently dask_glm.estimators only accepts dask.array as inputs due to the line below and other places where ._meta is accessed without checking the data type.

dask-glm/dask_glm/estimators.py

Line 67 in 7b2f85f

if is_dask_array_sparse(X):

dask-glm/dask_glm/utils.py

Lines 120 to 124 in 7b2f85f

    
def is_dask_array_sparse(X):
    """
    Check using _meta if a dask array contains sparse arrays
    """
    return isinstance(X._meta, sparse.SparseArray)


Code:

from dask_glm.estimators import LogisticRegression
import numpy
x = numpy.random.rand(10,4)
y = numpy.random.rand(10)

lr = LogisticRegression()
lr.fit(x,y)

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-e644bf405118> in <module>
----> 1 lr.fit(x,y)

~/rapids/daskml_cupy/dask-glm/dask_glm/estimators.py in fit(self, X, y)
     65         X_ = self._maybe_add_intercept(X)
     66         fit_kwargs = dict(self._fit_kwargs)
---> 67         if is_dask_array_sparse(X):
     68             fit_kwargs['normalize'] = False
     69 

~/rapids/daskml_cupy/dask-glm/dask_glm/utils.py in is_dask_array_sparse(X)
    122     Check using _meta if a dask array contains sparse arrays
    123     """
--> 124     return isinstance(X._meta, sparse.SparseArray)
    125 
    126 

AttributeError: 'numpy.ndarray' object has no attribute '_meta'

This PR allows plain numpy arrays (not dask arrays backed by numpy) to be passed as input directly.
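The AttributeError above comes from reading X._meta unconditionally. A minimal sketch of a more tolerant check follows, assuming it is acceptable to inspect the array itself when no _meta attribute is present. The name is_array_sparse and the optional import of sparse are illustrative assumptions, not the code from this PR.

```python
import numpy as np

try:  # sparse is optional here; without it nothing is treated as sparse
    import sparse
    SPARSE_TYPES = (sparse.SparseArray,)
except ImportError:
    SPARSE_TYPES = ()


def is_array_sparse(X):
    """Hypothetical variant of is_dask_array_sparse that tolerates non-dask input."""
    # Dask arrays expose their chunk type through _meta; plain arrays do not,
    # so fall back to inspecting X itself.
    meta = getattr(X, "_meta", X)
    return isinstance(meta, SPARSE_TYPES)


x = np.random.rand(10, 4)
print(is_array_sparse(x))  # False: a numpy.ndarray is not a sparse array
```

With a check like this, the call on line 67 of estimators.py would no longer fail for a numpy.ndarray, although other places that access ._meta would still need the same treatment.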

daxiongshu · 2020-10-29T12:58:40Z

@mrocklin @pentschev I just added one test for now. If it is ok, could you please suggest which other tests I should add numpy input to? Thank you!

daxiongshu · 2020-10-29T12:59:49Z

~~I think I'm going to finish this first and then move on to #89~~
Not really. I'll move on to #89

pentschev

@daxiongshu I added a few requests to make the code simpler and more Dask-like, and also a few questions about things that aren't clear to me. Please take a look when you have a moment.

pentschev · 2020-11-02T21:30:12Z

dask_glm/algorithms.py

@@ -11,7 +11,7 @@
 from scipy.optimize import fmin_l_bfgs_b


-from dask_glm.utils import dot, normalize, scatter_array, get_distributed_client
+from dask_glm.utils import dot, normalize, scatter_array, get_distributed_client, safe_zeros_like


Where is safe_zeros_like coming from? I suppose you meant from dask.array.utils import zeros_like_safe instead, see https://github.com/dask/dask/blob/48a4d4a5c5769f6b78881adeb1b3973a950e5f43/dask/array/utils.py#L350

pentschev · 2020-11-02T21:31:54Z

dask_glm/utils.py

+    if isinstance(X, da.Array):
+        return np.zeros_like(X._meta, shape=shape)
+    return np.zeros_like(X, shape=shape)


Suggested change

-    if isinstance(X, da.Array):
-        return np.zeros_like(X._meta, shape=shape)
-    return np.zeros_like(X, shape=shape)
+    return zeros_like_safe(meta_from_array(X))

You'll also need to add from dask.array.utils import meta_from_array at the top.

Sorry for the late reply. I think I might have misunderstood our other conversation, #89 (comment).

This PR intends to enable dask-glm to deal with pure numpy arrays. Please let me know if not so and dask-glm should only accept dask arrays.

dask-glm/dask_glm/algorithms.py

Lines 100 to 101 in 7b2f85f

beta = np.zeros_like(X._meta, shape=p)

Let's say the input X is a pure numpy or cupy array, not a dask array. Then beta = np.zeros_like(X._meta, shape=p) raises an AttributeError, because X has no _meta attribute. The safe_zeros_like (bad naming) I implemented will check if X is a pure numpy/cupy array or a dask array and return a pure numpy/cupy array. In contrast, da.utils.zeros_like_safe returns a dask array. In this case beta should be a pure numpy/cupy array.

Let me know if this clears things up. Thank you!

> The safe_zeros_like (bad naming) I implemented will check if X is a pure numpy/cupy array or a dask array and return a pure numpy/cupy array.

That's exactly what meta_from_array does. It will return an array of the type _meta has (i.e., chunk type), so if the input is a NumPy array or a Dask array backed by NumPy, the result is an empty numpy.ndarray, and if the input is a CuPy array or a Dask array backed by CuPy, the result is an empty cupy.ndarray.

> In contrast, da.utils.zeros_like_safe returns a dask array.

That isn't necessarily true: it will only return a Dask array if the reference array is a Dask array. Because we're getting the underlying chunk type with meta_from_array, the resulting array will be either a NumPy or CuPy array.

Aha, that works! I will make the changes.
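The point about meta_from_array can be checked with a short sketch. It assumes NumPy 1.17 or later (for the shape argument of np.zeros_like) and uses np.zeros_like on the meta array rather than zeros_like_safe, since the latter may not be available in every Dask release. The helper name init_beta is hypothetical.

```python
import numpy as np
import dask.array as da
from dask.array.utils import meta_from_array


def init_beta(X, p):
    """Hypothetical helper: zero vector of length p with X's chunk type."""
    # meta_from_array returns an empty array of the underlying chunk type,
    # whether X is a Dask array or already a plain array.
    meta = meta_from_array(X)
    return np.zeros_like(meta, shape=(p,))


x_np = np.random.rand(10, 4)
x_da = da.from_array(x_np, chunks=(5, 4))

beta_np = init_beta(x_np, 4)  # plain NumPy input
beta_da = init_beta(x_da, 4)  # Dask array backed by NumPy
print(type(beta_np).__name__, type(beta_da).__name__)
```

In both cases the result is a concrete array rather than a Dask array, which is what the solver code expects for beta.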

pentschev · 2020-11-02T21:34:03Z

dask_glm/utils.py

@@ -149,6 +149,11 @@ def add_intercept(X):
    return X_i


+@dispatch(object)
+def add_intercept(X):
+    return np.concatenate([X, np.ones_like(X, shape=(X.shape[0], 1))], axis=1)


Suggested change

-    return np.concatenate([X, np.ones_like(X, shape=(X.shape[0], 1))], axis=1)
+    return np.concatenate([X, ones_like_safe(X, shape=(X.shape[0], 1))], axis=1)

This also needs from dask.array.utils import ones_like_safe at the top.
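For reference, a minimal NumPy-only sketch of what the fallback does on a plain array. It uses np.ones_like with a shape argument (NumPy 1.17 or later) instead of ones_like_safe so that it depends on NumPy alone, and the function name add_intercept_ndarray is hypothetical; the PR registers the fallback through @dispatch(object) instead.

```python
import numpy as np


def add_intercept_ndarray(X):
    """Hypothetical NumPy-only fallback: append a column of ones to X."""
    # ones_like keeps X's dtype and array type while overriding the shape.
    ones = np.ones_like(X, shape=(X.shape[0], 1))
    return np.concatenate([X, ones], axis=1)


X = np.zeros((3, 2))
X_i = add_intercept_ndarray(X)
print(X_i.shape)  # (3, 3)
```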

pentschev · 2020-11-02T21:47:35Z

dask_glm/tests/test_estimators.py

    X, y = make_classification(n_samples=100, n_features=5, chunksize=10, is_sparse=is_sparse)
+    if is_numpy:
+        X, y = dask.compute(X, y)
    lr = LogisticRegression(fit_intercept=fit_intercept)
    lr.fit(X, y)
    lr.predict(X)


I don't think I understand this test. When is is_numpy the case in a real-world example? In other words, will X and y ever be pure NumPy arrays in a way that's worth testing with LogisticRegression? I assumed you'd only have Dask arrays (backed by Sparse or not).

That's exactly what I tried to do, where both X and y are pure numpy/cupy arrays. Is that a feature we want? The current dask-glm only accepts dask arrays.

I don't think that's a feature we need to support explicitly. I believe anybody using dask-glm would want to use Dask arrays rather than pure NumPy/CuPy ones.

Thank you! I'll prioritize #89 then.
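Given that conclusion, a user holding in-memory NumPy arrays could wrap them in Dask arrays before calling dask-glm. A small sketch follows, assuming the data fits in memory; the chunk sizes are arbitrary choices for illustration.

```python
import numpy as np
import dask.array as da

x = np.random.rand(10, 4)
y = np.random.rand(10)

# Wrap the in-memory arrays; the chunk sizes here are arbitrary.
X_da = da.from_array(x, chunks=(5, 4))
y_da = da.from_array(y, chunks=5)

# Dask arrays carry the _meta attribute that dask-glm inspects.
print(type(X_da._meta).__name__)
```

The wrapped arrays can then be passed to an estimator's fit method in place of x and y.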

daxiongshu added 4 commits October 28, 2020 20:23

fix is_dask_array_sparse

4bfdc01

numpy works. cupy works except admm & lbfgs

2ad455f

add one test for numpy input

9a8170c

fix test_fit

1af0b03

pentschev suggested changes Nov 2, 2020


daxiongshu changed the title ~~[WIP] Allow numpy array (not dask array) as inputs~~ [WIP] Allow pure numpy array (not dask array) as inputs Nov 11, 2020

daxiongshu mentioned this pull request Nov 11, 2020

[Review] Allow lbfgs and admm with dask cupy inputs #89

Merged

Base automatically changed from master to main February 10, 2021 01:06
