FIX trigger indptr/indices copy when data are copied in astype #18192
In terms of non-regression tests, the minimal example in the issue #8678 could be used. However, I assume that it would be best to test several formats of sparse matrices. Unfortunately, I am a bit lost with the test suite. I think that it should go in |
For testing, it should be sufficient to add another method next to the existing

def test_astype_immutable(self):
    D = array([[2.0 + 3j, 0, 0],
               [0, 4.0 + 5j, 0],
               [0, 0, 0]])
    S = self.spmatrix(D)
    if hasattr(S, 'data'):
        S.data.flags.writeable = False
    for x in supported_dtypes:
        D_casted = D.astype(x)
        S_casted = S.astype(x)
        assert_equal(S_casted.dtype, D_casted.dtype)

I wonder if the root cause issue is actually in the implementation of |
It was what you intended originally in #8679. Some of the failures in the CI seem to be that the If we make changes in |
The current error is raised when a LIL matrix call Since the signature use |
               [0, 0, 0]])
    S = self.spmatrix(D)
    if hasattr(S, 'data'):
        S.data.flags.writeable = False
Here we only set data to be read-only. However, per the original issue, indptr and indices could also be read-only.
This patch allows the full test suite to pass for me locally on this branch, though it doesn't win any prizes for elegance. Just providing in case it helps get this unstuck.
diff --git a/scipy/sparse/_csparsetools.pyx.in b/scipy/sparse/_csparsetools.pyx.in
index 5c47852c0..e5972eac0 100644
--- a/scipy/sparse/_csparsetools.pyx.in
+++ b/scipy/sparse/_csparsetools.pyx.in
@@ -176,7 +176,12 @@ def _lil_get_lengths_{{NAME}}(object[:] input,
{{define_dispatch_map('_LIL_GET_LENGTHS_DISPATCH', '_lil_get_lengths', IDX_TYPES)}}
-def lil_flatten_to_array(object[:] input,
+ctypedef fused obj_fused:
+ object
+ double
+
+
+def lil_flatten_to_array(const obj_fused[:] input,
cnp.ndarray output):
return _LIL_FLATTEN_TO_ARRAY_DISPATCH[output.dtype](input, output)
@@ -311,7 +316,7 @@ def _lil_fancy_set_{{PYIDX}}_{{PYVALUE}}(cnp.npy_intp M, cnp.npy_intp N,
def lil_get_row_ranges(cnp.npy_intp M, cnp.npy_intp N,
- object[:] rows, object[:] datas,
+ const obj_fused[:] rows, const obj_fused[:] datas,
object[:] new_rows, object[:] new_datas,
object irows,
cnp.npy_intp j_start,
Thanks @tylerjereddy, I was indeed locked with trying to use the |
Let's see if CJ can stomach my Cython hacks :) I'm actually not super familiar with |
@@ -176,7 +176,11 @@ def _lil_get_lengths_{{NAME}}(object[:] input,
{{define_dispatch_map('_LIL_GET_LENGTHS_DISPATCH', '_lil_get_lengths', IDX_TYPES)}}
-def lil_flatten_to_array(object[:] input,
+ctypedef fused obj_fused:
This is unfortunate, but based on cython/cython#2485 it seems like the best workaround for now.
Let's add a comment mentioning the Cython limitation and link to that issue, so that when/if it gets resolved we can remove the hack.
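For context on why the signature matters: a non-const Cython typed memoryview asks the array for a writable buffer, so a read-only array is rejected before the function body runs, whereas a const memoryview only asks for a read-only buffer. The sketch below is plain Python, not Cython, and uses a float64 array as a stand-in (an assumption for illustration; the PR itself deals with object arrays). It only shows that the read-only flag travels with the buffer.

```python
import numpy as np

# A read-only float64 array, standing in for a memory-mapped input.
arr = np.arange(3, dtype=np.float64)
arr.flags.writeable = False

# A read-only array can still export its buffer...
view = memoryview(arr)
is_readonly = view.readonly  # ...and the buffer carries the read-only flag.

# Writing through the view is refused, which is why a consumer that
# insists on a writable buffer cannot accept this array.
try:
    view[0] = 1.0
    write_error = None
except TypeError as exc:
    write_error = exc
```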
Overall this is looking fine. Working around the Cython limitation isn't so bad, especially if we can remove the fused type hack after cython/cython#4712 is merged. Eventually I'll need to clean up the |
The one failing test (mac/py3.11) has a strange error message:
It doesn't seem to be related to this PR, though, so I'm +1 to merge now.
Shall I restart the build or do we need to wait for another review for the PR to be merged?
Let's wait a few more days to see if someone else wants to review, and if not I'll merge it then.
No more comments, so I'll merge now.
…#18192) * FIX trigger indptr/indices copy when data are copied in astype * TST add unit tests * iter * Add comment regarding object memview support
Reference issue
closes gh-8678
closes gh-8679
supersedes gh-8679
What does this implement/fix?
In some cases, the internal arrays of SciPy sparse matrices are memory-mapped in read-only mode (e.g. when using joblib.Parallel). Calling astype calls sum_duplicates (via _deduped_data), which sorts the indices of the matrix in place and therefore raises an error on read-only arrays.
Here, I trigger a copy of indptr and indices when the dtype does not match.
Additional information
As reported in the issue, this problem was reported several times in scikit-learn: scikit-learn/scikit-learn#15924, scikit-learn/scikit-learn#25935, etc.
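The failure mode and the remedy described under "What does this implement/fix?" can be sketched with NumPy alone. This is only an analogy: ndarray.sort stands in for SciPy's internal index sorting, which is implemented differently, and the array values are made up for illustration.

```python
import numpy as np

# Stand-in for the indices array of a matrix backed by a read-only memmap.
indices = np.array([2, 0, 1], dtype=np.int32)
indices.flags.writeable = False

# Sorting in place is refused on a read-only array.
try:
    indices.sort()
    sort_error = None
except ValueError as exc:
    sort_error = exc

# Copying first yields a writable array that can be sorted safely,
# which is the approach taken here when the dtype changes.
indices_copy = indices.copy()
indices_copy.sort()
```

The copy costs extra memory, but a cast to a different dtype already allocates a new data array, so copying indptr and indices alongside it keeps the result fully independent of the read-only input.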