dask_ml.decomposition.PCA: ValueError with data > 1 TB #592
Comments
Do you know why the array is read-only? I don't immediately see a reason why the size of the data would matter, but I may be missing something.
There may have been some Cython thing at some point. I can't remember who brought this up originally. @jakirkham, were you involved in this?
I also don't understand why the array is read-only.
I don't have access to an HPC machine. dask/distributed#1978 does sound related. Does ensuring that all of your dependencies are built against Cython 0.28 or newer fix things? dask/distributed#1978 (comment) is using PCA as well. Let's continue the discussion over there.
IDK about involved. 😉 We did discuss a similar issue before, which Tom has referenced.
Because we send the data as bytes, and a memoryview over bytes is read-only:

```python
In [1]: memoryview(b"abc").readonly
Out[1]: True
```
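To spell out why that surfaces as the reported error (this illustration is mine, not from the thread): a NumPy array built on top of a read-only buffer, such as bytes received over the network, refuses in-place updates with exactly this `ValueError`.

```python
import numpy as np

# An array constructed over an immutable bytes object is read-only,
# similar to a chunk deserialized from bytes received off the wire.
buf = bytes(8 * 4)                        # 32 zero bytes = 4 float64 values
arr = np.frombuffer(buf, dtype="float64")
print(arr.flags.writeable)                # False

try:
    np.abs(arr, out=arr)                  # any in-place ufunc call on it...
except ValueError as exc:
    print(exc)                            # ...fails: "output array is read-only"
```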
When I do `dask_ml.decomposition.PCA().fit(x)`, where the array `x` has a size > 1 TB, I get the error `ValueError: output array is read-only`. I use […].

The script […] gives the error […].

Note that with `x = da.random.random((1000000, 130000), chunks=(100000, 2000))` (1.0 TB), the error does not appear.

The error can be avoided in `extmath.py` by changing […]. I think this is not a good fix, because I assume that the array `v` is blocked by another function. Is there another way to fix the error?
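For context, a minimal sketch of the kind of reproduction described above; the failing array shape, the `n_components` value, and the use of random test data are assumptions on my part, since the original script and traceback are elided here:

```python
import dask.array as da
from dask_ml.decomposition import PCA

# Hypothetical reproduction, not the reporter's actual script.
# 1000000 x 130000 float64 values is about 1.04 TB
# (1000000 * 130000 * 8 bytes) and is reported to still work, so this
# sketch enlarges the first dimension to push the array past that size.
x = da.random.random((1300000, 130000), chunks=(100000, 2000))  # ~1.35 TB

pca = PCA(n_components=2)  # n_components is an arbitrary choice here
pca.fit(x)                 # reported to fail with
                           # "ValueError: output array is read-only"
```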