Some estimators, such as Support Vector Classifiers (`SVC`), store some data (e.g. the support vectors) as `scipy.sparse._csr.csr_matrix`. This is not obvious from the documentation, which says that the support vectors are stored as `np.ndarray`s.
This becomes an issue because scipy representations of sparse matrices are not supported types and are filtered out before serialization, raising an exception when deserializing the object.
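For reference, the behaviour can be reproduced by fitting an `SVC` on sparse input (a minimal sketch; the toy dataset is made up for illustration):

```python
import numpy as np
from scipy import sparse
from sklearn.svm import SVC

# Toy dataset stored as a sparse matrix (made up for illustration)
X = sparse.csr_matrix(np.array([[0., 1.], [1., 0.], [1., 1.], [0., 0.]]))
y = [0, 1, 1, 0]

clf = SVC().fit(X, y)

# When fitted on sparse input, the stored support vectors are a scipy
# sparse matrix, not an np.ndarray
print(type(clf.support_vectors_))
```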
Possible solutions
I have been thinking of two possible solutions for this (better alternatives are most welcome 😄)
Option 1:
Use the default `toarray()` method to transform the `csr_matrix` into an `np.ndarray`.
Pros: the implementation is quite straightforward. The resulting `np.ndarray` can be directly transformed into a list through the `SklearnSerializer._array_to_list()` function.
Cons: this would potentially increase the size of the files, because we might be storing sparse matrices as dense ones. Also, I think this would make maintainability difficult, because it requires knowing which attributes of which estimators produce a `csr_matrix`, so that we can transform them back from dense to sparse during deserialization.
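A minimal sketch of this round trip (the variable names are hypothetical; in practice the densify/re-sparsify steps would live in the serializer):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Sparse matrix as an estimator might store it
sparse_sv = csr_matrix(np.array([[0., 1., 0.], [2., 0., 3.]]))

# Serialization: densify, then turn into a plain (JSON-friendly) list
dense_list = sparse_sv.toarray().tolist()

# Deserialization: rebuild the dense array, then re-sparsify. This step is
# the maintainability problem: we must know that this particular attribute
# was originally sparse in order to convert it back.
restored = csr_matrix(np.array(dense_list))
```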
Option 2:
We could transform the sparse matrix into a `dict`, and then serialize the `dict`. Below is an example for a `csc_matrix`:
```python
import numpy as np
from scipy.sparse import csc_matrix

# Define the matrix in dense format
dense_matrix = np.array([
    [0, 10, 0, 0],
    [0, 0, 20, 0],
    [0, 0, 30, 40],
    [50, 60, 0, 0]
])

# Create a CSC matrix from the dense matrix
csc_mat = csc_matrix(dense_matrix)
```
During serialization we can transform the `csc_matrix` into a `dict` and store it as JSON.
During deserialization we can read in the JSON and transform it back into a `csc_matrix`.
Pros: it will reduce the size of the JSON files, especially when large sparse matrices are generated. I also think it will be a bit easier to deserialize, because attributes stored as sparse matrices will be flagged by definition in the dictionary (in the datatype).
Cons: it requires writing some additional functions to convert from sparse to dictionary and from dictionary to sparse.
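The serialization round trip described above could be sketched as follows (the dict layout with a `"datatype"` key is a suggestion, not an existing format; it relies on the standard `data`/`indices`/`indptr` attributes of CSC matrices):

```python
import json

import numpy as np
from scipy.sparse import csc_matrix

dense_matrix = np.array([
    [0, 10, 0, 0],
    [0, 0, 20, 0],
    [0, 0, 30, 40],
    [50, 60, 0, 0]
])
csc_mat = csc_matrix(dense_matrix)

# Serialization: capture the CSC components in a JSON-friendly dict.
# The "datatype" key flags the attribute as sparse for deserialization.
as_dict = {
    "datatype": "csc_matrix",
    "data": csc_mat.data.tolist(),
    "indices": csc_mat.indices.tolist(),
    "indptr": csc_mat.indptr.tolist(),
    "shape": list(csc_mat.shape),
}
serialized = json.dumps(as_dict)

# Deserialization: read the dict back and rebuild the sparse matrix
d = json.loads(serialized)
restored = csc_matrix(
    (d["data"], d["indices"], d["indptr"]), shape=tuple(d["shape"])
)
```

Only the nonzero values and their index arrays are stored, which is where the size savings over Option 1 come from.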