-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Made SparseDataFrame.fillna() fill all NaNs #16178
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1252,6 +1252,7 @@ def test_from_scipy_correct_ordering(spmatrix): | |
tm.skip_if_no_package('scipy') | ||
|
||
arr = np.arange(1, 5).reshape(2, 2) | ||
|
||
try: | ||
spm = spmatrix(arr) | ||
assert spm.dtype == arr.dtype | ||
|
@@ -1267,6 +1268,33 @@ def test_from_scipy_correct_ordering(spmatrix): | |
tm.assert_frame_equal(sdf.to_dense(), expected.to_dense()) | ||
|
||
|
||
def test_from_scipy_object_fillna(spmatrix): | ||
# GH 16112 | ||
tm.skip_if_no_package('scipy', max_version='0.19.0') | ||
|
||
arr = np.eye(3) | ||
arr[1:, 0] = np.nan | ||
|
||
try: | ||
spm = spmatrix(arr) | ||
assert spm.dtype == arr.dtype | ||
except (TypeError, AssertionError): | ||
# If conversion to sparse fails for this spmatrix type and arr.dtype, | ||
# then the combination is not currently supported in NumPy, so we | ||
# can just skip testing it thoroughly | ||
return | ||
|
||
sdf = pd.SparseDataFrame(spm).fillna(-1.0) | ||
|
||
# Returning frame should fill all nan values with -1.0 | ||
expected = pd.SparseDataFrame({0: {0: 1.0, 1: np.nan, 2: np.nan}, | ||
1: {0: np.nan, 1: 1.0, 2: np.nan}, | ||
2: {0: np.nan, 1: np.nan, 2: 1.0}} | ||
).fillna(-1.0) | ||
|
||
tm.assert_frame_equal(sdf.to_dense(), expected.to_dense()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you need to compare the sparse version here (and dense also compared is fine too) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. don't use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There seems to be an inconsistency in the way the BlockIndexes are calculated in the initialization from the sparse matrix and from a dict resulting in different block lengths. Should I file this as another issue and xfail the sparse version comparisons for now? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But when you compare the two dense frames, does the discrepancy still matter? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, the comparison between the dense frames works fine. As far as I observed, the only difference is the block lengths attribute of the BlockIndexes in the SparseDataFrames. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps because in the dict case, the column 0 of sparse frame is inited with three values: 1, nan, nan. Whereas in case of spmatrix, the block index starts at 0, has length 1, and points to the one 1.0. Which one is correct, I wouldn't dare say. Does passing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, even after passing a default_fill_value, the same error still persists. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, does it work if you instead init the expected frame like so: expected = pd.SparseDataFrame(
{0: {0: 1.0, 1: np.nan, 2: np.nan},
1: {1: 1.0},
2: {2: 1.0}}).fillna(-1.0) I.e. without explicit NaNs. I guess NaNs as values are distinct from no values in sparse matrices. The alternative which would also work (but is obviously worse) is to init spmatrix like: arr = np.eye(3)
arr[1:, 0] = np.nan
arr[arr == 0] = np.nan
spm = spmatrix(arr) In either case, the positions of defined values vs. non-defined must match. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @keitakurita I can confirm above works. Do you intend to continue with this or do you allow me to take over, amending your commit, retaining your authorship info? I'd just really like to see this fixed. 😃 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @kernc |
||
|
||
|
||
class TestSparseDataFrameArithmetic(object): | ||
|
||
def test_numeric_op_scalar(self): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value if self._null_fill_value else self.fill_value