TST: tests for GH4862, GH7401, GH7403, GH7405 #9292
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12328,6 +12328,25 @@ def test_unstack_dtypes(self): | |
expected = Series({'float64' : 2, 'object' : 2}) | ||
assert_series_equal(result, expected) | ||
|
||
# GH7405 | ||
for c, d in (np.zeros(5), np.zeros(5)), \ | ||
can you specify dtypes to the
i do not think |
||
(np.arange(5, dtype='f8'), np.arange(5, 10, dtype='f8')): | ||
|
||
df = DataFrame({'A': ['a']*5, 'C':c, 'D':d, | ||
'B':pd.date_range('2012-01-01', periods=5)}) | ||
|
||
right = df.iloc[:3].copy(deep=True) | ||
|
||
df = df.set_index(['A', 'B']) | ||
df['D'] = df['D'].astype('int64') | ||
|
||
left = df.iloc[:3].unstack(0) | ||
right = right.set_index(['A', 'B']).unstack(0) | ||
right[('D', 'a')] = right[('D', 'a')].astype('int64') | ||
|
||
self.assertEqual(left.shape, (3, 2)) | ||
tm.assert_frame_equal(left, right) | ||
|
||
def test_unstack_non_unique_index_names(self): | ||
idx = MultiIndex.from_tuples([('a', 'b'), ('c', 'd')], | ||
names=['c1', 'c1']) | ||
|
@@ -12385,6 +12404,93 @@ def verify(df): | |
for col in ['4th', '5th']: | ||
verify(udf[col]) | ||
|
||
# GH7403 | ||
df = pd.DataFrame({'A': list('aaaabbbb'),'B':range(8), 'C':range(8)}) | ||
df.iloc[3, 1] = np.NaN | ||
left = df.set_index(['A', 'B']).unstack(0) | ||
|
||
vals = [[3, 0, 1, 2, nan, nan, nan, nan], | ||
[nan, nan, nan, nan, 4, 5, 6, 7]] | ||
vals = list(map(list, zip(*vals))) | ||
idx = Index([nan, 0, 1, 2, 4, 5, 6, 7], name='B') | ||
It may not be obvious (at least it wasn't to me), but
I actually explicitly cast to objects not float64. the index comes out as float because of type inference in
you have to specify a dtype=object when constructing an index to not have it coerce (if u want it to explicitly preserve it) if u really mean integer then we use -1 as a sentinel in the indexers and they are left as ints
@behzadnouri, yes, precisely, here the conversion of ints to floats is done by
@jreback, understood. Two points: First, other than specifying
Second, my point is that in the course of stacking and/or unstacking (and perhaps other operations), a single level in a multi-level
@behzadnouri, just to clarify, I don't doubt that within the unstacking code the conversion from ints to floats happens in
The reason I think this is a potential problem (which predates you, I'm sure, and is also present in
@seth-p I think the unstack/stack code needs to explicitly pass the dtype in when the Index is constructed. Currently you cannot have nans in an Int64Index, so unless you mark it as object it will by definition be coerced. A multi-index can represent this, but single level multi-indexes are not supported as that is just added complexity.
The question is what you want the behavior to be. Suppose
On a related note, to see in general how messed up things are when have
Why does |
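The coercion discussed in this thread can be reproduced outside the test suite. The following is a minimal sketch, not code from this PR: it assumes a reasonably recent pandas, and the variable names are made up for illustration. It shows that an index built from ints plus NaN is inferred as float64, that passing dtype=object keeps the ints, and that factorizing the same values marks the missing entry with the -1 sentinel.

```python
import numpy as np
import pandas as pd

values = [np.nan, 0, 1, 2]

# Default inference: NaN cannot be stored in an integer index,
# so the integers are coerced to floats.
inferred = pd.Index(values, name="B")

# Passing dtype=object explicitly preserves the Python ints.
preserved = pd.Index(values, name="B", dtype=object)

# factorize() encodes missing values with the -1 sentinel and
# leaves them out of the uniques.
codes, uniques = pd.factorize(np.array(values))

print(inferred.dtype)   # float64
print(preserved.dtype)  # object
print(codes.tolist())   # [-1, 0, 1, 2]
```

Whether unstack should return the float64 index or an object index is the open question in the comments above; the sketch only shows what each choice looks like.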
||
cols = MultiIndex(levels=[['C'], ['a', 'b']], | ||
labels=[[0, 0], [0, 1]], | ||
names=[None, 'A']) | ||
|
||
right = DataFrame(vals, columns=cols, index=idx) | ||
assert_frame_equal(left, right) | ||
|
||
df = DataFrame({'A': list('aaaabbbb'), 'B':list(range(4))*2, | ||
'C':range(8)}) | ||
df.iloc[2,1] = np.NaN | ||
left = df.set_index(['A', 'B']).unstack(0) | ||
|
||
vals = [[2, nan], [0, 4], [1, 5], [nan, 6], [3, 7]] | ||
cols = MultiIndex(levels=[['C'], ['a', 'b']], | ||
labels=[[0, 0], [0, 1]], | ||
names=[None, 'A']) | ||
idx = Index([nan, 0, 1, 2, 3], name='B') | ||
right = DataFrame(vals, columns=cols, index=idx) | ||
assert_frame_equal(left, right) | ||
|
||
df = pd.DataFrame({'A': list('aaaabbbb'),'B':list(range(4))*2, | ||
'C':range(8)}) | ||
df.iloc[3,1] = np.NaN | ||
left = df.set_index(['A', 'B']).unstack(0) | ||
|
||
vals = [[3, nan], [0, 4], [1, 5], [2, 6], [nan, 7]] | ||
cols = MultiIndex(levels=[['C'], ['a', 'b']], | ||
labels=[[0, 0], [0, 1]], | ||
names=[None, 'A']) | ||
idx = Index([nan, 0, 1, 2, 3], name='B') | ||
right = DataFrame(vals, columns=cols, index=idx) | ||
assert_frame_equal(left, right) | ||
|
||
# GH7401 | ||
df = pd.DataFrame({'A': list('aaaaabbbbb'), 'C':np.arange(10), | ||
'B':date_range('2012-01-01', periods=5).tolist()*2 }) | ||
can you add addtl test that has a
it is tested in |
||
|
||
df.iloc[3,1] = np.NaN | ||
left = df.set_index(['A', 'B']).unstack() | ||
|
||
vals = np.array([[3, 0, 1, 2, nan, 4], [nan, 5, 6, 7, 8, 9]]) | ||
idx = Index(['a', 'b'], name='A') | ||
cols = MultiIndex(levels=[['C'], date_range('2012-01-01', periods=5)], | ||
labels=[[0, 0, 0, 0, 0, 0], [-1, 0, 1, 2, 3, 4]], | ||
names=[None, 'B']) | ||
|
||
right = DataFrame(vals, columns=cols, index=idx) | ||
assert_frame_equal(left, right) | ||
Here both
I noticed this because when testing my code in https://github.com/pydata/pandas/pull/9023/files, when I change the implementation of |
||
|
||
# GH4862 | ||
vals = [['Hg', nan, nan, 680585148], | ||
['U', 0.0, nan, 680585148], | ||
['Pb', 7.07e-06, nan, 680585148], | ||
['Sn', 2.3614e-05, 0.0133, 680607017], | ||
['Ag', 0.0, 0.0133, 680607017], | ||
['Hg', -0.00015, 0.0133, 680607017]] | ||
df = DataFrame(vals, columns=['agent', 'change', 'dosage', 's_id'], | ||
index=[17263, 17264, 17265, 17266, 17267, 17268]) | ||
|
||
left = df.copy().set_index(['s_id','dosage','agent']).unstack() | ||
|
||
vals = [[nan, nan, 7.07e-06, nan, 0.0], | ||
[0.0, -0.00015, nan, 2.3614e-05, nan]] | ||
|
||
idx = MultiIndex(levels=[[680585148, 680607017], [0.0133]], | ||
labels=[[0, 1], [-1, 0]], | ||
names=['s_id', 'dosage']) | ||
|
||
cols = MultiIndex(levels=[['change'], ['Ag', 'Hg', 'Pb', 'Sn', 'U']], | ||
labels=[[0, 0, 0, 0, 0], [0, 1, 2, 3, 4]], | ||
names=[None, 'agent']) | ||
|
||
right = DataFrame(vals, columns=cols, index=idx) | ||
assert_frame_equal(left, right) | ||
|
||
left = df.ix[17264:].copy().set_index(['s_id','dosage','agent']) | ||
assert_frame_equal(left.unstack(), right) | ||
|
||
def test_stack_datetime_column_multiIndex(self): | ||
# GH 8039 | ||
t = datetime(2014, 1, 1) | ||
|
_make_new_index(lev, lab) doesn't seem to handle the dtype properly for a DatetimeIndex when lab contains -1:
I need this in order to get stack() to work properly with NaN/NaT in indices for #9023.

@seth-p I think this last is prob a bug, pls open a new report (with your example above is good). and we can fix independently (and you can rebase on top)
on 2nd thought, you can do as part of #9023 as this is an internal method

If I can fix it I will (as part of #9023), otherwise will punt back to @behzadnouri. :-)

OK, I think adding the following code just before the try fixes _make_new_index(lev, lab). I will include it in #9023.
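
For readers following the -1 discussion, here is a small sketch of the behaviour one would expect from a datetime level that carries a -1 code. It is not the internal _make_new_index helper and not the fix mentioned above, neither of which appears on this page. It goes through the public MultiIndex API only, and assumes a pandas version where the constructor keyword is codes (the tests in this diff use the older labels spelling).

```python
import pandas as pd

dates = pd.date_range("2012-01-01", periods=5)

# The second level is a DatetimeIndex; the -1 code marks a missing entry.
cols = pd.MultiIndex(
    levels=[["C"], dates],
    codes=[[0, 0, 0, 0, 0, 0], [-1, 0, 1, 2, 3, 4]],
    names=[None, "B"],
)

# Materialising the level should keep a datetime dtype and put NaT
# where the code is -1.
b_values = cols.get_level_values("B")
print(b_values.dtype)
print(b_values[0])
```

The point of the check is that the dtype stays datetime-like and the missing slot becomes NaT rather than the level being coerced to object or float.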