pivot_table returns a Series instead of a DataFrame depending upon the datatype of the values parameter #4371

Closed
vijaymukhi712 opened this issue Jul 26, 2013 · 5 comments

Comments

@vijaymukhi712

My Python code looks like this:

import numpy as np
import pandas

df = pandas.read_csv("a.csv")
print df
# values passed as a one-element list
table = pandas.pivot_table(df, values=['Sales1'], rows='City', aggfunc=np.sum)
print type(table)
# values passed as a plain string
table = pandas.pivot_table(df, values='Sales1', rows='City', aggfunc=np.sum)
print type(table)

Contents of a.csv
City,Sales1
Mumbai,1

The Output
City  Sales1
0  Mumbai       1
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>

My question is: why would changing the datatype of the values parameter return a DataFrame instead of a Series? After all we use an array only if we have to pass multiple values.

Thank You.

Vijay Mukhi

@cpcloud
Member

cpcloud commented Jul 27, 2013

Hi @vijaymukhi712,

Thanks for the report. Can you please post a minimal reproducible example, meaning code that anyone could load into Python, together with a data set that illustrates the issue? It doesn't have to be the data set you found the issue with, but it's okay to use that one if you want to.

The issue will get resolved much faster if you do this.

Thanks.

@vijaymukhi712
Author

I am sorry for not being clear earlier. I have created the smallest CSV file, a.csv, that shows the anomaly. The data set I originally worked with was over 80 MB, so there is no point in posting it. As the example above shows, changing the values parameter from a string to an array changes the returned object from a Series to a DataFrame. Why? That is the basic question I am asking. My example works with any data set that has two columns and at least one row.

@TomAugspurger
Contributor

@vijaymukhi712 , when @cpcloud asked for a minimal reproducible example, something like df = pd.DataFrame({'A': [1, 2, 3, 4], 'State': ['IA', 'NY', 'NY', 'CA']}) would be fantastic. Just something that can be copy-pasted directly into the interpreter.
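As a rough sketch of what such a copy-pasteable report could look like, the block below inlines the contents of a.csv with `io.StringIO` so that no file is needed. It assumes a recent pandas, where the keyword the report spells `rows=` is named `index=`. The variable names are illustrative only, and the type printed for the string case may differ between pandas versions, so no expected output is shown.

```python
import io

import pandas as pd

# Same two-column, one-row data as a.csv, inlined so no file is needed.
csv_text = "City,Sales1\nMumbai,1\n"
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: a recent pandas, where the grouping keyword is `index=`
# (the report above uses the older spelling `rows=`).
as_list = pd.pivot_table(df, values=["Sales1"], index="City", aggfunc="sum")
as_str = pd.pivot_table(df, values="Sales1", index="City", aggfunc="sum")

# Print both result types so a reader can compare them on their own version.
print(type(as_list))
print(type(as_str))
```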

I think that values=['Sales1'] returning a DataFrame while values='Sales1' returns a Series is the intended behavior. ['Sales1'] is passing a collection of objects, which must be represented by a DataFrame. It just happens that the DataFrame only has one column since the collection only had one item. You'll notice the same thing when selecting subsets of columns of a DataFrame.

In [13]: type(df['State'])
Out[13]: pandas.core.series.Series

In [14]: type(df[['State']])
Out[14]: pandas.core.frame.DataFrame

Here the outer square brackets are the `__getitem__` call, and in the second case the inner square brackets create a list with one element.

So your assumption here

After all we use an array only if we have to pass multiple values.

was faulty. Sometimes we do pass arrays with single values.
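The same string-versus-list distinction can be checked outside of pivot_table. The sketch below reuses the small frame suggested earlier in this comment and applies the rule to plain column selection and to a groupby aggregation; the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "State": ["IA", "NY", "NY", "CA"]})

# A bare label selects one column and yields a Series.
single = df["A"]
# A one-element list selects a collection of columns and yields a DataFrame.
listed = df[["A"]]

# The same rule applies when picking columns from a groupby.
by_state_single = df.groupby("State")["A"].sum()    # Series
by_state_listed = df.groupby("State")[["A"]].sum()  # one-column DataFrame

for obj in (single, listed, by_state_single, by_state_listed):
    print(type(obj).__name__)
```

A list keeps the result two-dimensional even when it holds a single name, which is convenient when code downstream expects a DataFrame regardless of how many columns were requested.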

@cpcloud
Member

cpcloud commented Jul 27, 2013

@TomAugspurger nice explanation

@vijaymukhi712
Author

Tom, thanks for the insight. I never looked at it that way.


3 participants