Using assign to place values from a dict into an empty dataframe adds the column names, but no values #17847
Comments
The fact that … This said, as a consequence of "fixing" #16823 (I respectfully disagree that it was a bug in the first place), your code will now raise an error in the git version of pandas (but why not forbid …). Notice that your code calls … twice;
results in
which is correct. So I think this can be closed.
Ok, it seems the code I sent (even with your typo correction) failed to reproduce the issue (despite being my actual code with renamed variables) as I didn't have the right data types.
The issue was that 'D' and 'E' were actually vectors of length 1, not scalars. This means that the assign "just works". The output of the above is:
and the output of
Here the spaces are placed so that the 3 and 4 line up beneath B and C rather than D and E, which is what threw me (and the script that was reading the file) off. If we change the output to
Which makes it obvious that the NaNs were being written out as empty strings. So the behaviour of the second assign, with the length-1 vectors, was actually exactly as expected and as desired. In effect, what I actually want is for pandas to treat scalars as vectors of length 1 (doing whatever index handling is necessary for the user), which would mean that both this case and #16823 would end up with a data frame with 1 row once the scalar data had been added. I feel that treating a scalar the same as a length-1 vector is unambiguous and desirable behaviour, but obviously I'm not a dev on this project and I haven't reviewed the previous 17846 issues to find a reason why this is not the case.
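A minimal sketch of the effect described above. It assumes (my reading of the thread, not the original data) that B and C came from the earlier scalar assign and therefore ended up NaN, while D and E held 3 and 4; the values are hypothetical. By default to_csv writes NaN as an empty field, so with sep=" " the row collapses visually; passing na_rep makes the missing values explicit.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row frame mirroring the thread (assumed layout):
# B and C are NaN, D and E hold the values from the length-1 vectors.
df = pd.DataFrame({"B": [np.nan], "C": [np.nan], "D": [3], "E": [4]})

# Default: NaN becomes an empty field, so the data row reads "0   3 4"
# and 3 and 4 appear to sit under B and C.
default_out = df.to_csv(sep=" ", index_label="rep")

# na_rep writes an explicit marker for each missing value instead.
explicit_out = df.to_csv(sep=" ", index_label="rep", na_rep="NaN")

print(default_out)
print(explicit_out)
```

With an explicit na_rep, a downstream script splitting on single spaces sees the same number of fields in every line.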
I'm afraid we all feel differently :-) But incidentally, a …
OK, well, thank you for taking a look at the issue. I suppose I will just have to be very careful and explicit in the future. As a matter of interest, is there an explanation written anywhere of why there is a decision (perhaps as a consequence of some overarching principle) to make scalars behave very differently to length-1 vectors?
Closing, as this is a usage error.
If you mean "in general", then there is simply no reason why they should be equal, in maths as in programming. I can't think of a specific reference, but in any case this is something pandas inherits from numpy.
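To make the numpy point concrete, here is a small illustration (my own sketch, not an official explanation from the maintainers): numpy gives a scalar zero dimensions and a length-1 list one dimension, and pandas follows suit by broadcasting a scalar over whatever index it is given, which may be empty, while a list-like brings its own length.

```python
import numpy as np
import pandas as pd

scalar = 3.0
vector = [3.0]

# numpy: a scalar has zero dimensions, a length-1 list has one.
print(np.ndim(scalar), np.shape(scalar))  # 0 ()
print(np.ndim(vector), np.shape(vector))  # 1 (1,)

# pandas: a scalar is broadcast to the index it is given (here an empty
# one), whereas a list-like determines the length itself.
from_scalar = pd.Series(scalar, index=pd.RangeIndex(0))
from_vector = pd.Series(vector)
print(len(from_scalar), len(from_vector))  # 0 1
```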
Yes, OK, they are different types and so we can expect nothing, but I disagree mathematically, as a scalar, a 1x1 matrix and a Rank-1 (0?) Tensor are all equivalent. If the answer is that numpy does this and consistency with numpy is a critical concern, then that is the end of it. My feeling on this is that turning a dict with scalar elements into a pandas dataframe with one row has a single, unambiguous meaning, and that this is a useful feature. This is also the way it works in R:
and I would be baffled if the above R code was equivalent to:
Really, I'm not arguing that it should be defined for scalars as a direct consequence of being defined for vectors, but that there is a good, and useful, definition of many operations for scalar values.
Uhm... the underlying set is the same; the set of operations you define on it differs. But yeah, right, not a particularly enlightening comparison. Anyway, coherence with …
I have similar disagreements with the behaviour of … The workaround you suggest, checking and converting all elements in the dictionary to lists, probably requires as much code as constructing a new DataFrame from the dict and using pd.concat().
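For comparison, a minimal sketch of both workarounds, using a hypothetical dict in place of the original myData. The helper as_one_row is hypothetical (not a pandas function); it relies on pd.api.types.is_list_like, which treats strings as scalars. Either route yields a one-row frame.

```python
import pandas as pd

# Hypothetical dict of scalar values standing in for myData.
myData = {"A": 1, "B": 2.5, "label": "x"}

def as_one_row(data):
    """Hypothetical helper: wrap each non-list-like value in a one-element list."""
    # is_list_like returns False for strings, so "x" is wrapped as well.
    return {k: v if pd.api.types.is_list_like(v) else [v] for k, v in data.items()}

# Option 1: convert the scalars first, then assign to the empty frame.
via_assign = pd.DataFrame().assign(**as_one_row(myData))

# Option 2: build a one-row frame from a list containing the dict.
via_constructor = pd.DataFrame([myData])

print(via_assign)
print(via_constructor)
```

To append to a frame that already has rows, the one-row frame from option 2 can then be passed to pd.concat together with the existing frame.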
The result of print(summary) is:
This occurs without error or warning messages.
If this had been done to a dataframe with some data already in a column called 'A', then the result would have been a data frame with:
I believe that assign should work consistently whether or not there is already data in the dataframe, creating an index if necessary. The current behaviour to add columns, but not to add any data rows, and not raise an exception or warning allows problems to occur silently.
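A minimal reproduction of the inconsistency, with a hypothetical dict standing in for myData (the original code sample is not shown here). On an empty frame, assign creates the columns but has no rows to broadcast the scalars into; on a frame that already has rows, the same scalars fill every row.

```python
import pandas as pd

# Hypothetical stand-in for myData: a dict of scalars.
myData = {"B": 2, "C": 3}

# Empty frame: columns B and C are created, but with zero rows.
empty_result = pd.DataFrame().assign(**myData)

# Frame that already has rows in column 'A': the scalars fill every row.
with_data = pd.DataFrame({"A": [10, 20]}).assign(**myData)

print(empty_result.shape)  # (0, 2)
print(with_data)
```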
I would expect the outcome of assigning myData to an empty dataframe to be:
This problem is worse than it initially appears to be, as operations adding more data to the "empty" dataframe (with some columns) via assign will succeed.
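A sketch of how this plays out, based on my understanding of how pandas sets a new column on a frame with an empty index; exact behaviour may differ between pandas versions, and the column names are hypothetical. A further scalar assign succeeds silently and still leaves zero rows. A later length-1 list-like does create a row, but the earlier scalar columns are filled with NaN rather than their values, which matches the misaligned output discussed in the comments.

```python
import pandas as pd

# Scalars on an empty frame: columns but no rows.
after_scalars = pd.DataFrame().assign(A=1).assign(B=2)
print(after_scalars.shape)  # (0, 2)

# A length-1 list-like sets a one-row index; A and B are NaN in that row.
after_list = after_scalars.assign(C=[3])
print(after_list)
```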
Once written out to a file, the dataframe has fewer data fields than it has columns:
summary.to_csv("summary.txt", sep=" ", header=True, index_label='rep')
gives the following summary.txt:
Which is clearly malformed.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-126-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.20.3
pytest: None
pip: 1.5.4
setuptools: 3.3
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.13.3
xarray: None
IPython: 1.2.1
sphinx: 1.6.3
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.6.2
feather: None
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.2.1
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None