from_dict(..., orient='index') row order preservation inconsistent #24859
Comments
The difference is caused by the code branching to _from_nested_dict. As far as I can tell this bug can be resolved by removing the branch condition |
I imagine that completely removing that branch would break existing tests, but investigation and potentially patching |
Indeed, the change breaks one existing test, pandas.tests.frame.test_constructors.TestDataFrameConstructors::test_constructor_list_of_series. Compare the following:
# both columns and index are not sorted
example = {'B' : {'Y':1,'X':2}, 'A' : {'Y':3,'X':4}}
# Index is sorted, columns not
print(pd.DataFrame.from_dict(example, orient='index'))
# Y X
# A 3 4
# B 1 2
# what would happen if we don't use _from_nested_dict
# Columns are sorted, index not
data, index = list(example.values()), list(example.keys())
print(pd.DataFrame(data, index=index))
# X Y
# B 2 1
# A 4 3 |
I think we've had quite a few similar conversations about this recently, but we generally come back to the point that it's arguably impossible to guarantee dict insertion order when dealing with more than one dimension. It's also not something that is guaranteed by the specifications for formats that allow this kind of nesting, such as JSON. Closing this out as such, though feel free to ping if you strongly disagree. |
This also threw me off for a while, especially because insertion order is maintained in lots of other cases (e.g. across columns). A workaround seems to be the following (assuming your dict preserves insertion order, e.g. Python >= 3.6):
data = {"B": dict(col1=1), "A": dict(col1=2), "C": dict(col1=3)}
# 1st Option (fails - keys are sorted):
# df = pd.DataFrame.from_dict(data, orient="index")
# 2nd Option (works but seems overly verbose to type this out each time):
# df = pd.DataFrame.from_records(list(data.values()), index=list(data.keys()))
# 3rd Option (nice and concise):
df = pd.DataFrame(data).T # use the fact that insertion order is maintained across the column and then transpose it
print(df)
# col1
# B 1
# A 2
# C 3 |
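The second option above can be wrapped in a small helper so it does not have to be typed out each time. This is only a sketch: `frame_from_rows` is a hypothetical name, not a pandas function, and it assumes the outer dict preserves insertion order (Python >= 3.6). Note that an earlier comment in this thread reports that the list-of-dicts path may sort the columns on older pandas versions, so only the row order is addressed here.

```python
import pandas as pd


def frame_from_rows(rows):
    """Hypothetical helper: one row per outer key, in the outer dict's key order."""
    # Passing the index explicitly avoids relying on how nested dicts are merged.
    return pd.DataFrame(list(rows.values()), index=list(rows.keys()))


data = {"B": dict(col1=1), "A": dict(col1=2), "C": dict(col1=3)}
df = frame_from_rows(data)
print(list(df.index))
```

Because the index is supplied as an explicit list, the row order does not depend on which branch `from_dict` takes internally.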
Code Sample
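A minimal sketch of the kind of comparison this issue describes. It is not the reporter's original sample, and the data and variable names are made up for illustration. The comments describe the behaviour reported for pandas 0.23.4; other versions may order the nested-dict result differently.

```python
import pandas as pd

# Hypothetical data, not taken from the original report.
nested = {"B": {"X": 1, "Y": 2}, "A": {"X": 3, "Y": 4}}
as_lists = {"B": [1, 2], "A": [3, 4]}

# Inner values are dicts: reported to come back with a sorted index (A, B).
df_nested = pd.DataFrame.from_dict(nested, orient="index")

# Inner values are lists: the index follows insertion order (B, A).
df_lists = pd.DataFrame.from_dict(as_lists, orient="index")

print(list(df_nested.index))
print(list(df_lists.index))
```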
Problem description
If dictionaries are passed for the column values in a call to pd.DataFrame.from_dict(data, orient='index'), then the df index is sorted (not expected). If the column values are lists, then the index is not sorted (expected).
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 4.0.1
pip: 18.1
setuptools: 40.6.2
Cython: 0.29.1
numpy: 1.15.4
scipy: None
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.2
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.1.2
lxml: None
bs4: 4.6.3
html5lib: None
sqlalchemy: 1.2.14
pymysql: None
psycopg2: 2.7.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None