BUG: Integer column index breaks json roundtrip with orient=table #46392
Comments
I'm willing to tackle this, but I'm not sure of the best approach:
I can add my user perspective: The use case I have is to serialize a
Playing around I found another issue.
The
By serializing and deserializing we get back
Column types also get converted from strings to ints:
The issue is in: pandas/pandas/core/internals/construction.py Line 955 in 1b2646a
Called by: pandas/pandas/io/json/_table_schema.py Line 353 in 1b2646a
I'm interested in helping more. Do you know where I can look further, @mroeschke?
Not too familiar with this code path, so I'm not exactly sure off the top of my head how
I found the issue and have a fix, but the code I am touching is ~10 years old according to GitLens git blame. Either way, the problem is that when reading the JSON, pandas will have an array of dicts like so: {"ID":110,"1":1.0,"2":2.1}
Since JSON only supports strings as keys, we have an issue here, because we don't have a 1:1 mapping (https://stackoverflow.com/questions/1450957/pythons-json-module-converts-int-dictionary-keys-to-strings). In the following code, the row will be a dictionary with strings as keys, and columns (and k) will be the proper column array. Lines 432 to 455 in 1b2646a
Meaning there is a mismatch. I've tested the following fix:
- col = columns[j]
+ col = str(columns[j])
It solves this issue. I'm currently running the test suite to check whether anything else got broken.
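For readers who want to see the key mismatch in isolation, here is a minimal sketch using only the standard library `json` module. It is not the pandas code path itself; the `row` and `columns` values are hypothetical stand-ins modeled on the dict shown in the comment, and the lookup loop only imitates the idea of indexing a decoded row by column label.

```python
import json

# Hypothetical row with integer column labels, modeled on the dict in the comment.
row = {"ID": 110, 1: 1.0, 2: 2.1}
columns = ["ID", 1, 2]  # hypothetical column labels (mixed str and int)

# JSON object keys are always strings, so integer labels come back as "1" and "2".
decoded = json.loads(json.dumps(row))
print(decoded)  # {'ID': 110, '1': 1.0, '2': 2.1}

# Looking up by the original label misses the integer columns ...
by_label = [decoded.get(col) for col in columns]
# ... while looking up by str(label) finds every value.
by_str_label = [decoded.get(str(col)) for col in columns]

print(by_label)      # [110, None, None]
print(by_str_label)  # [110, 1.0, 2.1]
```

This only illustrates why a `str(...)` conversion on the lookup key lines up with the decoded keys; whether that is the right fix inside pandas depends on the test results discussed below.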
I've finished running the test suite before and after, and this change seems to introduce 4 failures. Is there a way I can check the summary of both runs and diff between them, even if manually?
@jmg-duarte did you end up solving the issue? If not, @mroeschke could you suggest a way for @jmg-duarte to diff between summary runs? It's not ideal that the JSON being generated is invalid.
@coatless I discussed a potential fix in #46392 (comment) but got no response as you can see :/
Hi @jmg-duarte, @coatless and @mroeschke. Do you know of any updates to this issue? It seems @jmg-duarte has identified the cause and proposed a solution with a small code change. Is there a way forward, or will this potentially be solved by another PR that you know of?
Relies on @mroeschke & pandas team. Nothing has changed AFAIK.
Thanks for your reply @mroeschke!
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The new dataframe will become
Expected Behavior
The expected dataframe would look like this:
Changing to strings instead of integers in the column index will give the expected result:
Installed Versions
This crashed in my environment with the error
assert '_distutils' in core.__file__, core.__file__
raised from lib/python3.9/site-packages/_distutils_hack/__init__.py", line 59, in ensure_local_distutils