[BUG] to_json ignores index=True #11317

Closed
dagardner-nv opened this issue Jul 20, 2022 · 3 comments
Labels
0 - Backlog, bug, cuIO


@dagardner-nv
Contributor

Describe the bug
The documentation for the index argument of cudf.DataFrame.to_json (https://docs.rapids.ai/api/cudf/stable/api_docs/api/cudf.DataFrame.to_json.html) states:

index : bool, default True
Whether to include the index values in the JSON string. Not including the index (index=False) is only supported when orient is ‘split’ or ‘table’.

However, this argument is ignored. The cause appears to be in pandas, since cuDF's to_json delegates to pandas' to_json:
pandas-dev/pandas#37600
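For contrast, the index argument does take effect with the orients the documentation names. The sketch below assumes pandas is installed and uses pandas directly; since cuDF's to_json delegates to pandas, cuDF would presumably behave the same, but that is not verified here.

```python
import json

import pandas as pd

df = pd.DataFrame([3, 4, 5, 6])

# orient="split" stores the index under its own "index" key.
with_index = json.loads(df.to_json(orient="split", index=True))

# With index=False the "index" key is dropped; only columns and data remain.
without_index = json.loads(df.to_json(orient="split", index=False))

print(with_index["index"])       # [0, 1, 2, 3]
print("index" in without_index)  # False
```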

Steps/Code to reproduce bug
The issue can be reproduced in both cuDF and pandas.

cudf repro:

```python
import cudf

df = cudf.DataFrame([3, 4, 5, 6])

# Including CSV output for comparison
print(df.to_csv(header=True, index=True))

print(df.to_json(index=True, orient="records"))
```

Yields this output:

```
,0
0,3
1,4
2,5
3,6

[{"0":3},{"0":4},{"0":5},{"0":6}]
```

Pandas repro:

```python
import pandas

df = pandas.DataFrame([3, 4, 5, 6])

# Including CSV output for comparison
print(df.to_csv(header=True, index=True))

print(df.to_json(index=True, orient="records"))
```

Expected behavior
The index column should be included in the output when index=True.
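Until this is resolved, one possible workaround (an assumption on our part, not a fix in cuDF or pandas) is to promote the index to an ordinary column with reset_index() before serializing with orient="records". This sketch uses pandas; an unnamed RangeIndex becomes a column named "index".

```python
import json

import pandas as pd

df = pd.DataFrame([3, 4, 5, 6])

# orient="records" builds each row object from the columns only,
# so the index never appears in the output.
plain = json.loads(df.to_json(orient="records"))

# Workaround: turn the index into a regular column first.
with_index = json.loads(df.reset_index().to_json(orient="records"))

print(plain)       # [{'0': 3}, {'0': 4}, {'0': 5}, {'0': 6}]
print(with_index)  # [{'index': 0, '0': 3}, {'index': 1, '0': 4}, {'index': 2, '0': 5}, {'index': 3, '0': 6}]
```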

Environment overview

  • Environment location: Bare-metal
  • Method of cuDF install: conda
dagardner-nv added the Needs Triage and bug labels on Jul 20, 2022
dagardner-nv added a commit to dagardner-nv/Morpheus that referenced this issue Jul 21, 2022
…SV output, due to a known issue in cudf & pandas (rapidsai/cudf#11317 & pandas-dev/pandas#37600) this option has no effect on JSON output
ghost pushed a commit to nv-morpheus/Morpheus that referenced this issue Aug 8, 2022
* instructions for manually testing Morpheus using Kafka. Adds a Kafka version of each of the four validation scripts in `scripts/validation`
* csv & json serializers now support an `include_index_col` flag to control exporting the DataFrame's index column. Note: due to a limitation of cudf & pandas, this has no impact on JSON:
  + pandas-dev/pandas#37600 
  + rapidsai/cudf#11317
* `morpheus.utils.logging` renamed to `morpheus.utils.logger` so that other modules in `morpheus.utils` can import the standard lib logging module.
* Comparison logic in the `ValidationStage` has been moved to its own module `morpheus.utils.compare_df` so that the functionality can be used outside of the stage.


fixes #265

Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Pete MacKinnon (https://github.com/pdmack)
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #290
@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@GregoryKimball
Contributor

Thank you for raising this issue. cudf.DataFrame.to_json currently uses a host fallback to pandas, and it seems that the index argument is not passed through correctly.

This may be related to other changes in #11780.

GregoryKimball added the 0 - Backlog and cuIO labels and removed the Needs Triage and inactive-30d labels on Oct 21, 2022
@galipremsagar
Contributor

@dagardner-nv

This is the expected behavior of pandas to_json, as documented here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html

[Screenshot (2022-11-01): excerpt from the pandas to_json documentation]

However, if you need the index data in the JSON output, pass orient='index' instead, as shown in the example right below:
[Screenshot (2022-11-01): pandas to_json example using orient='index']

Both of these examples are screenshots from the pandas to_json documentation page.
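A minimal pandas sketch of the suggested orient='index' alternative, reusing the DataFrame from the original repro. Here the outer JSON object is keyed by index label, so the index is always present; note the resulting shape differs from orient="records".

```python
import json

import pandas as pd

df = pd.DataFrame([3, 4, 5, 6])

# orient="index" keys the outer object by index label, then by column label.
by_index = json.loads(df.to_json(orient="index"))

print(by_index)  # {'0': {'0': 3}, '1': {'0': 4}, '2': {'0': 5}, '3': {'0': 6}}
```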

Hence, closing this issue on the cuDF side, as there is no action item.
