Skip to content

json_normalize should supply empty columns if record_path are not present #21830

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wujiayikelly opened this issue Jul 9, 2018 · 9 comments
Closed
Labels
IO JSON read_json, to_json, json_normalize Needs Info Clarification about behavior needed to assess issue

Comments

@wujiayikelly
Copy link

Code Sample, a copy-pastable example if possible

df = pd.read_json('filename')
json_normalize(df, record_path = ['data'], \
                         meta = ['id', 'desc'], errors = 'ignore')

Problem description

suppose json file, in some line, doesn't have anything to be normalized on data but does have id and desc information, no-empty. The function should be able to normalize that to empty columns while keep id and desc in the final result.

I am not sure whether this mean "ignore" but I don't think excluding the lines like that from the results is a good choice. I would recommend to supply with empty columns is not all columns are missing, both for record_path and meta.

@WillAyd WillAyd added IO JSON read_json, to_json, json_normalize Needs Info Clarification about behavior needed to assess issue labels Jul 9, 2018
@WillAyd
Copy link
Member

WillAyd commented Jul 9, 2018

There's a good chance this was solved by #20399 but there's not enough information to confirm. Please try on the latest release and if that doesn't work re-open with all of the required information here:

https://pandas.pydata.org/pandas-docs/stable/contributing.html#bug-reports-and-enhancement-requests

Along with a mininmally reproducible example:

http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

@WillAyd WillAyd closed this as completed Jul 9, 2018
@wujiayikelly
Copy link
Author

Based on #20030, I made up another example about my case, as follows.

My problem is that the key of record_path is missing (not the value) while the keys of meta do exist. In such case, could we include the missing key of record_path as an empty column?

data_fail_to_normalize = [{'desc': 'CPU', 'id': 'e030'}] data_success_to_normalize = \ [{'desc': 'Read', 'id': '000e', 'data': [{ "index": 1, "type": "PM"}]

The examples I made up actually come from one dataframe. Each row is a dictionary. So when I loop through rows, there will be failure case for such scenario.

@wujiayikelly
Copy link
Author

@WillAyd Please find my example as above. Thank you!

@WillAyd
Copy link
Member

WillAyd commented Jul 9, 2018

I'm still not clear on what you are looking for. Can you update to show expectations? The following works:

In [8]: json_normalize(json.loads('[{"desc": "CPU", "id": "e030"}]'))
Out[8]: 
  desc    id
0  CPU  e030

Using record_path='data' with the above wouldn't make sense as that doesn't exist and it raises an error accordingly.

@wujiayikelly
Copy link
Author

wujiayikelly commented Jul 9, 2018

df = [{'desc': 'Read', 'id': '000e', 'data': [{"index": 1, "type": "PM"}]}, {'desc': 'CPU', 'id': 'e030'} ] output1 = json_normalize(df, record_path = ['data'], meta = ['id', 'desc']) output2 = json_normalize(df)
 

  index type id desc
0 1 PM 000e Read
1 nan nan e030 CPU

Although 'data' doesn't exist for the second record but it does exist in the first record. Obviously, they come from same source. The key is missing for the second one.

Hopefully, this time it is clearer.

@WillAyd

@WillAyd
Copy link
Member

WillAyd commented Jul 9, 2018

I get what you are saying but I don't think this is desired behavior. If a user specifics that a particular record path exists but it does not we raise an error, which is preferable to arbitrarily assuming that all users want to continue on in that case.

For your need you should use a try...except and handle the lack of a record_path as desired

@wujiayikelly
Copy link
Author

I see. Thanks for the suggestion!

@sann05
Copy link

sann05 commented Nov 11, 2020

The example below always generate an error TypeError: 'NoneType' object is not subscriptable as metadata object is not always present.

file = [{'values': ['1', '2'], 'metadata': {'name': 'first_value'}},
        {'values': ['3', '4'], 'metadata': None}]

df = json_normalize(file,
                    record_path='values',
                    meta=[['metadata', 'name']],
                    errors='ignore')

The metadata object is dynamic and nested, so I am concerned why it doesn't generate something like this:

0 metadata.name
1 first_value
2 first_value
3 nan
4 nan

as I set errors='ignore'

@WillAyd Could you please explain the best way to handle this?

@averhagen
Copy link

averhagen commented Apr 29, 2021

@WillAyd
"I don't think this is desired behavior"

I disagree, it is often the case that this function is being used on a JSON http response, and it is often the case that http responses are dynamic and exclude data.

A simple way to please everyone would be adding a parameter to the function: "fail_on_missing_record" which defaults to True. If set to False the normalizing function could create rows that only contain meta data for JSON objects where the record_path is missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize Needs Info Clarification about behavior needed to assess issue
Projects
None yet
Development

No branches or pull requests

4 participants