update unflatten for NaNs, and add function flatten_preserve_lists #42
Adding fix for unflattening from pandas dataframe records. See here: amirziai#40
added test for amirziai#40
merge patch-1 branch to master
Hello @kaiaeberli! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-04-04 05:03:44 UTC
…esolving by element length.
thanks for the PR @kaiaeberli ! can u follow the pull request template and add more context?
@amirziai: sure - updated the pull request
@kaiaeberli this is awesome. i'll approve and merge. sorry it took me so long to get around to it. it'd be great if you could create another PR with an updated readme to include great work!
Just so you know, this release broke our workflow, since we relied on empty lists and dictionaries not being included in the output. In my opinion this should have been marked as a new feature, not a patch version, in SemVer.
sorry about that @damtur , didn't think about the release carefully. hope that you can pin to the old version and have the workflow going again.
No problem @amirziai we have things working now. It's just a kind ask to pay more attention to SemVer ;). Thank you so much! And also thank you for actually fixing this.
Summary
The pull request changes the `unflatten` implementation so that it can unflatten JSON that contains NaN values. Such JSON arises when JSON is flattened and converted to a pandas DataFrame: the DataFrame fills missing column values with NaN. Upon unflattening, the function should take care to remove these NaNs again. This addresses issue #40.
This pull request also introduces a new function, `flatten_preserve_lists`. The current `flatten` implementation collapses list structure so that everything becomes a single row. The new function preserves list structure, collapsing only dictionaries. It creates a new record for every element in a list, akin to a left join between two tables in SQL. It adds the options max_list_index, which controls how many list indices are processed, and max_depth, which controls how many recursions are permitted. Given that the result of this function may be very large, these options help reduce output size and can be used for quick data investigation.
This function requires importing the copy, re, and math libraries.
This addresses issue #43.
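The two behaviours described above can be illustrated with a small, self-contained sketch. This is not the code from this pull request: `drop_nan`, `unflatten_simple`, and `flatten_rows` are hypothetical helper names made up for illustration, the separator and the `max_list_index` default are assumptions, and `max_depth` is left out for brevity.

```python
import math
from itertools import product


def drop_nan(flat_record):
    """Return a copy of a flat dict without float NaN values (hypothetical helper)."""
    return {
        key: value
        for key, value in flat_record.items()
        if not (isinstance(value, float) and math.isnan(value))
    }


def unflatten_simple(flat_record, separator="_"):
    """Rebuild nested dicts from separator-joined keys (simplified, dicts only)."""
    nested = {}
    for key, value in flat_record.items():
        parts = key.split(separator)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


def flatten_rows(obj, prefix="", separator="_", max_list_index=3):
    """Flatten dicts into columns, but emit one row per list element."""
    if isinstance(obj, dict):
        # Flatten every child into its candidate rows, then combine the
        # children, similar to joining the parent with each child table.
        per_key = [
            flatten_rows(
                value,
                f"{prefix}{separator}{key}" if prefix else str(key),
                separator,
                max_list_index,
            )
            for key, value in obj.items()
        ]
        rows = []
        for combo in product(*per_key):
            merged = {}
            for part in combo:
                merged.update(part)
            rows.append(merged)
        return rows
    if isinstance(obj, list):
        rows = []
        # Assumption: only the first max_list_index elements are processed.
        for item in obj[:max_list_index]:
            rows.extend(flatten_rows(item, prefix, separator, max_list_index))
        # An empty list keeps the parent row, as a left join would.
        return rows or [{}]
    return [{prefix: obj}]


# Records shaped like DataFrame output, where missing columns became NaN.
records = [
    {"a_b": 1, "a_c": float("nan")},
    {"a_b": float("nan"), "a_c": 2},
]
restored = [unflatten_simple(drop_nan(record)) for record in records]
print(restored)

# One output row per list element; the parent's fields repeat on each row.
rows = flatten_rows({"id": 1, "tags": [{"name": "x"}, {"name": "y"}]})
print(rows)
```

Removing NaN entries before rebuilding the nesting means a column that pandas added only to align rows does not reappear as a key in records that never had it.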
Bug Fixes/New Features
`unflatten` now considers NaN values correctly, to undo the pandas DataFrame filling of missing column values with NaN.
`flatten_preserve_lists` allows list structure to be preserved, collapsing only dicts.
`flatten_preserve_lists` also pulls up single child elements to the parent level to prevent unnecessary nesting.
How to Verify
Added tests:
Side Effects
None to my knowledge.
Resolves
Fixes #40
Fixes #43
Tests
Added tests:
All tests are passing.
Code Reviewer(s)
@amirziai