Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update unflatten for NaNs, and add function flatten_preserve_lists #42

Merged
merged 26 commits into from
May 9, 2019

Conversation

kaiaeberli
Copy link
Contributor

@kaiaeberli kaiaeberli commented Mar 2, 2019

Summary

The pull request changes the unflatten implementation, so that it can unflatten json that contains NaN values. Such json is created when json is flattened and converted to a pandas DataFrame. The dataframe will interpolate missing column values with NaN. Upon unflattening, the function should take care to remove these NaNs again. This addresses issue #40 .

This pull request also introduces a new function, flatten_preserve_lists. The current flatten implementation collapses list structure so that everything becomes a single row. This implementation preserves list structure, collapsing only dictionaries.

It creates a new record for every element in a list, akin to a left join between two tables in SQL. It adds options max_list_index, which controls how many list indices are processed, and max_depth, which controls how many recursions are permitted. Given that the result of this function may be very large, these options help reduce output size and can be used for quick data investigation.

This function requires import of copy, re, and math libraries.
This addresses issue #43 .

Bug Fixes/New Features

  • change unflatten to consider NaN values correctly to undo pandas DataFrame interpolation of missing column values
  • introduce flatten_preserve_lists, to allow list structure to be preserved and only collapse dicts.
  • flatten_preserve_lists also pulls up single child elements to the parent level to prevent unnecessary nesting.

How to Verify

Added tests:

  • test_unflatten_with_df_issue40
  • test_flatten_preserve_lists_issue43
  • test_flatten_preserve_lists_issue43_nested

Side Effects

None to my knowledge.

Resolves

Fixes #40
Fixes #43

Tests

Added tests:

  • test_unflatten_with_df_issue40
  • test_flatten_preserve_lists_issue43
  • test_flatten_preserve_lists_issue43_nested
    All tests are passing.

Code Reviewer(s)

@amirziai

@pep8speaks
Copy link

pep8speaks commented Mar 2, 2019

Hello @kaiaeberli! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-04-04 05:03:44 UTC

@kaiaeberli kaiaeberli changed the title adding tests for https://github.com/amirziai/flatten/issues/40 adding code and tests for https://github.com/amirziai/flatten/issues/40 Mar 2, 2019
@kaiaeberli kaiaeberli changed the title adding code and tests for https://github.com/amirziai/flatten/issues/40 adding code and tests for https://github.com/amirziai/flatten/issues/40 and https://github.com/amirziai/flatten/issues/43 Mar 4, 2019
@kaiaeberli kaiaeberli changed the title adding code and tests for https://github.com/amirziai/flatten/issues/40 and https://github.com/amirziai/flatten/issues/43 adding code and tests for issues #40 and #43 Mar 4, 2019
@amirziai
Copy link
Owner

thanks for the PR @kaiaeberli !

can u follow the pull request template and add more context?

@kaiaeberli
Copy link
Contributor Author

@amirziai: sure - updated the pull request

@kaiaeberli kaiaeberli changed the title adding code and tests for issues #40 and #43 update unflatten for NaNs, and add function flatten_preserve_lists Apr 4, 2019
@amirziai
Copy link
Owner

amirziai commented May 9, 2019

@kaiaeberli this is awesome. i'll approve and merge. sorry it took me so long to get around to it.

it'd be great if you could create another PR with an updated readme to include flatten_preserve_lists and your improvement to unflatten. i'm sure others can benefit from this contribution.

great work!

@amirziai amirziai merged commit e48fe49 into amirziai:master May 9, 2019
@damtur
Copy link

damtur commented May 16, 2019

So that you know this release broke our workflow since we relied on not including empty lists and dictionaries in output. In my optinion this should be marked as a new feature, not patch version in Semver.
Thank you :)

@amirziai
Copy link
Owner

sorry about that @damtur , didn't think about the release carefully. hope that you can pin to the old version and have the workflow going again.

@damtur
Copy link

damtur commented May 17, 2019

No problem @amirziai we have things working now. It's just a kind ask to pay more attention to SEMVER ;). Thank you so much! And also thank you for actually fixing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants