update unflatten for NaNs, and add function flatten_preserve_lists #42

kaiaeberli · 2019-03-02T10:50:00Z

Summary

This pull request changes the unflatten implementation so that it can unflatten JSON that contains NaN values. Such JSON is produced when flattened JSON is converted to a pandas DataFrame: the DataFrame fills missing column values with NaN. On unflattening, the function should remove these NaNs again. This addresses issue #40.
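The idea can be illustrated with a minimal, pure-Python sketch. It is not the code in this PR: the helper names `is_nan` and `unflatten_skip_nan` are hypothetical, the separator is assumed to be `_`, and plain dicts containing `float("nan")` stand in for the records a DataFrame would return.

```python
import math


def is_nan(value):
    # Only floats can be NaN; strings, ints and None pass through untouched.
    return isinstance(value, float) and math.isnan(value)


def unflatten_skip_nan(flat, separator="_"):
    """Hypothetical helper: rebuild nesting from flat keys, dropping NaN values."""
    nested = {}
    for key, value in flat.items():
        if is_nan(value):
            # This column was only filled in because another row had it.
            continue
        parts = key.split(separator)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


# Stand-in for DataFrame records: each row carries NaN for columns it never had.
records = [
    {"a_b": 1, "a_c": float("nan")},
    {"a_b": float("nan"), "a_c": 2},
]
result = [unflatten_skip_nan(row) for row in records]
print(result)
```

Without the NaN check, each row would come back with a key it never had in the original JSON.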

This pull request also introduces a new function, flatten_preserve_lists. The current flatten implementation collapses list structure so that everything becomes a single row. This implementation preserves list structure, collapsing only dictionaries.

It creates a new record for every element in a list, akin to a left join between two tables in SQL. It adds two options: max_list_index, which controls how many list indices are processed, and max_depth, which controls how many levels of recursion are permitted. Given that the result of this function may be very large, these options help reduce output size and can be used for quick data investigation.

This function requires the copy, re, and math libraries to be imported.
This addresses issue #43.
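As a rough illustration of the "one record per list element" behaviour, here is a simplified sketch. It is not the implementation in this PR: the function name `explode_rows` is hypothetical, `max_list_index` is assumed to mean "process at most this many elements of each list", `max_depth` is left out, and the exact output shape of flatten_preserve_lists may differ.

```python
def explode_rows(obj, prefix="", separator="_", max_list_index=3):
    """Hypothetical sketch: collapse dicts into key paths, emit one row per list element."""
    if isinstance(obj, dict):
        rows = [{}]
        for key, value in obj.items():
            child_prefix = prefix + separator + key if prefix else key
            child_rows = explode_rows(value, child_prefix, separator, max_list_index)
            # Pair every row built so far with every row produced by this child.
            rows = [{**row, **child} for row in rows for child in child_rows]
        return rows
    if isinstance(obj, list):
        rows = []
        for item in obj[:max_list_index]:
            rows.extend(explode_rows(item, prefix, separator, max_list_index))
        # An empty list keeps the parent row, as a left join would.
        return rows or [{}]
    # Scalar leaf: a single row with a single column.
    return [{prefix: obj}]


order = {
    "id": 7,
    "customer": {"name": "Ann"},
    "items": [{"sku": "x"}, {"sku": "y"}],
}
rows = explode_rows(order)
for row in rows:
    print(row)
```

In this sketch the parent fields `id` and `customer_name` are repeated on each row, one row per entry in `items`. Passing `max_list_index=1` keeps only the first element of each list, which is the kind of output-size cap the option is described as providing.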

Bug Fixes/New Features

change unflatten to consider NaN values correctly to undo pandas DataFrame interpolation of missing column values
introduce flatten_preserve_lists, to allow list structure to be preserved and only collapse dicts.
flatten_preserve_lists also pulls up single child elements to the parent level to prevent unnecessary nesting.

How to Verify

Added tests:

test_unflatten_with_df_issue40
test_flatten_preserve_lists_issue43
test_flatten_preserve_lists_issue43_nested

Side Effects

None to my knowledge.

Resolves

Fixes #40
Fixes #43

Tests

Added tests:

test_unflatten_with_df_issue40
test_flatten_preserve_lists_issue43
test_flatten_preserve_lists_issue43_nested
All tests are passing.

Code Reviewer(s)

@amirziai

pep8speaks · 2019-03-02T11:04:52Z

Hello @kaiaeberli! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-04-04 05:03:44 UTC

amirziai · 2019-03-26T20:38:28Z

thanks for the PR @kaiaeberli !

can u follow the pull request template and add more context?

kaiaeberli · 2019-04-04T21:52:57Z

@amirziai: sure - updated the pull request

amirziai · 2019-05-09T20:01:07Z

@kaiaeberli this is awesome. i'll approve and merge. sorry it took me so long to get around to it.

it'd be great if you could create another PR with an updated readme to include flatten_preserve_lists and your improvement to unflatten. i'm sure others can benefit from this contribution.

great work!

damtur · 2019-05-16T14:12:42Z

Just so you know, this release broke our workflow, since we relied on empty lists and dictionaries not being included in the output. In my opinion this should be marked as a new feature, not a patch version, in SemVer.
Thank you :)

amirziai · 2019-05-17T06:04:03Z

sorry about that @damtur , didn't think about the release carefully. hope that you can pin to the old version and have the workflow going again.

damtur · 2019-05-17T14:36:19Z

No problem @amirziai we have things working now. It's just a kind ask to pay more attention to SEMVER ;). Thank you so much! And also thank you for actually fixing this.

kaiaeberli added 5 commits March 2, 2019 10:16

Fix unflatten for dataframe with duplicate columns

1f8c611

Adding fix for unflattening from pandas dataframe records. See here: amirziai#40

added test for issue 40

2dd4fdb

added test for amirziai#40

fixed comma

e2524b2

fixed to use startswidth

ec65edb

Merge pull request #1 from kaiaeberli/patch-1

b2a334f

merge patch-1 branch to master

kaiaeberli changed the title ~~adding tests for https://github.com/amirziai/flatten/issues/40~~ adding code and tests for https://github.com/amirziai/flatten/issues/40 Mar 2, 2019

kaiaeberli added 8 commits March 2, 2019 11:16

fixed test

3ebe6f5

fixed whitespace

815e640

whitespace

e0d98cf

ws

60eb6ff

whitespace

bcd213c

pep errors

9e4d29c

pep

c867592

added flatten_preserver_lists function

6cf29f6

kaiaeberli changed the title ~~adding code and tests for https://github.com/amirziai/flatten/issues/40~~ adding code and tests for https://github.com/amirziai/flatten/issues/40 and https://github.com/amirziai/flatten/issues/43 Mar 4, 2019

kaiaeberli changed the title ~~adding code and tests for https://github.com/amirziai/flatten/issues/40 and https://github.com/amirziai/flatten/issues/43~~ adding code and tests for issues #40 and #43 Mar 4, 2019

kaiaeberli added 12 commits March 5, 2019 01:04

Update test_flatten.py

b76306b

pep8

3c5c9a7

Update flatten_json.py

7b16286

Update flatten_json.py

139e486

added flatten_preserve_lists test

0c2bd75

Update test_flatten.py

ce85c3a

added sorted to make dict python 2 compatible

bcbf0c6

fixed sorting for dictionaries, so that simple types are processed before lists

24ff42b

tie resolution for dictionaries where values are all the same type. Resolving by element length.

a71a8a6

added more tests for flatten_preserve_lists for nested lists

ba08b9f

pep8

1ac6062

pep8

3c868c1

Merge branch 'master' into master

c21ae32

kaiaeberli changed the title ~~adding code and tests for issues #40 and #43~~ update unflatten for NaNs, and add function flatten_preserve_lists Apr 4, 2019

amirziai approved these changes May 9, 2019

amirziai merged commit e48fe49 into amirziai:master May 9, 2019
