Join instances #261

reza1615 · 2020-09-04T11:15:51Z

It would be helpful to be able to join two or more instances. My suggestion:

The join window can be similar to the description window.
When the user selects a column to join on, show its max, min, number of empty or NaN values, number of unique and duplicate values, and datatype.
Before applying the join, show the expected number of rows in the merged df for the chosen join type.

For example, given df1 and df2:
df1.shape == (100, 3)
df2.shape == (30, 5)
df1.A <---- join to ----> df2.B ==> result is df3
Before applying the join, show:

for inner join: df3.shape => (23,8)
for left join: df3.shape => (120,8)
for outer join: df3.shape => (60,8)
number of duplicates in df1.A
number of duplicates in df2.B
for left join: predicted number of nulls in df3
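The per-column statistics and the predicted null count could be computed along these lines. This is a minimal sketch with pandas; `summarize_join_column` and the toy frames are hypothetical names made up for illustration, not part of D-Tale.

```python
import pandas as pd

def summarize_join_column(col: pd.Series) -> dict:
    """Collect the statistics suggested above for a candidate join column."""
    return {
        "dtype": str(col.dtype),
        "min": col.min(),
        "max": col.max(),
        "nulls": int(col.isna().sum()),
        "unique": int(col.nunique()),               # distinct non-null values
        "duplicates": int(col.duplicated().sum()),  # rows repeating an earlier value
    }

# Toy data (assumption): float keys on both sides, one null in df1.A
df1 = pd.DataFrame({"A": [1.0, 2.0, 2.0, 3.0, None], "x": range(5)})
df2 = pd.DataFrame({"B": [2.0, 2.0, 3.0, 4.0], "y": range(4)})

summary_a = summarize_join_column(df1["A"])
summary_b = summarize_join_column(df2["B"])

# For a left join, every df1 row without a partner in df2.B gets nulls
# in all columns that come from df2.
unmatched_left = int((~df1["A"].isin(df2["B"])).sum())
```

Here `unmatched_left` is the number of rows in df3 whose df2 columns would be null after a left join, so it can be shown before the join is run.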

Joining options:

inner
left
right
outer
append
** based on index
** based on left and right columns
** based on fuzzy match, for example matching foo <--> Foo, boo, FOO or foo.
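For reference, most of these options map onto existing pandas calls. The sketch below uses toy frames (an assumption) and shows one possible mapping, not how the feature has to be built.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["foo", "bar", "baz"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"B": ["foo", "baz", "qux"], "y": [10, 20, 30]})

# inner / left / right / outer, based on a left and a right column
joined = {
    how: pd.merge(df1, df2, how=how, left_on="A", right_on="B")
    for how in ("inner", "left", "right", "outer")
}

# the same join types, based on the index instead of columns
by_index = pd.merge(df1, df2, how="inner", left_index=True, right_index=True)

# "append": stack the rows of one frame under the other
appended = pd.concat([df1, df2], ignore_index=True)
```

Fuzzy matching has no direct `pd.merge` option, so it would need a separate key-matching step before the merge.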

reza1615 · 2020-09-04T13:39:13Z

Some icons, from:

https://www.reddit.com/r/SQL/comments/aysflk/sql_join_chart_custom_poster_size/

reza1615 · 2020-09-04T13:41:22Z

For the fuzzy join:
https://stackoverflow.com/a/56315491/5833945
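As a dependency-free illustration of the idea (not necessarily the approach in the linked answer), the standard library's `difflib` can pick the closest key before a normal merge. `fuzzy_merge`, the `cutoff` default and the toy data are assumptions made for this sketch.

```python
import difflib
import pandas as pd

def fuzzy_merge(left, right, left_on, right_on, cutoff=0.6):
    """Left-join `left` to `right`, matching keys case-insensitively and approximately."""
    # Map each normalised right-hand key back to its original spelling.
    candidates = {str(v).strip().lower(): v for v in right[right_on].dropna().unique()}

    def best_match(value):
        if pd.isna(value):
            return None
        hits = difflib.get_close_matches(
            str(value).strip().lower(), list(candidates), n=1, cutoff=cutoff
        )
        return candidates[hits[0]] if hits else None

    # Attach the matched right-hand key, then do an ordinary exact merge on it.
    keyed = left.assign(_match=left[left_on].map(best_match))
    merged = keyed.merge(right, how="left", left_on="_match", right_on=right_on)
    return merged.drop(columns="_match")

df1 = pd.DataFrame({"A": ["Foo", "boo", "FOO", "foo.", "zzz"], "x": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"B": ["foo"], "y": [10]})
result = fuzzy_merge(df1, df2, "A", "B")
```

With this cutoff, "Foo", "boo", "FOO" and "foo." all match "foo", while "zzz" stays unmatched. A stricter cutoff would reject "boo", so the threshold should probably be exposed to the user.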

reza1615 · 2020-09-04T14:39:08Z

To show df3.shape before the join:

Inner Join

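# Each key contributes (occurrences in df1.A) * (occurrences in df2.B) rows
rows = int(df1.A.value_counts().mul(df2.B.value_counts(), fill_value=0).sum())
# Both key columns are kept because A and B have different names; subtract 1 when joining on a shared column name
columns = df1.shape[1] + df2.shape[1]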

Left Join

# Each df1 row appears once per matching df2 row, or once (padded with nulls) when nothing matches,
# so duplicates in df2.B produce more rows in df3 than in df1
matches_per_row = df1.A.map(df2.B.value_counts()).fillna(0)
rows = int(matches_per_row.clip(lower=1).sum())
columns = df1.shape[1] + df2.shape[1]

Outer Join

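# Left-join rows plus the df2 rows whose key never appears in df1.A
left_rows = int(df1.A.map(df2.B.value_counts()).fillna(0).clip(lower=1).sum())
rows = left_rows + int((~df2.B.isin(df1.A)).sum())
columns = df1.shape[1] + df2.shape[1]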
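To sanity-check the row predictions, the counts can be derived from key frequencies and compared against what `pd.merge` actually produces. This is a sketch that assumes the key columns contain no nulls; `predict_join_rows` and the toy frames are hypothetical.

```python
import pandas as pd

def predict_join_rows(left_key: pd.Series, right_key: pd.Series) -> dict:
    """Predict merged row counts from key frequencies, without performing the join."""
    left_counts = left_key.value_counts()
    right_counts = right_key.value_counts()
    # Every shared key yields (left occurrences) * (right occurrences) rows.
    inner = int(left_counts.mul(right_counts, fill_value=0).sum())
    # Rows without a partner survive once in the corresponding one-sided join.
    unmatched_left = int((~left_key.isin(right_key)).sum())
    unmatched_right = int((~right_key.isin(left_key)).sum())
    return {
        "inner": inner,
        "left": inner + unmatched_left,
        "right": inner + unmatched_right,
        "outer": inner + unmatched_left + unmatched_right,
    }

df1 = pd.DataFrame({"A": [1, 2, 2, 3, 5], "x": range(5)})
df2 = pd.DataFrame({"B": [2, 2, 3, 4], "y": range(4)})

predicted = predict_join_rows(df1["A"], df2["B"])
actual = {
    how: len(pd.merge(df1, df2, how=how, left_on="A", right_on="B"))
    for how in predicted
}
```

For these frames both dictionaries agree, and they also show that an outer join can never have fewer rows than the left join on the same keys.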

reza1615 · 2020-09-04T14:40:36Z

For a multiple-column join we can make a new column by concatenating all the key columns and then join on that new column.
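A small sketch of that idea, next to the list-of-columns form that `pd.merge` already accepts. The frames, the `_key` column and the separator are assumptions for illustration. Note that concatenating without a separator can merge distinct key pairs into the same string.

```python
import pandas as pd

df1 = pd.DataFrame({"first": ["a", "ab", "a"], "second": ["bc", "c", "x"], "v1": [1, 2, 3]})
df2 = pd.DataFrame({"first": ["a", "ab"], "second": ["bc", "c"], "v2": [10, 20]})

# Direct multi-column join: pandas accepts a list of key columns.
direct = pd.merge(df1, df2, how="inner", on=["first", "second"])

# Concatenated-key variant, with a separator that does not occur in the data.
SEP = "\x1f"  # hypothetical choice
via_key = pd.merge(
    df1.assign(_key=df1["first"] + SEP + df1["second"]),
    df2[["v2"]].assign(_key=df2["first"] + SEP + df2["second"]),
    how="inner",
    on="_key",
).drop(columns="_key")

# Without a separator, ("a", "bc") and ("ab", "c") both become "abc"
# and are wrongly treated as the same key.
naive = pd.merge(
    df1.assign(_key=df1["first"] + df1["second"]),
    df2[["v2"]].assign(_key=df2["first"] + df2["second"]),
    how="inner",
    on="_key",
)
```

The direct form and the separator form give the same two rows, while the naive concatenation returns four.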

aschonfeld · 2021-02-15T05:03:31Z

Added in v1.35.0. See the demo here.

* man-group#261: Merging/Stacking UI
* bumped version numbers 1.35.0
* Fix missing single quote in dependencies string

  # Fixes:
  ```
  Resolving dependencies... (0.2s)<debug>PackageInfo:</debug> Invalid constraint (scikit-learn (>='0.21.0)) found in dtale-1.34.0 dependencies, skipping
  ```
* change global_stage into interfaces
* man-group#430: replace empty strings with nans when converting dates to timestamp floats
* man-group#431: fixed stacking code example
* fixed formatting and some updates
* man-group#432: updated calls to "get_instance" in merge code snippets
* man-group#433: fixed exception message display in merge UI
* fix format and lint
* Update dtale/global_state.py

  Co-authored-by: Andrew Schonfeld <andrew.schonfeld1@gmail.com>
* format
* man-group#434: Additional PPS formatting
* add view data by name
* fix tests
* fix dash path and revert json_timestamp
* fix test_get_pps_matrix
* fixed recursion error in redis global state
* fixes for python2.7 test failures

Co-authored-by: Andrew Schonfeld <andrew.schonfeld1@gmail.com>
Co-authored-by: Riley Shea <rileymshea@gmail.com>
Co-authored-by: Anthr@X <anthrax1@users.noreply.github.com>

aschonfeld added a commit that referenced this issue Feb 14, 2021

#261: Merging/Stacking UI

225fcda

aschonfeld closed this as completed Feb 15, 2021
