Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Join instances #261

Closed
reza1615 opened this issue Sep 4, 2020 · 5 comments
Closed

Join instances #261

reza1615 opened this issue Sep 4, 2020 · 5 comments

Comments

@reza1615
Copy link

reza1615 commented Sep 4, 2020

It would be helpful to have Join two or multiple instances. My suggestion:

  • The join window can be similar to description window
  • when user select a column to join shows max, min, number of empty or nan, number of unique and duplicate values, datatype of that column
  • before applying join based on joining type, show number of expected rows in the merged df

For example there is df1 and df2
df1.shape == (100,3)
df2.shape== (30,5)
df1.A <---- join to ---> df2.B ==> results as df3
before apply join shows

  • for inner join: df3.shape => (23,8)
  • for left join :df3.shape => (120,8)
  • for outer join: df3.shape => (60,8)
  • number of duplicates in df1.A
  • number of duplicates in df2.B
  • For left join preduced # null in df3

Joining options:

  • inner
  • left
  • right
  • outer
  • append
    ** based on index
    ** based on left and right columns
    ** based on fuzzy match for example match: foo < --> Foo or boo or FOO or foo.
@reza1615
Copy link
Author

reza1615 commented Sep 4, 2020

@reza1615
Copy link
Author

reza1615 commented Sep 4, 2020

for fuzzy join
https://stackoverflow.com/a/56315491/5833945

@reza1615
Copy link
Author

reza1615 commented Sep 4, 2020

To Show df3.shape before join

  • Inner Join
rows = sum(df1.A==df2.B)
columns = df1.shape[1]+df2.shape[1]-1
  • left Join
# If there is some duplications in the df2. we will have more rows in df3 than df1
dups_values_df2_B = df2.pivot_table(index=['B'], aggfunc='size')
Unique_B_in_A= sum(x in list(dict(dups_values_df2_B)) for x in df1.A)
Total_duplicate_B=sum(list(dups_values_df2_B ))
Add_to_rows=Total_duplicate_B-Unique_B_in_A
rows = df1.shape[0]+Add_to_rows
  • Outer Join
rows = df1.shape[0]+df2.shape[0]
columns = df1.shape[1]+df2.shape[1]-1

@reza1615
Copy link
Author

reza1615 commented Sep 4, 2020

For Multiple column Join we can make a new column with concat of all the columns and just join based on the new column

aschonfeld added a commit that referenced this issue Feb 14, 2021
@aschonfeld
Copy link
Collaborator

added in v1.35.0 See demo here

AnthraX1 added a commit to AnthraX1/dtale that referenced this issue Feb 27, 2021
* man-group#261: Merging/Stacking UI

* bumped version numbers 1.35.0

* Fix missing single quote in dependencies string

# Fixes:

```
Resolving dependencies... (0.2s)<debug>PackageInfo:</debug> Invalid constraint (scikit-learn (>='0.21.0)) found in dtale-1.34.0 dependencies, skipping
```

* change global_stage into interfaces

* man-group#430: replace empty strings with nans when converting dates to timestamp floats

* man-group#431: fixed stacking code example

* fixed formatting and some updates

* man-group#432: updated calls to "get_instance" in merge code snippets

* man-group#433: fixed exception message display in merge UI

* fix format and lint

* Update dtale/global_state.py

Co-authored-by: Andrew Schonfeld <andrew.schonfeld1@gmail.com>

* format

* man-group#434: Additional PPS formatting

* add view data by name

* fix tests

* fix dash path and revert json_timestamp

* fix test_get_pps_matrix

* fixed recursion error in redis global state

* fixes for python2.7 test failures

Co-authored-by: Andrew Schonfeld <andrew.schonfeld1@gmail.com>
Co-authored-by: Riley Shea <rileymshea@gmail.com>
Co-authored-by: Anthr@X <anthrax1@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants