This repository has been archived by the owner on Jan 28, 2023. It is now read-only.

outerJoin not setting the "by" field #94

Closed
uwemaurer opened this issue Oct 7, 2020 · 3 comments

Comments

@uwemaurer

When I try outerJoin with this program:


    val user = dataFrameOf(
            "first_name", "last_name", "age", "weight")(
            "Max", "Doe", 23, 55,
            "Franz", "Smith", 23, 88,
            "Horst", "Keanes", 12, 82
    )

    val pets = dataFrameOf("first_name", "pet")(
            "Max", "Cat",
            "Franz", "Dog",
            // no pet for Horst
            "Uwe", "Elephant", // Uwe is not in user dataframe
    )

    pets.outerJoin(user).print("outer1")
    user.outerJoin(pets).print("outer2")

I get this:

outer1: 4 x 5
    first_name        pet   last_name    age   weight
1          Max        Cat         Doe     23       55
2          Uwe   Elephant        <NA>   <NA>     <NA>
3        Franz        Dog       Smith     23       88
4         <NA>       <NA>      Keanes     12       82

outer2: 4 x 5
    first_name   last_name    age   weight        pet
1          Max         Doe     23       55        Cat
2         <NA>        <NA>   <NA>     <NA>   Elephant
3        Franz       Smith     23       88        Dog
4        Horst      Keanes     12       82       <NA>

I think the join should fill in first_name from whichever side has it: Horst in row 4 of outer1 and Uwe in row 2 of outer2.
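
To make the expected behaviour concrete, here is a minimal sketch of a full outer join in plain Kotlin, without krangl. The row types and the function name outerJoinByFirstName are hypothetical and exist only for this illustration, and the sketch assumes first_name is unique within each input. The point is that the key column is taken from whichever side has the row, so it is never null. Row order is not meant to match the library's output.

```kotlin
// Hypothetical row types for illustration only; this is not krangl's data model.
data class User(val firstName: String, val lastName: String, val age: Int, val weight: Int)
data class Pet(val firstName: String, val pet: String)
data class Joined(
    val firstName: String, // the join key: always set, whichever side the row came from
    val pet: String?,
    val lastName: String?,
    val age: Int?,
    val weight: Int?
)

// Full outer join on firstName. Assumes firstName is unique within each input list.
fun outerJoinByFirstName(pets: List<Pet>, users: List<User>): List<Joined> {
    val petsByName = pets.associateBy { it.firstName }
    val usersByName = users.associateBy { it.firstName }
    // Union of the keys from both sides, left-hand keys first.
    val keys = petsByName.keys + usersByName.keys
    return keys.map { name ->
        val p = petsByName[name]
        val u = usersByName[name]
        // Non-key columns are null when the row is missing on that side.
        Joined(name, p?.pet, u?.lastName, u?.age, u?.weight)
    }
}

fun main() {
    val users = listOf(
        User("Max", "Doe", 23, 55),
        User("Franz", "Smith", 23, 88),
        User("Horst", "Keanes", 12, 82)
    )
    val pets = listOf(
        Pet("Max", "Cat"),
        Pet("Franz", "Dog"),
        Pet("Uwe", "Elephant") // Uwe is not in users
    )
    outerJoinByFirstName(pets, users).forEach(::println)
}
```

In this sketch the Keanes row carries Horst as its key and the Elephant row carries Uwe, which is the behaviour I would expect from outerJoin.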

Also, there is a System.err.println in Joins:defaultBy that could be removed.

@holgerbrandl
Owner

Well spotted. Thank you. What a great brain teaser for such a rainy evening. :-)

Fixed in v0.14, which I released today.

@uwemaurer
Author

Thank you for fixing it so quickly!

@holgerbrandl
Owner

Concerning the System.err.println: it is intentional. It reports automatically determined "by" columns to the user. If the user makes the choice explicit by providing "by", there is no such logging. We adopted this pattern from dplyr.
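
The pattern can be sketched roughly as follows. This is not the library's actual code: the helper name resolveJoinColumns, its signature, and the message text are assumptions made for illustration. It only shows the idea that inferred join columns are reported on stderr, while an explicit choice stays silent.

```kotlin
// Hypothetical helper, not krangl's implementation: decides which columns to join on.
fun resolveJoinColumns(
    leftColumns: List<String>,
    rightColumns: List<String>,
    by: List<String>? = null
): List<String> {
    // The caller chose the columns explicitly, so there is nothing to report.
    if (by != null) return by

    // Otherwise fall back to the column names present on both sides.
    val common = leftColumns.filter { it in rightColumns }
    require(common.isNotEmpty()) { "No common columns to join on" }

    // Tell the user which columns were picked automatically (message text is illustrative).
    System.err.println("Joining by: $common")
    return common
}

fun main() {
    val userColumns = listOf("first_name", "last_name", "age", "weight")
    val petColumns = listOf("first_name", "pet")

    // Inferred: writes a message to stderr.
    resolveJoinColumns(userColumns, petColumns)

    // Explicit: no message.
    resolveJoinColumns(userColumns, petColumns, by = listOf("first_name"))
}
```

Writing to stderr keeps the notice out of regular program output, so it does not mix with anything the program prints on stdout.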
