Added 3 APIs for Flatten function #307

Merged 2 commits on Mar 17, 2023
@@ -2,14 +2,22 @@ package org.jetbrains.kotlinx.dataframe.api

 import org.jetbrains.kotlinx.dataframe.ColumnsSelector
 import org.jetbrains.kotlinx.dataframe.DataFrame
+import org.jetbrains.kotlinx.dataframe.columns.ColumnReference
 import org.jetbrains.kotlinx.dataframe.impl.api.flattenImpl
+import org.jetbrains.kotlinx.dataframe.impl.columns.toColumns
+import kotlin.reflect.KProperty

 // region DataFrame

 public fun <T> DataFrame<T>.flatten(): DataFrame<T> = flatten { all() }

-public fun <T, C> DataFrame<T>.flatten(
-    columns: ColumnsSelector<T, C>
-): DataFrame<T> = flattenImpl(columns)
+public fun <T, C> DataFrame<T>.flatten(columns: ColumnsSelector<T, C>): DataFrame<T> = flattenImpl(columns)
+
+public fun <T> DataFrame<T>.flatten(vararg columns: String): DataFrame<T> = flattenImpl { columns.toColumns() }
+
+public fun <T, C> DataFrame<T>.flatten(vararg columns: KProperty<C>): DataFrame<T> = flattenImpl { columns.toColumns() }
+
+public fun <T, C> DataFrame<T>.flatten(vararg columns: ColumnReference<C>): DataFrame<T> =
+    flattenImpl { columns.toColumns() }

 // endregion
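
For readers skimming the diff, the overload pattern in this file can be sketched in plain Kotlin without the library. This is a toy model under loud assumptions: `Frame`, `Selector`, and this `flattenImpl` are hypothetical stand-ins, not Kotlin DataFrame types, and name-clash handling is left out entirely. It only illustrates how the no-argument, selector, and `vararg` String entry points can all delegate to one implementation.

```kotlin
// Hypothetical stand-ins, NOT the Kotlin DataFrame API: a "frame" maps a column
// name to either a plain value or a nested group (another map).
typealias Frame = Map<String, Any?>

// Hypothetical selector type: returns the names of the columns to flatten.
typealias Selector = (Frame) -> List<String>

// Single implementation that every public overload delegates to.
fun flattenImpl(frame: Frame, selector: Selector): Frame {
    val selected = selector(frame).toSet()
    val result = LinkedHashMap<String, Any?>()
    for ((name, value) in frame) {
        if (name in selected && value is Map<*, *>) {
            // Lift the group's children to the top level (no clash handling in this sketch).
            for ((childName, childValue) in value) result[childName.toString()] = childValue
        } else {
            // Plain columns and unselected groups are kept as they are.
            result[name] = value
        }
    }
    return result
}

// Selector-based overload.
fun flatten(frame: Frame, selector: Selector): Frame = flattenImpl(frame, selector)

// No-argument overload: select every column, so every group is flattened.
fun flatten(frame: Frame): Frame = flatten(frame) { it.keys.toList() }

// String overload: column names are turned into a selector.
fun flatten(frame: Frame, vararg columns: String): Frame = flattenImpl(frame) { columns.toList() }

fun main() {
    val grouped: Frame = mapOf("d" to mapOf("a" to 1, "b" to 2), "c" to 3)
    println(flatten(grouped, "d"))
    println(flatten(grouped))
}
```

In the real code the `KProperty` and `ColumnReference` overloads follow the same shape, converting their arguments with `toColumns()` before calling `flattenImpl`.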
@@ -1,6 +1,8 @@
 package org.jetbrains.kotlinx.dataframe.api

 import io.kotest.matchers.shouldBe
+import org.jetbrains.kotlinx.dataframe.DataRow
+import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
 import org.junit.Test

 class FlattenTests {
@@ -13,6 +15,41 @@ class FlattenTests {
         grouped.add("a") { 0 }.flatten().columnNames() shouldBe listOf("a1", "b", "c", "a")
     }

+    @DataSchema
+    interface TestRow {
+        val a: String
+        val b: String
+        val c: String
+    }
+
+    @DataSchema
+    interface Grouped {
+        val d: DataRow<TestRow>
+    }
+
+    @Test
+    fun `flatten access APIs`() {
+        val df = dataFrameOf("a", "b", "c")(1, 2, 3)
+        val grouped = df.group("a", "b").into("d")
+
+        // String API
+        grouped.flatten("d") shouldBe df
+        val castedGroupedDF = grouped.cast<Grouped>()
+
+        // KProperties API
+        castedGroupedDF.flatten(Grouped::d) shouldBe df
+
+        // Extension properties API
+        castedGroupedDF.flatten { d } shouldBe df
+
+        // Column accessors API
+        val d by columnGroup()
+        val a by d.column<String>()
+        val b by d.column<String>()
+        val c by d.column<String>()
+        grouped.flatten(d) shouldBe df
+    }
+
     @Test
     fun `flatten nested`() {
         val df = dataFrameOf("a", "b", "c", "d")(1, 2, 3, 4)
@@ -1052,14 +1052,44 @@ class Modify : TestBase() {
     }

     @Test
-    fun flatten() {
+    fun flatten_properties() {
         // SampleStart
         // name.firstName -> firstName
         // name.lastName -> lastName
         df.flatten { name }
         // SampleEnd
     }
+
+    @Test
+    fun flatten_strings() {
+        // SampleStart
+        // name.firstName -> firstName
+        // name.lastName -> lastName
+        df.flatten("name")
+        // SampleEnd
+    }
+
+    @Test
+    fun flatten_accessors() {
+        // SampleStart
+        val name by columnGroup()
+        val firstName by name.column<String>()
+        val lastName by name.column<String>()
+        // name.firstName -> firstName
+        // name.lastName -> lastName
+        df.flatten(name)
+        // SampleEnd
+    }
+
+    @Test
+    fun flatten_KProperties() {
+        // SampleStart
+        // name.firstName -> firstName
+        // name.lastName -> lastName
+        df.flatten(df::name)
+        // SampleEnd
+    }

     @Test
     fun flattenAll() {
         // SampleStart
23 changes: 23 additions & 0 deletions docs/StardustDocs/topics/flatten.md
@@ -11,13 +11,36 @@ flatten [ { columns } ]
Columns after flattening will keep their original names. Potential column name clashes are resolved by adding minimal possible name prefix from ancestor columns.

<!---FUN flatten-->
Collaborator

You renamed this function to flatten_properties in Modify.kt.

We use Korro to check that the samples in the Stardust docs actually compile. So make sure that every example in the markdown is wrapped like this:
<!---FUN someFunctionName-->
your example, i.e. the code between the // SampleStart and // SampleEnd comments in samples/api/Modify.kt, inside a function with the same name.
<!---END-->

Collaborator Author

@zaleslaw, Mar 17, 2023

Look at insert, for example.

Its documentation contains the following:


<!---FUN insert-->
<tabs>
<tab title="Properties">

and Modify.kt contains the following:


    @Test
    fun insert_properties() {
        // SampleStart
        df.insert("year of birth") { 2021 - age }.after { age }
        // SampleEnd
    }

    @Test
    fun insert_accessors() {
        // SampleStart
        val year = column<Int>("year of birth")
        val age by column<Int>()

        df.insert(year) { 2021 - age }.after { age }
        // SampleEnd
    }

    @Test
    fun insert_strings() {
        // SampleStart
        df.insert("year of birth") { 2021 - "age"<Int>() }.after("age")
        // SampleEnd
    }

It looks like I am doing the same thing, but what is the best way to check that I am right?

Collaborator Author

The template looks like this: function name, underscore, access API (for example, insert_properties).

Collaborator

Oh now I see! Didn't know this was possible. Cool :)

+<tabs>
+<tab title="Properties">

 ```kotlin
 // name.firstName -> firstName
 // name.lastName -> lastName
 df.flatten { name }
 ```

+</tab>
+<tab title="Accessors">
+
+```kotlin
+val name by columnGroup()
+val firstName by name.column<String>()
+val lastName by name.column<String>()
+
+// name.firstName -> firstName
+// name.lastName -> lastName
+df.flatten(name)
+```
+
+</tab>
+<tab title="Strings">
+
+```kotlin
+df.flatten("name")
+```
+
+</tab></tabs>
 <!---END-->

 To remove all column groupings in [`DataFrame`](DataFrame.md), invoke `flatten` without parameters: