Question About Mapping

Hello, question for the audience. How should we handle the calculation of fields in a dataset map function? Do we need a manual A -> B class mapping, or can we use something generic like a Row -> Row map followed by to<ExpectedClass> to handle the field mappings? I can see it becoming tedious to pass every single property from A to B. Some of the classes I need to process have 20+ properties.

For example, I would like to accept fields as String?, but in later data frames I want to convert them to Int?. Maybe it would look something like the code below. I know this example is small, but keep in mind that I don't want to pass all 20+ properties to another class constructor. Maybe the example should use sealed classes for possible errors? I'm not too opinionated about how this should be handled. I know the following code cannot work since RDDs are immutable, but it would be nice to have some kind of convenience like this to work around that.


```
import org.apache.spark.sql.functions
import org.apache.spark.sql.types.DataTypes

data class Client(val age: String?)

data class ClientCalculated(val age: Int?, val errorMessage: String?)

fun litNullAsString() = functions.lit(null).cast(DataTypes.StringType)

// Illustrative only: Row has no setter, so the `it[...] = ...` lines do not compile.
listOf(Client("30"), Client("thirty"))
    .toDS()
    .withColumn("errorMessage", litNullAsString())
    .map {
        val age = it.getAs<String?>("age")?.toIntOrNull()
        it["age"] = age
        it["errorMessage"] = if (age == null) "age is invalid" else null
        it
    }.to<ClientCalculated>()
```
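For comparison, here is a minimal sketch of the sealed-class idea mentioned above, written as plain Kotlin with no Spark dependency. `AgeResult`, `parseAge`, and `toCalculated` are hypothetical names made up for illustration; they are not part of any library. It only shows the per-record conversion and does not solve the problem of listing 20+ properties.

```kotlin
data class Client(val age: String?)

data class ClientCalculated(val age: Int?, val errorMessage: String?)

// Hypothetical result type: either a parsed age or a reason why parsing failed.
sealed class AgeResult {
    data class Valid(val age: Int) : AgeResult()
    data class Invalid(val message: String) : AgeResult()
}

// Parses the raw string; a null or non-numeric value counts as invalid,
// matching the `age == null` check in the example above.
fun parseAge(raw: String?): AgeResult {
    val age = raw?.toIntOrNull()
    return if (age != null) AgeResult.Valid(age) else AgeResult.Invalid("age is invalid")
}

// Builds a new immutable record instead of mutating the input.
fun Client.toCalculated(): ClientCalculated =
    when (val result = parseAge(age)) {
        is AgeResult.Valid -> ClientCalculated(result.age, null)
        is AgeResult.Invalid -> ClientCalculated(null, result.message)
    }
```

Assuming an encoder for `ClientCalculated` is available, a function like this could presumably be passed to the typed map, e.g. `.map { it.toCalculated() }`, so that no Row needs to be mutated.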




Question About Mapping #185

Jolanrensen commented on Oct 26, 2022

asm0dey commented on May 8, 2023
