Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

type: KType in DataColumnImpl mismatches actual values sometimes #713

Closed
Jolanrensen opened this issue Jun 3, 2024 · 10 comments
Closed
Assignees
Labels
bug Something isn't working
Milestone

Comments

@Jolanrensen
Copy link
Collaborator

Jolanrensen commented Jun 3, 2024

type: KType in DataColumnImpl mismatches actual values in some cases. This can result in runtime exceptions and makes life difficult attempting to fix #30 or #704 where we assume the type always correctly represents the data.
This issue relates to #701 as well.

To discover these bugs, we can introduce a (debug-only!!) check in DataColumnImpl, like:

private infix fun <T> T?.matches(type: KType) =
    when {
        this == null -> type.isMarkedNullable
        this.isPrimitiveArray -> type.isPrimitiveArray &&
            this!!::class.qualifiedName == type.classifier?.let { (it as KClass<*>).qualifiedName }

        this.isArray -> type.isArray // cannot check the precise type of array
        else -> this!!::class.isSubclassOf(type.classifier as KClass<*>)
    }

init {
    if (DEBUG) {
        require(values.all { it matches type }) {
            val types = values.map { if (it == null) "Nothing?" else it!!::class.simpleName }.distinct()
            "Values of column '$name' have types '$types' which are not compatible given with column type '$type'"
        }
    }
}

At the moment of testing, I can find 8+ breaking tests in :core:

  • org.jetbrains.kotlinx.dataframe.io.PlaylistJsonTest#aggregate by column
    • data has types [DataFrameImpl], kType is: org.jetbrains.kotlinx.dataframe.aggregation.AggregateGroupedDsl<org.jetbrains.kotlinx.dataframe.io.PlaylistJsonTest.DataFrameType1>
    • This exception is probably fine to ignore, as AggregateGroupedDsl is a DataFrame
  • org.jetbrains.kotlinx.dataframe.samples.api.Analyze#pivotDefault_accessors / #pivotDefault_strings / #pivotDefault_properties
    • data has types [Boolean, Int], kType is: kotlin.Boolean
    • Not really sure what's going on here, something with concatImpl I think
  • org.jetbrains.kotlinx.dataframe.samples.api.Modify#customConverters
    • data has types [Nothing?], kType is kotlin.Int
    • Seems to originate from convertToImpl$convertToSchema
  • org.jetbrains.kotlinx.dataframe.samples.api.Modify#implode
    • data has types [DataFrameImpl, Nothing?], kType is: org.jetbrains.kotlinx.dataframe.DataFrame<*>, not nullable
    • Seems to originate from implodeImpl where a null is put in a FrameColumn
  • org.jetbrains.kotlinx.dataframe.testSets.person.DataFrameTests#convertTo
    • data has type: [Nothing?], kType is: kotlin.Int
    • From convertToImpl$convertToSchema too
  • org.jetbrains.kotlinx.dataframe.testSets.person.DataFrameTreeTests#merge rows into table
    • data has types [DataFrameImpl, Nothing?], kType is: org.jetbrains.kotlinx.dataframe.DataFrame<*>, not nullable
    • Seems to originate from implodeImpl too where a null is put in a FrameColumn

Edit: running it afresh (clean pull of master with check) I get 15 failing tests.

There is also an exception in :dataframe-jdbc: #701

@Jolanrensen
Copy link
Collaborator Author

7/15 tests are fixed by #727

Next are the pivot tests with 0 and true in Boolean columns.
I suspect this is due to default() taking Any? and the pivot implementation not re-inferring the types of the columns after filling in a null with the given default in: df.pivot { city }.groupBy { name }.default(0).min().

@Jolanrensen
Copy link
Collaborator Author

Jolanrensen commented Jun 12, 2024

After merging #734 there are just 5 failing tests left:
image

@Jolanrensen Jolanrensen modified the milestones: Backlog, 0.14.0 Jul 29, 2024
@Jolanrensen
Copy link
Collaborator Author

Maybe we could use https://github.com/gmazzo/gradle-buildconfig-plugin to set a DEBUG flag and check these types when running tests

Jolanrensen added a commit that referenced this issue Jul 30, 2024
…enabled. Updated contribution guide too. TC will have to be updated manually
@Jolanrensen
Copy link
Collaborator Author

#796 adds the ability to set -Pkotlin.dataframe.debug=true and run the checks for this issue :)

@Jolanrensen
Copy link
Collaborator Author

Jolanrensen commented Jul 30, 2024

@zaleslaw After #796 is merged, I´d recommend running the JDBC tests again in debug mode.

I now have 15 failing tests just from that module. They contain all sorts of Char/String, Double/Float, and Int/Short mismatches. These will most likely result in runtime exceptions on the user's side if left untreated.

@Jolanrensen
Copy link
Collaborator Author

Jolanrensen commented Jul 30, 2024

One of the tests triggeres in convertToImpl. It seems like a temporary null-filled column is created even though the column type might be not-nullable. This needs a small workaround but probably would not have caused user-facing bugs.

Edit: fixed by #800

Jolanrensen added a commit that referenced this issue Jul 30, 2024
…-nullable column with nulls in it. This is now avoided and the behavior has been simplified/stabilized
Jolanrensen added a commit that referenced this issue Jul 30, 2024
…enabled. Updated contribution guide too. TC will have to be updated manually
Jolanrensen added a commit that referenced this issue Jul 30, 2024
…ary non-nullable column with nulls in it. This is now avoided and the behavior has been simplified/stabilized
Jolanrensen added a commit that referenced this issue Jul 30, 2024
…rary column was created with nulls in a FrameColumn
Jolanrensen added a commit that referenced this issue Jul 31, 2024
Jolanrensen added a commit that referenced this issue Jul 31, 2024
@Jolanrensen
Copy link
Collaborator Author

Jolanrensen commented Aug 1, 2024

The last two main tests are fixed by #801 and #803.

To solve this issue, we just need to:

@zaleslaw
Copy link
Collaborator

zaleslaw commented Aug 1, 2024

Please, @Jolanrensen, create a ticket with the list of these JDBC tests, include in 0.14 milestone and assign on me

@Jolanrensen
Copy link
Collaborator Author

Jolanrensen commented Aug 6, 2024

Attempt to work with assert() and the -ea JVM flag as it does the same :)

Conclusion: this is not an option. It also runs when DataFrame is used in a user-project test.

@Jolanrensen Jolanrensen changed the title type: KType in DataColumnImpl mismatches actual values sometimes type: KType in DataColumnImpl mismatches actual values sometimes Aug 20, 2024
@Jolanrensen
Copy link
Collaborator Author

Debug mode is now enabled on TeamCity! Thanks for the help fixing the issues @zaleslaw :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants