NPE in FrameColumnImpl.schema property #593

Closed
koperagen opened this issue Feb 15, 2024 · 3 comments · Fixed by #925
Labels: bug (Something isn't working), csv (CSV / delim related issues)

Comments

@koperagen
Collaborator

The NPE happens on this line:

values.mapNotNull { it.takeIf { it.nrow > 0 }?.schema() }.intersectSchemas()
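
For context, the message "Parameter specified as non-null is null ... parameter <this>" is what Kotlin's generated receiver check reports when an extension member such as nrow is invoked on a null receiver. Below is a minimal sketch of how that can happen, assuming a null has slipped into a list that is typed as non-null (here through an unchecked cast). `Frame`, `nonEmpty`, `nonEmptySafe`, and `smuggleNull` are hypothetical stand-ins, not the library's classes or functions.

```kotlin
// Hypothetical stand-in for DataFrame; not the library's class.
class Frame(val rows: List<Map<String, Any?>>)

// Public extension property: the compiler emits a null check on the receiver.
val Frame.nrow: Int get() = rows.size

// Mirrors the shape of the failing line: skip empty frames, keep the rest.
fun nonEmpty(values: List<Frame>): List<Frame> =
    values.mapNotNull { frame -> frame.takeIf { it.nrow > 0 } }

// Defensive variant: drop nulls before touching any element.
fun nonEmptySafe(values: List<Frame?>): List<Frame> =
    values.filterNotNull().filter { it.nrow > 0 }

// Assumption: a null reaches a List<Frame> through an unchecked cast,
// which generic type erasure lets through at runtime.
@Suppress("UNCHECKED_CAST")
fun smuggleNull(): List<Frame> =
    listOf<Frame?>(Frame(listOf(mapOf("a" to 1))), null) as List<Frame>

fun main() {
    val values = smuggleNull()
    // The unguarded variant is expected to fail on the null element.
    println(runCatching { nonEmpty(values) }.exceptionOrNull())
    // The guarded variant keeps only the one non-empty frame.
    println(nonEmptySafe(values).size)
}
```

The guarded variant only hides the symptom; whether a null should be allowed into the column at all is a separate question.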

Code that leads to it: [screenshot; image not preserved]

I briefly looked into csv.kt and found that only the tryParseImpl method could potentially create a FrameColumn and put a null into it. I still need to confirm that this is possible.
Another possible cause is the read method itself, which tries to parse the file as JSON, CSV, TSV, Excel, and other formats in turn until one succeeds. So if the file cannot be parsed as CSV, read moves on to the next format and can produce a strange result too.
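
To illustrate the second concern, here is a toy sketch of a "try each format until one succeeds" reader. It is not the library's implementation: `Parser`, `csvParser`, `plainLines`, and `readAny` are hypothetical names, and the two parsers are deliberately simplistic. It only shows how input rejected by one parser can be silently accepted by a later one with a different shape.

```kotlin
// Hypothetical parser abstraction; names are illustrative, not the library's API.
class Parser(val name: String, val parse: (String) -> List<List<String>>)

// Toy CSV parser: rejects input whose rows have differing cell counts.
val csvParser = Parser("csv") { text ->
    val rows = text.lines().filter { it.isNotBlank() }.map { it.split(',') }
    require(rows.map { it.size }.distinct().size <= 1) { "ragged rows" }
    rows
}

// Toy catch-all parser: one cell per line, never fails.
val plainLines = Parser("lines") { text ->
    text.lines().filter { it.isNotBlank() }.map { listOf(it) }
}

// Try each parser in order; return the first success with the parser's name.
fun readAny(text: String, parsers: List<Parser>): Pair<String, List<List<String>>> {
    for (p in parsers) {
        val result = runCatching { p.parse(text) }
        if (result.isSuccess) return p.name to result.getOrThrow()
    }
    error("no parser accepted the input")
}

fun main() {
    val parsers = listOf(csvParser, plainLines)
    println(readAny("a,b\n1,2", parsers).first) // well-formed CSV
    println(readAny("a,b\n1", parsers).first)   // ragged CSV falls through to the next parser
}
```

Because the failure of the first parser is swallowed, the caller cannot tell that the result came from a fallback unless the reader reports which format matched.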

Full stack trace below

The problem is found in one of the loaded libraries: check library converters (fields callbacks)
java.lang.NullPointerException: Parameter specified as non-null is null: method org.jetbrains.kotlinx.dataframe.DataFrameKt.getNrow, parameter <this>
org.jetbrains.kotlinx.jupyter.exceptions.ReplLibraryException: The problem is found in one of the loaded libraries: check library converters (fields callbacks)
	at org.jetbrains.kotlinx.jupyter.exceptions.CompositeReplExceptionKt.throwLibraryException(CompositeReplException.kt:50)
	at org.jetbrains.kotlinx.jupyter.codegen.FieldsProcessorImpl.process(FieldsProcessorImpl.kt:68)
	at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$execute$1$1.invoke(CellExecutorImpl.kt:94)
	at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$execute$1$1.invoke(CellExecutorImpl.kt:93)
	at org.jetbrains.kotlinx.jupyter.config.LoggingKt.catchAll(logging.kt:42)
	at org.jetbrains.kotlinx.jupyter.config.LoggingKt.catchAll$default(logging.kt:41)
	at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl.execute(CellExecutorImpl.kt:93)
	at org.jetbrains.kotlinx.jupyter.repl.CellExecutor$DefaultImpls.execute$default(CellExecutor.kt:14)
	at org.jetbrains.kotlinx.jupyter.ReplForJupyterImpl$evalEx$1.invoke(repl.kt:500)
	at org.jetbrains.kotlinx.jupyter.ReplForJupyterImpl$evalEx$1.invoke(repl.kt:478)
	at org.jetbrains.kotlinx.jupyter.ReplForJupyterImpl.withEvalContext(repl.kt:441)
	at org.jetbrains.kotlinx.jupyter.ReplForJupyterImpl.evalEx(repl.kt:478)
	at org.jetbrains.kotlinx.jupyter.messaging.ProtocolKt$shellMessagesHandler$2$res$1.invoke(protocol.kt:320)
	at org.jetbrains.kotlinx.jupyter.messaging.ProtocolKt$shellMessagesHandler$2$res$1.invoke(protocol.kt:314)
	at org.jetbrains.kotlinx.jupyter.JupyterExecutorImpl$runExecution$execThread$1.invoke(execution.kt:38)
	at org.jetbrains.kotlinx.jupyter.JupyterExecutorImpl$runExecution$execThread$1.invoke(execution.kt:33)
	at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)
Caused by: java.lang.NullPointerException: Parameter specified as non-null is null: method org.jetbrains.kotlinx.dataframe.DataFrameKt.getNrow, parameter <this>
	at org.jetbrains.kotlinx.dataframe.DataFrameKt.getNrow(DataFrame.kt)
	at org.jetbrains.kotlinx.dataframe.impl.columns.FrameColumnImpl$schema$1.invoke(FrameColumnImpl.kt:43)
	at org.jetbrains.kotlinx.dataframe.impl.columns.FrameColumnImpl$schema$1.invoke(FrameColumnImpl.kt:42)
	at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
	at org.jetbrains.kotlinx.dataframe.impl.schema.UtilsKt.extractSchema(Utils.kt:92)
	at org.jetbrains.kotlinx.dataframe.impl.schema.UtilsKt.extractSchema(Utils.kt:26)
	at org.jetbrains.kotlinx.dataframe.api.SchemaKt.schema(schema.kt:17)
	at org.jetbrains.kotlinx.dataframe.impl.codeGen.ReplCodeGeneratorImpl.process(ReplCodeGeneratorImpl.kt:50)
	at org.jetbrains.kotlinx.dataframe.jupyter.Integration.updateAnyFrameVariable(Integration.kt:132)
	at org.jetbrains.kotlinx.dataframe.jupyter.Integration.access$updateAnyFrameVariable(Integration.kt:73)
	at org.jetbrains.kotlinx.dataframe.jupyter.Integration$onLoaded$4.invoke(Integration.kt:295)
	at org.jetbrains.kotlinx.dataframe.jupyter.Integration$onLoaded$4.invoke(Integration.kt:290)
	at org.jetbrains.kotlinx.jupyter.api.libraries.FieldHandlerFactory.createUpdateExecution$lambda$0(FieldHandlerFactory.kt:38)
	at org.jetbrains.kotlinx.jupyter.codegen.FieldsProcessorImplKt.executeEx(FieldsProcessorImpl.kt:88)
	at org.jetbrains.kotlinx.jupyter.codegen.FieldsProcessorImplKt.access$executeEx(FieldsProcessorImpl.kt:1)
	at org.jetbrains.kotlinx.jupyter.codegen.FieldsProcessorImpl.process(FieldsProcessorImpl.kt:47)
	... 15 more

@Jolanrensen Jolanrensen added the bug Something isn't working label Feb 15, 2024
@koperagen
Collaborator Author

So indeed, a JSON value in one cell combined with a null value in another causes an issue in CSV reading.
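
A toy model of the suspected mechanism, assuming the column parser converts each cell independently and treats the literal null as a missing value. `Frame`, `parseJsonCell`, and `parseColumn` are hypothetical and far simpler than the real parser; the point is only that the resulting column mixes frames with null.

```kotlin
// Hypothetical stand-in for a nested data frame parsed from a JSON cell.
data class Frame(val values: List<String>)

// Toy cell parser (assumption): "[...]" becomes a Frame, the literal "null"
// becomes null, anything else is rejected.
fun parseJsonCell(cell: String): Frame? = when {
    cell == "null" -> null
    cell.startsWith("[") && cell.endsWith("]") ->
        Frame(cell.removeSurrounding("[", "]").split(',').map { it.trim().trim('"') })
    else -> throw IllegalArgumentException("not JSON: $cell")
}

// If every cell parses, the column is treated as a column of frames,
// even though some of its entries are null.
fun parseColumn(cells: List<String>): List<Frame?> = cells.map(::parseJsonCell)

fun main() {
    // One JSON array cell followed by a null cell.
    println(parseColumn(listOf("[\"str\"]", "null")))
}
```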

@zaleslaw zaleslaw added this to the Backlog milestone Feb 20, 2024
@koperagen
Collaborator Author

val df2 = DataFrame.readDelimStr("""name
"[""str""]"
null
""")

@koperagen koperagen self-assigned this Feb 27, 2024
@Jolanrensen Jolanrensen added the csv CSV / delim related issues label Aug 20, 2024
@Jolanrensen Jolanrensen mentioned this issue Aug 20, 2024
28 tasks
@Jolanrensen
Collaborator

What is the expected result? A FrameColumn cannot contain nulls, right?

Should we:

  • throw an exception, because a FrameColumn cannot contain null
  • convert null to an empty DataFrame
  • not parse the value as JSON, but keep it as a String
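
The three options could be sketched roughly as follows. This is an illustration under assumptions, not a proposed patch: `Frame` is a hypothetical stand-in for DataFrame, and `rejectNulls`, `nullsToEmpty`, and `keepAsStrings` are made-up names for the three behaviours.

```kotlin
// Hypothetical stand-in for DataFrame.
data class Frame(val rows: List<String>) {
    companion object { val EMPTY = Frame(emptyList()) }
}

// Option 1: fail fast, because a column of frames cannot hold null.
fun rejectNulls(cells: List<Frame?>): List<Frame> =
    cells.mapIndexed { i, f -> requireNotNull(f) { "null frame at row $i" } }

// Option 2: replace each null with an empty frame.
fun nullsToEmpty(cells: List<Frame?>): List<Frame> =
    cells.map { it ?: Frame.EMPTY }

// Option 3: if any parsed cell is null, give up on JSON and keep the raw strings.
fun keepAsStrings(raw: List<String>, parsed: List<Frame?>): List<Any> =
    if (parsed.any { it == null }) raw else parsed.filterNotNull()

fun main() {
    val raw = listOf("[\"str\"]", "null")
    val parsed = listOf(Frame(listOf("str")), null)
    println(runCatching { rejectNulls(parsed) }.exceptionOrNull()?.message)
    println(nullsToEmpty(parsed))
    println(keepAsStrings(raw, parsed))
}
```

Option 1 surfaces the problem at read time, option 2 keeps the column type at the cost of inventing data, and option 3 preserves the input exactly but changes the column type.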
