[BUG] NULL value filtering not working correctly #23282
@kushagraThapar could you please help route this?
@moderakh - can you please take a look at this issue?
@em-daniil-terentyev could you please provide more info on this?
Hi @moderakh, thanks for your response. Here are the answers to your questions; I hope they help.
In other words, make it work correctly, as it did in the previous version of the library, com.microsoft.azure.cosmosdb.spark.
Looking forward to a new version of the library that treats NULL values correctly. Thanks in advance. Regards,
Hi @moderakh. Is there any news on this issue? Regards,
Fixes: #23282. Cosmos DB is schema-less; Spark is schema-full. When reading data from Cosmos DB, the Spark connector translates both null and undefined values to a null Spark column value, so from the Spark perspective, null and undefined values in Cosmos DB are the same. Expected behaviour: a null Spark filter on a column should be translated in the Cosmos DB query pushdown so that it matches either a null value or an undefined value.
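The null/undefined mismatch described above can be sketched in plain Python (a minimal illustration of the semantics, not the connector's actual code; the document shape and the field name `someField` are assumed):

```python
# Simulate three Cosmos DB documents: one with a value, one with an
# explicit JSON null, and one where the field is undefined (absent).
docs = [
    {"id": "1", "someField": {"a": 1}},  # value present
    {"id": "2", "someField": None},      # explicit null
    {"id": "3"},                         # field undefined
]

# The Spark connector surfaces both null and undefined as a null column
# value, so from Spark's side documents 2 and 3 look identical.
spark_column = [doc.get("someField") for doc in docs]
print(spark_column)  # [{'a': 1}, None, None]

# A correct pushdown of "someField IS NOT NULL" therefore has to exclude
# both cases on the Cosmos DB side, e.g. something along the lines of:
#   SELECT * FROM c
#   WHERE IS_DEFINED(c.someField) AND NOT IS_NULL(c.someField)
kept = [doc for doc in docs if doc.get("someField") is not None]
print([d["id"] for d in kept])  # ['1']
```

Filtering only on `IS_NULL` would miss document 3, and filtering only on `IS_DEFINED` would keep document 2; both predicates are needed to match Spark's view of the column.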
@moderakh, thanks a lot! :)
Hi everybody!
When I import data from Cosmos DB into Databricks and create a temporary view from this data, the filter "someField IS NOT NULL" does not work correctly. The only way to make it work is to add one more condition, "LOWER(CAST(someField AS STRING)) <> 'null'", but that is not correct: the field never contains string data; it contains either a JSON object or NULL.
```python
connectionConfig = {
    "spark.cosmos.accountEndpoint": "endpoint",
    "spark.cosmos.accountKey": "key",
    "spark.cosmos.database": "database",
    "spark.cosmos.container": "container",
    "spark.cosmos.read.inferSchema.enabled": "true"
}

(spark
    .read
    .format("cosmos.oltp")
    .options(**connectionConfig)
    .load()
    .createOrReplaceTempView("tmp_cosmos_data"))
```
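For reference, the failing query and the workaround described above look like this (a sketch using the view and column names from this report; shown as strings since running them requires a live `spark` session against the affected container):

```python
# The filter that should be sufficient but returns wrong results:
broken_query = "SELECT * FROM tmp_cosmos_data WHERE someField IS NOT NULL"

# The workaround: add an extra predicate on the stringified column.
workaround_query = (
    "SELECT * FROM tmp_cosmos_data "
    "WHERE someField IS NOT NULL "
    "AND LOWER(CAST(someField AS STRING)) <> 'null'"
)

# With a Spark session available, either would run as:
#   spark.sql(workaround_query).show()
```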
Apache Spark 3.1.1
Library: com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.2.0
Operating System: Ubuntu 18.04.5 LTS
Java: Zulu 8.52.0.23-CA-linux64 (build 1.8.0_282-b08)
If you can't reproduce it on your side by simply executing the steps above, please let me know. Meeting all the requirements of this bug report template while also meeting privacy requirements would mean creating a separate Cosmos DB instance and so on, so please let me know if that is necessary.
Thanks in advance.
Regards,
Daniil.